jts / nanopolish

Signal-level algorithms for MinION data
MIT License

nanopolish polya outputting .tsv with READ_FAILED_LOAD #1067

Open tiffuhu opened 1 year ago

tiffuhu commented 1 year ago

I have a run with multiple barcoded samples that I have demultiplexed, and I am attempting to run nanopolish polya on them. I am using both the "passed" fast5 files and the basecalled fastq files that were output by my sequencing run; I did not do any additional basecalling or extraction. I merged the fastq files into one, while leaving the fast5 files as multi-fast5 files in a folder. The reads aligned to my reference with minimap2, and I checked that the coverage was sufficient and in line with what I expected. However, when I run nanopolish polya, the reads in my output are reported as READ_FAILED_LOAD. Do you know where I should begin debugging this?

jts commented 1 year ago

Can you provide the exact nanopolish commands that you ran?

tiffuhu commented 1 year ago

My commands were:

nanopolish index -d ~/fast5/test/ ~/fastq/test.fastq.gz

nanopolish polya --threads=8 --reads=~/fastq/test.fastq.gz --bam=~/bam_files/test.sorted.bam --genome=~/references/test.fa > test_polya_results.tsv
jts commented 1 year ago

That looks OK - what messages were printed to the terminal for each command?

tiffuhu commented 1 year ago

Here are the outputs of the commands:

nanopolish index -d ~/fast5/test/ ~/fastq/test.fastq.gz
[readdb] indexing ~/fast5/test/
[readdb] num reads: 84069, num reads with path to fast5: 82690

nanopolish polya --threads=8 --reads=~/fastq/test.fastq.gz --bam=~/bam_files/test.sorted.bam --genome=~/references/test.fa > test_polya_results.tsv
[post-run summary] total reads: 80302, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 80302, bad fast5: 1343
Xing-2024 commented 1 month ago

Hi @jts, I have run into the same problem:

The nanopolish index command:

nanopolish index --directory=./20231106_1203_3E_PAO98063_1c9389ff/fast5_pass/barcode02/ --sequencing-summary=./20231106_1203_3E_PAO98063_1c9389ff/sequencing_summary_PAO98063_1c9389ff_9ae07097.txt barcode02.fastq

Output:

[readdb] indexing ./20231106_1203_3E_PAO98063_1c9389ff/fast5_pass/barcode02/
[readdb] num reads: 21858917, num reads with path to fast5: 21858917

The nanopolish polya command:

nanopolish polya --threads=32 --reads=barcode02.fastq --bam=barcode02.aln.sorted.bam --genome=/ref/ensembl/fa/Mus_musculus.GRCm39.dna.toplevel.fa > barcode02.polya.tsv

Output on the screen:

[+] Loading nanopolish 0.14.0
[W::fai_get_val] Reference 130:1194|06358dc5-467d-47db-ad19-afae399f712b not found in FASTA file, returning empty sequence
[warning] sequence of read 130:1194|06358dc5-467d-47db-ad19-afae399f712b is empty
[W::fai_get_val] Reference 123:366|cb74f0f0-5503-47e4-be3f-c2d2d118265b not found in FASTA file, returning empty sequence
[warning] sequence of read 123:366|cb74f0f0-5503-47e4-be3f-c2d2d118265b is empty
[W::fai_get_val] Reference 139:395|0f7403ce-b9e3-4331-b1d0-3a1962716f01 not found in FASTA file, returning empty sequence
[warning] sequence of read 139:395|0f7403ce-b9e3-4331-b1d0-3a1962716f01 is empty
[W::fai_get_val] Reference 140:358|fc66eeac-25af-4e20-8c95-624957230f2e not found in FASTA file, returning empty sequence
[warning] sequence of read 140:358|fc66eeac-25af-4e20-8c95-624957230f2e is empty

The output TSV file looks like this: [screenshot not preserved]

Any suggestions?

Thank you!
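One thing the warnings above suggest checking: names such as 130:1194|06358dc5-467d-47db-ad19-afae399f712b appear to be read IDs that could not be found when nanopolish looked up the read sequence. That would be expected if the query names in the BAM do not match the read IDs in barcode02.fastq (for example, if the alignment was made from a FASTQ whose reads had been renamed). The bash function below is a minimal sketch for testing that hypothesis; it is not part of nanopolish, and the name missing_read_names is hypothetical. It assumes standard four-line FASTQ records (plain or .gz) and a plain-text list of BAM query names, one per line, and prints every name in that list that has no matching FASTQ record.

```shell
# Hypothetical helper (not part of nanopolish): print query names listed in
# NAMES_FILE that have no matching read ID in FASTQ.
# Assumes 4-line FASTQ records; FASTQ may be plain text or gzip-compressed.
missing_read_names() {
    local fastq="$1" names_file="$2"
    local tmp
    tmp=$(mktemp -d) || return 1

    # FASTQ read IDs: first whitespace-delimited token of each header line
    # (every 4th line starting at line 1), with the leading '@' removed.
    case "$fastq" in
        *.gz) gzip -cd -- "$fastq" ;;
        *)    cat -- "$fastq" ;;
    esac | awk 'NR % 4 == 1 { print substr($1, 2) }' \
         | LC_ALL=C sort -u > "$tmp/fastq_names"

    # Deduplicate query names (one read can have several alignment records).
    LC_ALL=C sort -u -- "$names_file" > "$tmp/query_names"

    # comm -23 keeps lines that appear only in the first (query) list,
    # i.e. names the FASTQ does not contain. Both inputs use C-locale order.
    LC_ALL=C comm -23 "$tmp/query_names" "$tmp/fastq_names"

    rm -rf -- "$tmp"
}
```

A possible usage, with the BAM query names extracted by samtools first:

samtools view barcode02.aln.sorted.bam | cut -f1 > bam_names.txt
missing_read_names barcode02.fastq bam_names.txt | head

Empty output would mean every BAM query name has a FASTQ record, which would point away from a naming mismatch.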