jts / nanopolish

Signal-level algorithms for MinION data
MIT License

Is Nanopolish applicable to data from ONT's R10 flow cells? #677

Closed: Jelly5000 closed this issue 4 years ago

Jelly5000 commented 5 years ago

Hello Jared,

I sequenced ~20,000 amplicons from one locus on an R10 flow cell from Oxford Nanopore Technologies. I then mapped the reads with graphmap, converted the output to BAM, sorted and indexed it with samtools, and finally indexed the fastq files with nanopolish index, all without any errors. After running nanopolish variants, however, the post-run summary showed that not a single read passed QC: every read was reported as 'no alignment'. I repeated the sequencing with the same library, this time on an R9 (R9.4.1) flow cell, and processed the reads with the same pipeline. With the R9 data, nanopolish reported only 4% of reads as 'no alignment', a negligible number of reads had other problems, and the rest passed QC. The resulting VCF file was very informative and worked fine for downstream analysis. Is this a known issue with R10 data, and is there a solution, e.g. a particular command-line parameter?
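For reference, the workflow described in this post (map with graphmap, sort and index with samtools, index the reads with nanopolish index, then call variants) can be sketched as a small bash function. This is a minimal sketch, not the poster's exact commands: the file names, the `run`/`DRY_RUN` helper and `call_variants_pipeline` are hypothetical, and only common options are used (`graphmap align -r/-d/-o`, `samtools sort -o`, and the `-r/-b/-g/-t` options that also appear in the `nanopolish eventalign` command later in this thread). Depending on the nanopolish version, `nanopolish variants` may need further options (for example ploidy), so check `nanopolish variants --help` on your build.

```bash
#!/usr/bin/env bash
# Sketch of the amplicon workflow described above:
# graphmap -> samtools sort/index -> nanopolish index -> nanopolish variants.
# File names and helper functions are hypothetical placeholders.

# Print each command; execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

# Usage: call_variants_pipeline REF.fa READS.fastq FAST5_DIR OUT_PREFIX [THREADS]
call_variants_pipeline() {
    local ref=$1 reads=$2 fast5_dir=$3 prefix=$4 threads=${5:-8}

    # 1. Map reads to the reference (graphmap writes SAM).
    run graphmap align -r "$ref" -d "$reads" -o "$prefix.sam" || return 1
    # 2. Coordinate-sort and index the alignments.
    run samtools sort -o "$prefix.sorted.bam" "$prefix.sam" || return 1
    run samtools index "$prefix.sorted.bam" || return 1
    # 3. Link each basecalled read to the fast5 file holding its raw signal.
    run nanopolish index -d "$fast5_dir" "$reads" || return 1
    # 4. Call variants from the signal-level data.
    run nanopolish variants -t "$threads" -r "$reads" -b "$prefix.sorted.bam" \
        -g "$ref" -o "$prefix.vcf"
}
```

For example, `DRY_RUN=1 call_variants_pipeline ref.fa reads.fastq fast5_dir sample` only prints the five commands; dropping `DRY_RUN=1` runs them and writes the calls to `sample.vcf`. Each step stops the function on failure so a broken BAM is never passed on to nanopolish.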

Kind regards, Jeremy

jts commented 5 years ago

Hi @Jelly5000,

Nanopolish supports R10 data, but it's currently in a branch that has not yet been merged into the main version:

https://github.com/jts/nanopolish/tree/r10

You can clone and compile this branch (let me know if you need help doing this) and try it out. I'd appreciate hearing how well it works, as you're one of the first users to try R10.
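A minimal sketch of cloning and compiling the branch, assuming the usual nanopolish build (clone with submodules, then `make`) also applies to it and that the `r10` branch is still published at the URL above. The `run`/`DRY_RUN` helper and `build_nanopolish_branch` are hypothetical names used for illustration; git, make and a C++ compiler are assumed to be installed.

```bash
#!/usr/bin/env bash
# Sketch: fetch and build a branch (default: r10) of nanopolish.
# Assumes git, make and a C++ compiler are available and that the
# branch still exists upstream. DRY_RUN=1 only prints the commands.

run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

# Usage: build_nanopolish_branch [BRANCH] [DEST_DIR]
build_nanopolish_branch() {
    local branch=${1:-r10}
    local dest=${2:-nanopolish-$branch}
    # --recursive also fetches the git submodules the build expects.
    run git clone --recursive --branch "$branch" \
        https://github.com/jts/nanopolish.git "$dest" || return 1
    # Build in place; the binary ends up at $dest/nanopolish.
    run make -C "$dest"
}
```

Running `build_nanopolish_branch` clones into `nanopolish-r10` and builds there, so the branch build can sit next to an existing nanopolish installation; call it explicitly as `./nanopolish-r10/nanopolish` to be sure the R10-aware binary is the one being used.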

Jared

Jelly5000 commented 5 years ago

Hi Jared, a first test run of the r10 branch on a subset of my R10 flow cell data (~1,500 reads) had only 1% of reads with 'no alignment'; the other reads passed QC. The resulting VCF file went through all of the downstream computation. This makes me happy. Thank you very much for your help.

jts commented 5 years ago

Great, thanks for the feedback. Do the variant calls look OK?

Jelly5000 commented 5 years ago

Hello Jared, the variant calls look fine.

I now have some difficulties with a second dataset, again regarding nanopolish's QC.

Compared with the first R10 dataset, which passed nanopolish's QC, I sequenced another amplicon library that differed only in its samples: same primers, same workflow, even the same R10 flow cell. After graphmap and samtools (both with and without demultiplexing and adapter trimming by porechop), nanopolish variants could not align the reads to the signal data. Of ~13k reads, 8 were assigned to 'qc failed' and the rest to 'no alignment'. The resulting VCF file contained only the header. The nanopolish version and files on my computer did not change, and another test with my first R10 dataset still worked.

In a third R10 dataset, nanopolish also assigned almost all reads to 'no alignment' but was nevertheless able to write an informative VCF file with many variants, as expected.

I have sent you a link to the data by email. If you can spare some time, I would appreciate your assessment of the fast5 files and of possible causes of the poor QC.

Kind regards, Jeremy

jts commented 5 years ago

Hm, it looks like the flowcell type in the second set of fast5s is FLO-MIN110 instead of flo-min110 in the others. I can fix this quickly.
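To illustrate jts's first diagnosis: if the flowcell type string is compared exactly, `FLO-MIN110` does not match `flo-min110`, and the R10 reads would not be recognised as R10. The sketch below shows a case-insensitive lookup instead. It is a hypothetical bash illustration, not nanopolish's actual C++ logic; the `pore_for_flowcell` name and the FLO-MIN106 (R9.4.1) entry are additions for the example, and it does not address the unexpected attribute location mentioned in the next comment.

```bash
#!/usr/bin/env bash
# Hypothetical illustration (not nanopolish's code): normalise the
# flowcell type to lower case before matching, so "FLO-MIN110" and
# "flo-min110" select the same pore model.

pore_for_flowcell() {
    local ft
    ft=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$ft" in
        flo-min110) echo r10 ;;
        flo-min106) echo r9.4.1 ;;
        *)          echo unknown ;;
    esac
}
```

With the normalisation, `pore_for_flowcell FLO-MIN110` and `pore_for_flowcell flo-min110` both print `r10`, while an unrecognised type prints `unknown` rather than silently falling back to another pore model.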

jts commented 5 years ago

Actually, the flowcell type is in an unexpected attribute in the fast5 files. I'll need to discuss this with ONT and get back to you. In the meantime, if you put your fast5 files in a directory called "r10" and then re-run nanopolish index, it should work.

JosephLalli commented 5 years ago

Just going to second this request. I've downloaded and compiled the r10 branch, and I've seemingly indexed my fast5 files. My fast5 files are in a folder called r10. But when I try to align events, I'm getting the following message:

    nanopolish eventalign -v -r test.fastq -b minimaponlytest.sorted.bam -g reference.fasta -t 8 --scale-events --progress > test.nanopolish.tsv
    [fai_load] build FASTA index.
    [bam process] processed 959 bam records in 0.00s (inf records/s). Latest: TestContig:50
    [post-run summary] total reads: 271, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 0, bad fast5: 274

Indexing output:

    nanopolish index -d r10 test.fastq
    [readdb] indexing r10
    [readdb] num reads: 44000, num reads with path to fast5: 44000

The index files are in my working directory. Is this an r10 issue, or something more basic?
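The `[post-run summary]` line packs every QC category into one comma-separated line, which makes percentages like those quoted in this thread (4% or 1% 'no alignment') tedious to work out by hand. Below is a minimal bash/awk sketch for tabulating it, assuming only the line format shown in the eventalign output above; `summarize_post_run` is a hypothetical helper, not part of nanopolish.

```bash
#!/usr/bin/env bash
# Turn a nanopolish "[post-run summary]" line into per-category
# counts and percentages of the total number of reads.
# Only the line format shown in this thread is assumed.

summarize_post_run() {
    # Keep only the text after the "[post-run summary] " tag.
    sed -n 's/.*\[post-run summary] //p' |
    awk -F', ' '{
        for (i = 1; i <= NF; i++) {
            split($i, kv, ": ")      # "qc fail: 0" -> "qc fail", "0"
            name[i] = kv[1]
            count[i] = kv[2] + 0
        }
        total = count[1]             # first field is "total reads"
        for (i = 2; i <= NF; i++) {
            pct = (total > 0) ? 100 * count[i] / total : 0
            printf "%s: %d (%.1f%%)\n", name[i], count[i], pct
        }
    }'
}
```

For example, `nanopolish eventalign ... > events.tsv 2> eventalign.log` followed by `summarize_post_run < eventalign.log`. This assumes the summary is written to stderr, which is consistent with it appearing on the terminal above even though stdout was redirected to a file. On the summary above, the sketch reports 0 'no alignment' and 274 'bad fast5' against 271 total reads, which suggests the fast5 files could not be read, rather than that alignment failed.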

jts commented 5 years ago

Hm, can you send me an example fast5 file and the corresponding basecalls, @JosephLalli?

Jelly5000 commented 5 years ago

Regarding variant calling: the problem was solved after renaming the folder containing the fast5 files to 'r10'. Thank you!

jts commented 5 years ago

Hi @Jelly5000 and @JosephLalli,

I've just pushed some new code to the R10 branch that should properly handle your R10 fast5s (without having to put them in an "r10" directory). If you have time could you grab the latest code and check that it works on your end?

Thanks, Jared

Jelly5000 commented 5 years ago

Yes, indexing and variant calling now also work with fast5 files in a folder named something other than 'r10' :)

jts commented 5 years ago

Great, thanks!

AndreaLegati commented 1 year ago

Hi @jts, could I kindly ask how to clone and compile the R10 branch, please?

Thank you, Andrea