jts / nanopolish

Signal-level algorithms for MinION data
MIT License

After basecalling with dorado, nanopolish was unable to recognize methylation information #1140

Open happier21 opened 7 months ago

happier21 commented 7 months ago

Dear all,

In the "Quickstart - calling methylation with nanopolish" section of the nanopolish usage instructions, the guppy basecaller is used to basecall the reads. I replaced guppy with dorado, ONT's newly recommended basecalling tool. The rest of the steps remained the same, but far less methylation information was reported than with guppy. Is this because dorado and nanopolish are not compatible? I examined my process logs and found that the number of reads was still high after nanopolish index and minimap2, but dropped significantly at nanopolish call-methylation.

[Three screenshots of the process logs]

Why does this happen?

Thank you, ShengquanWang

hasindu2008 commented 7 months ago

Is the data new R10 data?

happier21 commented 7 months ago

Yes, the data is new R10 data.

hasindu2008 commented 7 months ago

nanopolish doesn't support R10 data yet. You can try f5c, which is an optimised re-implementation of the index, call-methylation and eventalign modules in nanopolish that also supports R10 and RNA004.

happier21 commented 7 months ago

Thank you for your help. This method seems to work, but when running f5c call-methylation I found another problem. According to the f5c call-methylation log, the quality of the dorado basecaller's reads was significantly lower than that of the guppy basecaller's reads. Why did this happen? This is the log of f5c call-methylation:

[Screenshot: f5c call-methylation log]

I then ran "samtools view -b -q 20 -F 4 test.sorted.bam > test.sorted.q20.mapped.bam" on the dorado basecaller's data and counted the number of reads in the resulting BAM file. The result is as follows:

[Screenshot: read count for the dorado BAM file]

I did the same with the guppy basecaller's data. The result is as follows:

[Screenshot: read count for the guppy BAM file]

Why are these counts so different?
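
One way to see what the filter above keeps is to apply its two conditions by hand. The following is a minimal sketch, not part of the original thread: `count_q20_mapped` is a hypothetical helper that reads SAM text on standard input and counts records meeting the same criteria as `samtools view -q 20 -F 4` (MAPQ of at least 20, unmapped flag 0x4 not set). It counts alignment records rather than distinct reads, so secondary and supplementary alignments are counted in addition to the primary one.

```shell
# Hypothetical helper: count SAM records that would pass `samtools view -q 20 -F 4`.
# Reads SAM text on stdin; header lines (starting with @) are skipped.
count_q20_mapped() {
    awk -F '\t' '
        /^@/ { next }                          # skip header lines
        {
            flag = $2 + 0                      # FLAG is column 2
            mapq = $5 + 0                      # MAPQ is column 5
            unmapped = int(flag / 4) % 2       # 1 if bit 0x4 is set
            if (!unmapped && mapq >= 20) n++
        }
        END { print n + 0 }
    '
}
```

Assuming samtools is installed, it could be used as `samtools view -h test.sorted.bam | count_q20_mapped` on both the dorado and the guppy alignments.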

hasindu2008 commented 7 months ago

Could you please open an issue on the f5c repo? I will answer there.

Which mapper are you using, Minimap2? If Minimap2 aligns well for Guppy but not for Dorado, it might be something with Dorado. Are you using the correct model?
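
To compare how well the two read sets align, the alignment records can be tallied by type. This is a minimal sketch, not from the thread: `sam_flag_summary` is a hypothetical helper that reads SAM text on standard input and classifies each record from its FLAG field (bit 0x4 unmapped, 0x100 secondary, 0x800 supplementary, otherwise primary). `samtools flagstat` reports similar totals directly from a BAM file.

```shell
# Hypothetical helper: tally SAM records by alignment type using the FLAG field.
# Reads SAM text on stdin; header lines (starting with @) are skipped.
sam_flag_summary() {
    awk -F '\t' '
        /^@/ { next }                                  # skip header lines
        {
            flag = $2 + 0
            if (int(flag / 4) % 2)         unmapped++       # bit 0x4
            else if (int(flag / 256) % 2)  secondary++      # bit 0x100
            else if (int(flag / 2048) % 2) supplementary++  # bit 0x800
            else                           primary++
        }
        END {
            printf "unmapped=%d primary=%d secondary=%d supplementary=%d\n",
                   unmapped, primary, secondary, supplementary
        }
    '
}
```

Running it on the output of `samtools view -h` for the Guppy and the Dorado alignments would show whether the difference lies in unmapped reads or in the split between primary and other alignments.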

happier21 commented 7 months ago

Thank you very much for your help! This is the full log of the f5c call-methylation:

[Screenshot: full f5c call-methylation log]

This is the dorado basecaller command:

dorado basecaller /share/home/yzwl_hanxs/app/dorado-0.5.3-linux-x64/model/dna_r10.4.1_e8.2_400bps_sup@v4.1.0 ./pod5/ | samtools view -bhS -@ 10 > test.bam

Convert BAM to FASTQ:

samtools fastq -0 test.fastq test.bam

Align with minimap2:

minimap2 -a -x map-ont /share/home/yzwl_hanxs/refdata-gex-GRCh38-2020-A/fasta/genome.fa test.fastq | samtools sort -o test.sorted.bam -T test.tmp

This is the log for minimap2:

[Screenshot: minimap2 log]

hasindu2008 commented 7 months ago

Could you open an issue with this log at https://github.com/hasindu2008/f5c/issues, as this is more relevant there now?