nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Number of classified reads differs significantly between ont-dorado-server and dorado standalone #659

Closed billytcl closed 3 months ago

billytcl commented 6 months ago

Issue Report

Please describe the issue:

I am using the Native Barcoding 96 kit with R10.4.1 at 5 kHz, and I'm observing a large difference between ont-dorado-server and standalone dorado: roughly 10-20% more reads are classified by ont-dorado-server, and correspondingly more end up unclassified with standalone dorado. The two runs use different methylation models, but I highly doubt that's the culprit. Could there be an interaction with read splitting that behaves differently in standalone dorado?

ont-dorado-server v7.0.8

ont_basecall_client -p 5555 -i pod5/ -s dorado_5mc_5khz_prom_pod5/ -c dna_r10.4.1_e8.2_400bps_5khz_modbases_5mc_sup_prom.cfg --recursive --compress_fastq --barcode_kits EXP-NBD196 --align_ref hs38_naa.mmi --bam_out --min_qscore 7 --do_read_splitting --max_read_split_depth 4 --index --progress_stats_frequency 1000 --read_batch_size 200000

samtools flagstat -@ 20 ../bam_sup/P10574_27431.D01_control.merged.sorted.bam
3143504 + 0 in total (QC-passed reads + QC-failed reads)
2542461 + 0 primary
515667 + 0 secondary
85376 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
2701131 + 0 mapped (85.93% : N/A)
2100088 + 0 primary mapped (82.60% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

dorado 0.5.3

dorado basecaller sup,5mC_5hmC Seq_Output/20231003_1702_2F_PAO83072_97d6f8ad/ --recursive --min-qscore 7 --kit-name EXP-NBD196 --reference hs38_naa.mmi

samtools flagstat -@ 20 P10574_27431.D01_control.merged.sorted.bam
2679896 + 0 in total (QC-passed reads + QC-failed reads)
2120676 + 0 primary
467276 + 0 secondary
91944 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
2426897 + 0 mapped (90.56% : N/A)
1867677 + 0 primary mapped (88.07% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
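
To put a number on the gap between the two flagstat reports above, here is a minimal Python sketch that parses the text output and computes the relative drop per category. The helper names (`parse_flagstat`, `relative_drop`) are made up for illustration, and only the lines needed for the comparison are embedded. With the counts reported above, the dorado standalone BAM has roughly 15% fewer total records and roughly 17% fewer primary records. Both BAMs are for a single demultiplexed sample, so this is only a proxy for the classification rate.

```python
import re

# Categories of interest, as they appear in `samtools flagstat` text output.
FIELDS = ("in total", "primary", "mapped", "primary mapped")

def parse_flagstat(text):
    """Return {category: QC-passed count} for the categories in FIELDS."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+) \+ \d+ (.+)$", line)
        if not m:
            continue
        label = m.group(2)
        for field in FIELDS:
            # Match "primary" but not "primary duplicates"; allow a
            # trailing parenthetical such as "(85.93% : N/A)".
            if label == field or label.startswith(field + " ("):
                counts[field] = int(m.group(1))
    return counts

def relative_drop(reference, other):
    """Fraction by which `other` is smaller than `reference`."""
    return (reference - other) / reference

# Excerpts copied from the two reports in this issue.
SERVER_FLAGSTAT = """\
3143504 + 0 in total (QC-passed reads + QC-failed reads)
2542461 + 0 primary
0 + 0 primary duplicates
2701131 + 0 mapped (85.93% : N/A)
2100088 + 0 primary mapped (82.60% : N/A)
"""

STANDALONE_FLAGSTAT = """\
2679896 + 0 in total (QC-passed reads + QC-failed reads)
2120676 + 0 primary
0 + 0 primary duplicates
2426897 + 0 mapped (90.56% : N/A)
1867677 + 0 primary mapped (88.07% : N/A)
"""

server = parse_flagstat(SERVER_FLAGSTAT)
standalone = parse_flagstat(STANDALONE_FLAGSTAT)

for field in FIELDS:
    drop = relative_drop(server[field], standalone[field])
    print(f"{field}: {server[field]} -> {standalone[field]} ({drop:.1%} fewer)")
```

The "primary" row is probably the most useful one to compare, since secondary and supplementary alignments inflate the totals differently in each run.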

Run environment:

tijyojwad commented 6 months ago

Hi @billytcl - can you try running the dorado basecaller command with the --no-trim option? I suspect that adapter trimming may be getting in the way of barcode classification, so running with --no-trim will help validate that theory.
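
When comparing a default run against a --no-trim run, it may help to tally classified versus unclassified reads directly rather than relying on flagstat for one sample. The sketch below counts primary records per barcode from SAM text lines (for example, the output of `samtools view`). It assumes the barcode is stored in the standard SAM `BC:Z:` tag and that a record without that tag is unclassified. Both are assumptions to check against your own BAM, and `tally_barcodes` is a hypothetical helper name. The demo lines are synthetic.

```python
from collections import Counter

def tally_barcodes(sam_lines):
    """Count primary SAM records per barcode.

    Assumes the barcode lives in the BC:Z: tag; records with no such
    tag are counted as "unclassified" (an assumption, not verified).
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800)
        barcode = "unclassified"
        for tag in fields[11:]:  # optional tags start at column 12
            if tag.startswith("BC:Z:"):
                barcode = tag[len("BC:Z:"):]
                break
        counts[barcode] += 1
    return counts

# Synthetic records for illustration only (not real data from this run).
demo = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tBC:Z:barcode01",
    "r1\t2048\tchr2\t500\t60\t4M\t*\t0\t0\tACGT\tIIII\tBC:Z:barcode01",
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tBC:Z:barcode02",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

counts = tally_barcodes(demo)
total = sum(counts.values())
for barcode, n in sorted(counts.items()):
    print(f"{barcode}: {n} ({n / total:.1%})")
```

Running the same tally on both outputs should show whether the unclassified fraction shrinks when trimming is disabled.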

billytcl commented 6 months ago

Sure thing! Will give it a shot. If it does interfere with barcoding, shouldn’t it then be off by default when barcoding is on?

tijyojwad commented 6 months ago

Hi @billytcl - I want to confirm that's indeed the case for your situation. We have some improvements along that line internally for the next release already, and your input would help determine if more changes are needed.

billytcl commented 6 months ago

Ok! This may take a few days for the run to finish basecalling.

ezherman commented 4 months ago

@billytcl did you get a chance to give this a go? I am interested in your result too!

tijyojwad commented 3 months ago

Closing due to inactivity. FYI @billytcl, we have improved the interplay between adapter trimming and barcoding within dorado since the 0.6.0 release, so this should be less of a problem now.