nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Why did I get a very low duplex rate? #657

Closed sidi-yang closed 6 months ago

sidi-yang commented 6 months ago

Dear group,

I have just run duplex basecalling and produced a duplex BAM file. The output is below; the duplex rate is very low (10%), and I have no idea why:

[2024-02-28 11:30:42.239] [info] > No duplex pairs file provided, pairing will be performed automatically
[2024-02-28 11:31:17.824] [info] - downloading dna_r10.4.1_e8.2_4khz_stereo@v1.1 with httplib
[2024-02-28 11:31:38.843] [info] - set batch size for cuda:0 to 704
[2024-02-28 11:31:42.365] [info] - set batch size for cuda:0 to 1216
[2024-02-28 11:31:42.365] [info] > Starting Stereo Duplex pipeline
[2024-02-28 11:31:42.433] [info] > Reading read channel info
[2024-02-28 11:31:43.885] [info] > Processed read channel info
[2024-02-29 00:37:15.418] [info] > Simplex reads basecalled: 605389
[2024-02-29 00:37:15.459] [info] > Simplex reads filtered: 2
[2024-02-29 00:37:15.459] [info] > Duplex reads basecalled: 37607
[2024-02-29 00:37:15.460] [info] > Duplex rate: 10.230203%
[2024-02-29 00:37:15.519] [info] > Basecalled @ Bases/s: 1.264310e+05

I'm wondering: does this 10% duplex rate mean that only 10% of the sequences were called as duplex (both strands), while the other 90% were still called as simplex (single strand)? And what duplex rate counts as good quality for further analysis?

Thank you very much!
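A quick check on the numbers in that log: the reported duplex rate is not simply the duplex read count divided by the simplex read count. The sketch below parses the summary lines and computes that read-count ratio for comparison. As far as I understand the Oxford Nanopore documentation, the reported rate is computed over bases (duplex bases multiplied by two, divided by simplex bases), which would explain the gap. Treat that as an assumption, since the log does not give base counts. `parse_summary` is a hypothetical helper written for this example and is not part of Dorado.

```python
import re

# Summary lines copied from the log in the post above.
LOG = """\
[2024-02-29 00:37:15.418] [info] > Simplex reads basecalled: 605389
[2024-02-29 00:37:15.459] [info] > Simplex reads filtered: 2
[2024-02-29 00:37:15.459] [info] > Duplex reads basecalled: 37607
[2024-02-29 00:37:15.460] [info] > Duplex rate: 10.230203%
"""


def parse_summary(log_text):
    """Return {label: value} for log lines that end in 'label: number'."""
    pattern = re.compile(r"> ([A-Za-z ]+): ([0-9.]+)%?\s*$")
    summary = {}
    for line in log_text.splitlines():
        match = pattern.search(line)
        if match:
            summary[match.group(1)] = float(match.group(2))
    return summary


summary = parse_summary(LOG)

# Ratio of read counts, which is NOT what the log reports as "Duplex rate".
read_ratio = summary["Duplex reads basecalled"] / summary["Simplex reads basecalled"]

print(f"duplex reads / simplex reads: {read_ratio:.2%}")
print(f"reported duplex rate:         {summary['Duplex rate']:.2f}%")
```

The read-count ratio comes out at roughly 6%, below the reported 10.23%. That is consistent with a base-weighted rate in which each duplex read counts for both of its parent strands.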

vellamike commented 6 months ago

Hi @sidi-yang, are you running with high duplex (HD) flowcells or standard flowcells?

sidi-yang commented 6 months ago

Ahh I used standard flowcells! Is that why I had a very low duplex rate?

sidi-yang commented 6 months ago

And as I used standard flowcells, should I use simplex base calling?

vellamike commented 6 months ago

For non-HD flowcells those duplex rates are within expectations. You don't need to use simplex basecalling as the non-duplex pairs will be output anyway, but you won't get very high rates with your flowcells.
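Because simplex and duplex reads end up in the same output file, it can help to count them separately. The sketch below assumes the `dx` tag convention described in the Dorado README (`dx:i:1` for duplex reads, `dx:i:0` for simplex reads without duplex offspring, `dx:i:-1` for simplex reads that do have duplex offspring) and works on SAM-format text lines. The function names and the demo records are made up for illustration.

```python
from collections import Counter

# Meaning of the dx tag, assumed from the Dorado README.
DX_LABELS = {
    1: "duplex",
    0: "simplex, no duplex offspring",
    -1: "simplex, has duplex offspring",
}


def dx_value(sam_line):
    """Return the integer dx tag of one SAM record, or None if it is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:  # optional tags start at column 12
        if field.startswith("dx:i:"):
            return int(field[5:])
    return None


def count_by_dx(sam_lines):
    """Count SAM records per dx category, skipping header and blank lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        counts[DX_LABELS.get(dx_value(line), "untagged")] += 1
    return counts


def demo_record(name, dx):
    """Build a minimal unmapped SAM record (illustrative data only)."""
    return "\t".join(
        [name, "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "IIII", f"dx:i:{dx}"]
    )


demo = [
    "@HD\tVN:1.6",
    demo_record("pair_duplex", 1),
    demo_record("parent_a", -1),
    demo_record("parent_b", -1),
    demo_record("unpaired", 0),
]
counts = count_by_dx(demo)
for label, n in counts.items():
    print(f"{label}: {n}")
```

On real data, the same function could be fed the text lines produced by `samtools view` on the Dorado output BAM.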

sidi-yang commented 6 months ago

Thank you for that :) I'm also wondering what duplex rate is considered good quality for further analysis?

Thank you!

vellamike commented 6 months ago

This really depends what you're doing. In general we only recommend duplex for specific situations like de novo assembly or metagenomic workflows. For most applications like SNP calling simplex basecalling is more than sufficient.

sidi-yang commented 6 months ago

I feel so supported! And thank you very much for answering my questions!!!!

aljazdzy commented 3 months ago

vellamike wrote: "This really depends what you're doing. In general we only recommend duplex for specific situations like de novo assembly or metagenomic workflows. For most applications like SNP calling simplex basecalling is more than sufficient."

Could you elaborate on why duplex would be better for metagenomic scenarios? I'm working on a metagenomic workflow and am admittedly focused more on the rare organisms, but I would have thought that higher throughput matters more for metagenomes. Or is that not the case for HD flowcells? (I too have just been using standard flowcells; our lab just has a Mk B.)