sidi-yang closed this 6 months ago
Hi @sidi-yang, are you running high-duplex (HD) flowcells or standard flowcells?
Ahh I used standard flowcells! Is that why I had a very low duplex rate?
And since I used standard flowcells, should I use simplex basecalling instead?
For non-HD flowcells those duplex rates are within expectations. You don't need to switch to simplex basecalling, as reads that aren't part of a duplex pair are output as simplex reads anyway, but you won't get very high duplex rates with your flowcells.
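To check how many reads in a duplex output are actually duplex versus simplex, a rough sketch is below. It assumes, based on my reading of the Dorado README (worth verifying for your version), that every record carries a `dx:i` tag: `1` for a duplex read, `0` for a simplex read with no duplex offspring, and `-1` for a simplex read that contributed to a duplex read. It also assumes the BAM has been converted to SAM text first (for example with `samtools view`). The function name `count_dx_tags` and the sample records are made up for illustration.

```python
from collections import Counter


def count_dx_tags(sam_lines):
    """Count reads per dx:i value in SAM-formatted text lines.

    Records without a dx tag are counted under the key None.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        dx = None
        # Optional tags start at the 12th column of a SAM record
        for tag in fields[11:]:
            if tag.startswith("dx:i:"):
                dx = int(tag[len("dx:i:"):])
                break
        counts[dx] += 1
    return counts


# Hypothetical records, not taken from a real run
example = [
    "@HD\tVN:1.6\tSO:unknown",
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tdx:i:0",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tdx:i:-1",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tdx:i:-1",
    "read2;read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tdx:i:1",
]
dx_counts = count_dx_tags(example)
print(dict(dx_counts))
```

If the tag convention holds, filtering on `dx:i:1` would give only the duplex reads, while `dx:i:0` plus `dx:i:1` would give a set with no read represented twice.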
Thank you for that :) I'm also wondering what duplex rate counts as good quality for further analysis?
Thank you!
This really depends on what you're doing. In general we only recommend duplex for specific situations like de novo assembly or metagenomic workflows. For most applications, like SNP calling, simplex basecalling is more than sufficient.
I feel so supported! And thank you very much for answering my questions!!!!
Would you be able to elaborate on why duplex would be better for metagenomic scenarios? I'm working on a metagenomic workflow and am admittedly focused more on the rare organisms, but I would think that higher throughput would serve metagenomes better. Unless that wouldn't be the case for HD flowcells? (I too have just been using standard flowcells; our lab just has a Mk B.)
Dear group,
I have just run duplex basecalling and produced a duplex BAM file. The output looks like this, with a very low duplex rate (10%), and I have no idea why the rate is so low:
[2024-02-28 11:30:42.239] [info] > No duplex pairs file provided, pairing will be performed automatically
[2024-02-28 11:31:17.824] [info] - downloading dna_r10.4.1_e8.2_4khz_stereo@v1.1 with httplib
[2024-02-28 11:31:38.843] [info] - set batch size for cuda:0 to 704
[2024-02-28 11:31:42.365] [info] - set batch size for cuda:0 to 1216
[2024-02-28 11:31:42.365] [info] > Starting Stereo Duplex pipeline
[2024-02-28 11:31:42.433] [info] > Reading read channel info
[2024-02-28 11:31:43.885] [info] > Processed read channel info
[2024-02-29 00:37:15.418] [info] > Simplex reads basecalled: 605389
[2024-02-29 00:37:15.459] [info] > Simplex reads filtered: 2
[2024-02-29 00:37:15.459] [info] > Duplex reads basecalled: 37607
[2024-02-29 00:37:15.460] [info] > Duplex rate: 10.230203%
[2024-02-29 00:37:15.519] [info] > Basecalled @ Bases/s: 1.264310e+05
I'm just wondering: does this 10% duplex rate mean that only 10% of all the sequences are called from both strands and the other 90% are still called from a single strand? And what duplex rate is of good quality for further analysis?
Thank you very much!
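To relate the summary lines in the log above to read counts, here is a small sketch that parses them and computes two read-based ratios. The log excerpt is copied from the post; the function name `parse_summary` and the variable names are illustrative only.

```python
import re

# Summary lines copied from the log in the post above
LOG = """\
[2024-02-29 00:37:15.418] [info] > Simplex reads basecalled: 605389
[2024-02-29 00:37:15.459] [info] > Simplex reads filtered: 2
[2024-02-29 00:37:15.459] [info] > Duplex reads basecalled: 37607
[2024-02-29 00:37:15.460] [info] > Duplex rate: 10.230203%
"""


def parse_summary(log_text):
    """Extract the read counts and the reported duplex rate from the log."""
    pattern = re.compile(
        r"> (Simplex reads basecalled|Duplex reads basecalled|Duplex rate): ([\d.]+)"
    )
    return {key: float(value) for key, value in pattern.findall(log_text)}


summary = parse_summary(LOG)
simplex_reads = summary["Simplex reads basecalled"]
duplex_reads = summary["Duplex reads basecalled"]

# Duplex reads as a fraction of simplex reads
read_fraction = duplex_reads / simplex_reads
# Each duplex read is built from two simplex reads, so this is the
# fraction of simplex reads that ended up in a duplex pair
paired_fraction = 2 * duplex_reads / simplex_reads

print(f"duplex reads / simplex reads:  {read_fraction:.2%}")
print(f"simplex reads in a pair:       {paired_fraction:.2%}")
print(f"duplex rate reported by log:   {summary['Duplex rate']:.2f}%")
```

Neither read-based ratio equals the reported 10.23%, which is consistent with the rate being computed on bases rather than on reads. As far as I know, the Dorado documentation defines it as the number of nucleotides in duplex basecalls multiplied by two and divided by the number of nucleotides in simplex basecalls, but that is an assumption to check against the documentation for your version.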