Open jts opened 9 months ago
No, barcodes are not taken into account in pairing. Although I don't see that this would matter, what's the specific problem you're trying to solve?
I'm working on a specific application where the input library is low-ish complexity (not as bad as an amplicon, but not as good as WGS) so worried about the rate of false pairing. I'd like to barcode many such samples together and use the barcodes to reduce the chance of false pairs being called as duplex reads.
I see what you mean. In your case your intuition is correct and you should produce a pairs file and use that. You'll need to write your own script to produce it.
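For anyone writing that script, here is a minimal sketch of the barcode filter only, not of pairing itself. It assumes you already have candidate (template, complement) read-id pairs from your own pairing step and a read-id to barcode lookup from a demultiplexed simplex run. The function names, the "unclassified" label, and the one-pair-per-line, space-delimited output are assumptions; please check the format that --pairs expects in the dorado documentation before relying on it.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def filter_pairs_by_barcode(
    candidate_pairs: Iterable[Pair],
    read_to_barcode: Dict[str, str],
    unclassified: str = "unclassified",  # assumed label for reads with no barcode
) -> List[Pair]:
    """Keep only candidate pairs whose two reads carry the same known barcode."""
    kept: List[Pair] = []
    for template_id, complement_id in candidate_pairs:
        template_bc = read_to_barcode.get(template_id)
        complement_bc = read_to_barcode.get(complement_id)
        # Drop pairs where either read is missing or unclassified,
        # or where the two reads were assigned different barcodes.
        if template_bc is None or template_bc == unclassified:
            continue
        if template_bc == complement_bc:
            kept.append((template_id, complement_id))
    return kept


def format_pairs(pairs: Iterable[Pair]) -> str:
    """Render pairs as one 'template complement' line each (assumed format)."""
    return "".join(f"{template} {complement}\n" for template, complement in pairs)


# Made-up read ids and barcodes, for illustration only.
example_pairs = [("read-a", "read-b"), ("read-c", "read-d"), ("read-e", "read-f")]
example_barcodes = {
    "read-a": "barcode01",
    "read-b": "barcode01",
    "read-c": "barcode01",
    "read-d": "barcode02",
    "read-e": "barcode03",
}
kept_pairs = filter_pairs_by_barcode(example_pairs, example_barcodes)
pairs_text = format_pairs(kept_pairs)
```

Only the first example pair survives: the second has mismatched barcodes and the third has a read with no barcode assignment.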
@jts you can also consider the following -
- run your dataset through simplex basecalling with barcoding enabled
dorado basecaller <model> <pod5> --kit-name <barcode-kit> | dorado demux --no-classify --output-dir
to classify and split the dataset
- then fetch the read ids per barcode from the corresponding .bam and put them in a reads.txt file
- run
dorado duplex <model> <pod5> --read-ids reads.txt
and this will run duplex basecalling only with the read ids from that barcode
@jts if you want to generate a pairs file yourself here's how I did it: https://github.com/nanoporetech/dorado/issues/368#issuecomment-1900743490
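The read-id step in the workflow above (one reads.txt per barcode) could be sketched roughly as follows. This assumes you dump each per-barcode BAM to SAM text first, for example with samtools view, and that the read-ids file is simply one read id per line; both the helper names and that file layout are assumptions rather than anything confirmed in this thread.

```python
from typing import Iterable, List


def read_ids_from_sam(sam_lines: Iterable[str]) -> List[str]:
    """Return unique read ids (SAM column 1) in first-seen order."""
    seen = set()
    ids: List[str] = []
    for line in sam_lines:
        line = line.rstrip("\n")
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line or line.startswith("@"):
            continue
        read_id = line.split("\t", 1)[0]
        if read_id not in seen:
            seen.add(read_id)
            ids.append(read_id)
    return ids


def write_read_ids(ids: Iterable[str], path: str) -> None:
    """Write one read id per line (assumed layout for the read-ids file)."""
    with open(path, "w") as handle:
        handle.writelines(f"{read_id}\n" for read_id in ids)


# Made-up SAM text standing in for the output of `samtools view -h <barcode>.bam`.
example_sam = [
    "@HD\tVN:1.6\tSO:unknown\n",
    "read-a\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\n",
    "read-b\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\t!!!!\n",
    "read-a\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\n",
]
example_ids = read_ids_from_sam(example_sam)
```

Calling write_read_ids(example_ids, "reads.txt") would then produce the file passed to --read-ids in the last step.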
Great, thanks @tijyojwad and @shenker
Hi,
I don't see why I need to do basecalling twice. Can I first do dorado duplex > bam and then dorado demux to demultiplex the bam file into many barcode folders?
Thanks.
Hi @lagphase - that will work for the simplex reads, but will most likely result in all duplex reads being unclassified, since the pairing/duplex algorithm will strip the barcode information.
Hi @tijyojwad, thanks for your quick response. Then would you recommend I use --no-trim when doing simplex basecalling?
@lagphase depends on what you're trying to do -
- dorado duplex is not set up to do any adapter/primer trimming or barcode classification yet. So any reads generated through dorado duplex are effectively run with --no-trim.
- if you're running dorado basecaller and want to barcode post basecalling, please run with --no-trim.
However, keeping the barcodes untrimmed in the simplex reads will still result in duplex reads not having the barcodes, just by virtue of how we find overlapping parts of the duplex read. The barcodes may make it past the overlapping stage, but likely not. So a more robust approach until we add duplex barcoding to dorado would be to run what's described here - https://github.com/nanoporetech/dorado/issues/600#issuecomment-1915188395
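If you go the per-barcode route, the repeated duplex runs could be driven by something like the sketch below. It only builds the argument lists, using the dorado duplex <model> <pod5> --read-ids form quoted in this thread; the helper names, the per-barcode file names, and the choice to redirect stdout to one BAM per barcode are assumptions.

```python
import subprocess
from typing import Dict, List


def duplex_commands(
    model: str, pod5: str, read_id_files: Dict[str, str]
) -> Dict[str, List[str]]:
    """Build one `dorado duplex` argument list per barcode."""
    return {
        barcode: ["dorado", "duplex", model, pod5, "--read-ids", ids_file]
        for barcode, ids_file in sorted(read_id_files.items())
    }


def run_duplex(command: List[str], output_bam: str) -> None:
    """Run one command, writing dorado's stdout to output_bam (not called here)."""
    with open(output_bam, "wb") as out:
        subprocess.run(command, stdout=out, check=True)


# Placeholder model and pod5 arguments, and hypothetical per-barcode id files.
example_commands = duplex_commands(
    "<model>",
    "<pod5>",
    {"barcode02": "barcode02.reads.txt", "barcode01": "barcode01.reads.txt"},
)
```

Each resulting BAM then only contains duplex reads paired within a single barcode, which is the point of the workaround.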
@tijyojwad that helps! thank you.
@jts you can also consider the following -
- run your dataset through simplex basecalling with barcoding enabled
dorado basecaller <model> <pod5> --kit-name <barcode-kit> | dorado demux --no-classify --output-dir
to classify and split the dataset
- then fetch the read ids per barcode from the corresponding .bam and put them in a reads.txt file
- run
dorado duplex <model> <pod5> --read-ids reads.txt
and this will run duplex basecalling only with the read ids from that barcode
Just to clarify, when carrying out step 1 with dorado basecaller, should the --no-trim option be added? You haven't written it in the code, but the GitHub page recommends using --no-trim if you want to demultiplex later, so I'm a bit confused about the correct way to proceed.
Hi,
Will the current version of dorado take barcodes into account when identifying duplex pairs, or does this need to be done separately and provided with the --pairs option to dorado duplex?
Thanks