duplex and barcoding (nanoporetech/dorado issue #600)

jts commented 9 months ago

Hi,

Will the current version of dorado take barcodes into account when identifying duplex pairs, or does this need to be done separately and provided with the --pairs option to dorado duplex?

Thanks

vellamike commented 9 months ago

No, barcodes are not taken into account in pairing. I don't see why that would matter, though. What's the specific problem you're trying to solve?

jts commented 9 months ago

I'm working on a specific application where the input library is low-ish complexity (not as bad as an amplicon, but not as good as WGS), so I'm worried about the rate of false pairing. I'd like to barcode many such samples together and use the barcodes to reduce the chance of false pairs being called as duplex reads.

vellamike commented 9 months ago

I see what you mean. In your case your intuition is correct: you should produce a pairs file and pass it with --pairs. You'll need to write your own script to produce it.
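A minimal sketch of the kind of script this could be, assuming you already have a list of candidate (template, complement) pairs and a barcode label per simplex read from some earlier step. The function names, the "unclassified" label, and the example read IDs are hypothetical, and the one-pair-per-line, space-separated output is an assumption about the pairs file format that should be checked against the dorado documentation.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (template_read_id, complement_read_id)


def filter_pairs_by_barcode(
    candidate_pairs: Iterable[Pair],
    barcode_of: Dict[str, str],
    unclassified: str = "unclassified",
) -> List[Pair]:
    """Keep only pairs whose two reads carry the same, classified barcode."""
    kept = []
    for template_id, complement_id in candidate_pairs:
        template_bc = barcode_of.get(template_id)
        complement_bc = barcode_of.get(complement_id)
        if template_bc is None or template_bc == unclassified:
            continue  # no usable barcode for the template read
        if template_bc == complement_bc:
            kept.append((template_id, complement_id))
    return kept


def format_pairs(pairs: Iterable[Pair]) -> str:
    """Render pairs one per line as 'template complement' (assumed format)."""
    return "".join(f"{t} {c}\n" for t, c in pairs)


# Hypothetical example data: read IDs and barcode labels are made up.
barcode_of = {
    "read-a": "barcode01",
    "read-b": "barcode01",
    "read-c": "barcode02",
    "read-d": "unclassified",
    "read-e": "unclassified",
}
candidates = [
    ("read-a", "read-b"),  # same barcode: kept
    ("read-a", "read-c"),  # barcodes differ: dropped
    ("read-d", "read-e"),  # both unclassified: dropped
    ("read-c", "read-x"),  # complement has no barcode record: dropped
]
pairs_text = format_pairs(filter_pairs_by_barcode(candidates, barcode_of))
print(pairs_text, end="")
```

Dropping unclassified reads is a conservative choice here: a pair is only kept when both barcodes are known and agree, which is the property that reduces false pairs across samples.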

tijyojwad commented 9 months ago

@jts you can also consider the following -

1. Run your dataset through simplex basecalling with barcoding enabled, then classify and split the dataset: dorado basecaller <model> <pod5> --kit-name <barcode-kit> | dorado demux --no-classify --output-dir classify
2. Fetch the read IDs per barcode from the corresponding .bam and put them in a reads.txt file.
3. Run dorado duplex <model> <pod5> --read-ids reads.txt. This runs duplex basecalling only on the read IDs from that barcode.
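The read ID step can be sketched as follows, assuming the per-barcode BAM has first been converted to SAM text (for example with samtools view) and that the --read-ids file is a plain list with one read ID per line. The function name and the example records are hypothetical.

```python
from typing import Iterable, List


def read_ids_from_sam(lines: Iterable[str]) -> List[str]:
    """Return unique read IDs (SAM column 1) in first-seen order."""
    seen = set()
    ordered = []
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        read_id = line.split("\t", 1)[0]
        if read_id not in seen:
            seen.add(read_id)
            ordered.append(read_id)
    return ordered


# Hypothetical SAM text for one barcode; real input would come from the
# demultiplexed BAM for that barcode.
sam_text = (
    "@HD\tVN:1.6\tSO:unknown\n"
    "read-a\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\n"
    "read-b\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\t!!!!\n"
    "read-a\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\n"
)
read_ids = read_ids_from_sam(sam_text.splitlines())
reads_txt = "\n".join(read_ids) + "\n"
print(reads_txt, end="")
```

Writing reads_txt to a file per barcode gives the reads.txt input for the duplex step. Deduplicating keeps the list clean if the same read ID appears on more than one record.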

shenker commented 9 months ago

@jts if you want to generate a pairs file yourself here's how I did it: https://github.com/nanoporetech/dorado/issues/368#issuecomment-1900743490

jts commented 9 months ago

Great, thanks @tijyojwad and @shenker

lagphase commented 9 months ago

Hi,

I don't see why I need to do basecalling twice. Can I first do dorado duplex > bam and then dorado demux to demultiplex the bam file into per-barcode folders?

Thanks.

tijyojwad commented 9 months ago

Hi @lagphase - that will work for the simplex reads, but will most likely leave all duplex reads unclassified, since the pairing/duplex algorithm will strip the barcode information.

lagphase commented 9 months ago

Hi @tijyojwad, thanks for your quick response. Then would you recommend I use --no-trim when doing simplex basecalling?

tijyojwad commented 9 months ago

@lagphase depends on what you're trying to do -

dorado duplex is not set up to do any adapter/primer trimming or barcode classification yet. So any reads generated through dorado duplex are effectively run with --no-trim.
If you are running dorado basecaller and want to barcode post basecalling, please run with --no-trim.

However, keeping the barcodes untrimmed in the simplex reads will still result in duplex reads not having the barcodes just by virtue of how we find overlapping parts of the duplex read. The barcodes may make it past the overlapping stage, but likely not. So a more robust approach until we add duplex barcoding to dorado would be to run what's described here - https://github.com/nanoporetech/dorado/issues/600#issuecomment-1915188395
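As a rough sketch of how the per-barcode approach could be scripted, the snippet below builds one dorado duplex command per barcode, using only the arguments mentioned in this thread. The function name and the read ID file names are hypothetical, and the model and pod5 arguments are left as placeholders.

```python
from typing import Dict, List


def duplex_commands(
    model: str, pod5: str, read_id_files: Dict[str, str]
) -> Dict[str, List[str]]:
    """Build one `dorado duplex` argv per barcode, restricted by --read-ids."""
    return {
        barcode: ["dorado", "duplex", model, pod5, "--read-ids", ids_file]
        for barcode, ids_file in sorted(read_id_files.items())
    }


# Hypothetical file names; <model> and <pod5> are placeholders to fill in.
commands = duplex_commands(
    "<model>",
    "<pod5>",
    {"barcode02": "barcode02.reads.txt", "barcode01": "barcode01.reads.txt"},
)
for barcode, argv in commands.items():
    print(barcode, " ".join(argv))
```

Each argv could then be passed to subprocess.run with stdout redirected to a per-barcode output file, assuming dorado duplex writes its reads to standard output as in the dorado duplex > bam usage mentioned earlier in the thread. Because each run only sees reads from one barcode, the barcode of every duplex read is known from the run that produced it.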

lagphase commented 9 months ago

@tijyojwad that helps! thank you.

luckybillion commented 3 months ago

Quoting @tijyojwad's steps from earlier in the thread:

1. Run your dataset through simplex basecalling with barcoding enabled, then classify and split the dataset: dorado basecaller <model> <pod5> --kit-name <barcode-kit> | dorado demux --no-classify --output-dir classify
2. Fetch the read IDs per barcode from the corresponding .bam and put them in a reads.txt file.
3. Run dorado duplex <model> <pod5> --read-ids reads.txt. This runs duplex basecalling only on the read IDs from that barcode.

Just to clarify: when carrying out step 1 with dorado basecaller, should the --no-trim option be added? You haven't written it in the command, but the GitHub page recommends using --no-trim if you want to demultiplex later, so I'm a bit confused about the correct way to proceed.
