JWDebler closed this 4 months ago
Hi @JWDebler,
This read is a product of read splitting: dorado automatically generates new read-ids for split reads. You can see that it contains a pi:Z tag, which contains the read-id of the parent read from which this entry was generated.
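To see which entries in a BAM are split reads and which parent each one came from, something like the sketch below may help. It assumes SAM text on stdin (for example from samtools view) and relies only on the pi:Z tag described above. The function name parent_ids is made up for illustration and is not part of dorado or samtools.

```shell
# Hypothetical helper: print "read_id<TAB>parent_id" for every SAM record
# that carries a pi:Z tag, i.e. a read produced by read splitting.
parent_ids() {
    awk -F'\t' '
        /^@/ { next }                      # skip SAM header lines
        {
            for (i = 12; i <= NF; i++) {   # optional tags start at column 12
                if ($i ~ /^pi:Z:/) {
                    print $1 "\t" substr($i, 6)   # drop the "pi:Z:" prefix
                    break
                }
            }
        }'
}

# Example usage (assumes samtools is installed):
# samtools view demuxed/SQK-NBD114-24_barcode05.bam | parent_ids
```

Reads without a pi:Z tag are skipped, so an empty result would suggest that none of the reads in that file were split.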
Ok, that means that since duplex calling currently cannot do barcode demultiplexing (or read splitting), I am losing the reads that were split during the simplex run, as their read IDs don't exist in the pod5 files. I'll have to come up with a workaround to keep them until duplex can do all of that :-)
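One possible starting point for such a workaround, as a rough sketch only: map every read in a demultiplexed BAM back to an ID that should exist in the pod5 files, using the parent ID from the pi:Z tag for split reads and the read's own ID otherwise. The function name pod5_ids is hypothetical, and the input is assumed to be SAM text on stdin.

```shell
# Hypothetical helper: read SAM records on stdin and print one ID per line,
# replacing split-read IDs with their parent ID (pi:Z tag) when present.
# Each ID is printed only once, in order of first appearance.
pod5_ids() {
    awk -F'\t' '
        /^@/ { next }                      # skip SAM header lines
        {
            id = $1                        # default: the read ID itself
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^pi:Z:/) {       # split read: use the parent ID
                    id = substr($i, 6)
                    break
                }
            }
            if (!(id in seen)) {
                seen[id] = 1
                print id
            }
        }'
}

# Example usage (assumes samtools is installed):
# samtools view demuxed/SQK-NBD114-24_barcode05.bam | pod5_ids > barcode05_ids.txt
```

The resulting list could then be used to subset the pod5 files per barcode. The pod5 tools include a filter subcommand that takes a read ID list, but check its options for your installed version before relying on it.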
I am currently working on a little script to automate basecalling and simplex / duplex separation etc.
When testing on a small subset of pod5 files I realised that during simplex calling and barcode demultiplexing I ended up with a few lonely reads for some barcodes. After looking for them in the original pod5 files it turns out those read IDs don't exist in them.
I'm a little confused as to where they come from because the pod5 file that is mentioned in the bam entry does not contain anything similar to that read-id.
Btw, this is on dorado 0.7.1-RC1
dorado basecaller sup -r pod5s/ --min-qscore 10 --kit-name SQK-NBD114-24 > all.bam
dorado demux --output-dir demuxed --no-classify all.bam
samtools view -h demuxed/SQK-NBD114-24_barcode05.bam
It lists
PAS00041_pass_barcode01_0ff99174_982d5e5f_22.pod5
as the source for this read, however:
pod5 view PAS00041_pass_barcode01_0ff99174_982d5e5f_22.pod5 | grep 2bd74582-be31-445a-ac93-a89fa5f3cb97
comes up empty, and so does
pod5 view *.pod5 | grep 2bd74582-be31-445a-ac93-a89fa5f3cb97
This read ID does not exist anywhere in the pod5 files used for this test.
There are a few more like this for other barcodes with 1 read each, but granted, it was a tiny test dataset to start with.
Any idea what's going on? Cheers