Closed. dustin-cram closed this issue 11 months ago.
Hi @dustin-cram - Thanks for reporting this, we are looking into this issue.
Hi @dustin-cram - we have identified the source of this issue and resolved it internally. We will release a fix very soon.
Thanks for the quick fix @vellamike. I look forward to the release.
The fix is now available on the GitHub master branch. We'll create a new build and release in a couple of days (in case other major issues show up which need to be fixed as well).
In the meantime, you can try out the fix in this release candidate build - https://cdn.oxfordnanoportal.com/software/analysis/dorado-0.4.1-rc1-linux-x64.tar.gz
That works for me.
I'll leave the issue open until 0.4.1 is released to make the issue more visible to others.
Hi, I've actually been wondering this for a while.
Can you elaborate on what exactly is a simplex read with duplex offspring? And how are they listed in the current 0.4.0 release then? Are they tagged with dx:i:1?
Hi @diego-rt - here are details on the tags - https://github.com/nanoporetech/dorado#duplex
what exactly is a simplex read with duplex offspring
This is a simplex read whose duplex pair was detected in the dataset, so dorado was able to call a duplex read for that pair.
Thanks for the info @tijyojwad
So if I understand correctly, this means that for each duplex read with tag dx:i:1, there are two simplex reads with tag dx:i:-1 ?
If so, I would suggest that either the dx:i:-1 reads shouldn't be emitted by default, or it should be more clearly documented that not filtering out dx:i:-1 reads will result in essentially 3 reads being emitted for the same DNA molecule.
Hi @diego-rt -
for each duplex read with tag dx:i:1, there are two simplex reads with tag dx:i:-1
Yes, in the perfect scenario. You can also derive the parent simplex reads from the duplex read ID (read_1;read_2).
Yes certainly we will document the output characteristics more explicitly. From a basecaller perspective it's better to output all the data and mark their source clearly. Then it's up to the downstream tools to determine how to filter/use it.
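A minimal sketch of deriving the parent simplex read IDs from a duplex read ID, assuming the `read_1;read_2` format mentioned above. The function name `parent_ids` is hypothetical and not part of dorado.

```python
def parent_ids(read_id: str) -> list[str]:
    """Return the parent simplex read IDs encoded in a duplex read ID.

    Assumption: duplex IDs have the form "read_1;read_2". An ID without
    a ';' is treated as a simplex read and yields an empty list.
    """
    if ";" not in read_id:
        return []
    return read_id.split(";")


print(parent_ids("read_1;read_2"))  # ['read_1', 'read_2']
print(parent_ids("read_3"))         # []
```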
If one would select simplex reads without duplex offspring using dx:i:0 and select duplex reads using the dx:i:1 flag, isn't that the same as filtering out reads with dx:i:-1?
Also, I was wondering whether the issue identified in the current 0.4.0 release completely misses out on simplex reads with duplex offspring, or are they somehow incorporated into one of the other two flags?
If one would select simplex reads without duplex offspring using dx:i:0 and select duplex reads using the dx:i:1 flag, isn't that the same as filtering out reads with dx:i:-1?
yes correct
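As an illustration of the filter discussed here, a minimal sketch that drops dx:i:-1 records from SAM text using only the Python standard library. It assumes dorado output in plain SAM format with the dx tag among the optional fields. The helper names `dx_value`, `drop_duplex_parents`, and `record` are hypothetical, and the records are toy data. Note that a record with no dx tag at all is kept by this filter, so it matches "select dx:i:0 and dx:i:1" only when every record carries the tag.

```python
def dx_value(sam_line):
    """Return the integer dx tag of a SAM record line, or None if absent."""
    # Optional tags start at the 12th tab-separated column of a SAM record.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("dx:i:"):
            return int(field[5:])
    return None


def drop_duplex_parents(sam_lines):
    """Yield header lines and every record whose dx tag is not -1."""
    for line in sam_lines:
        if line.startswith("@") or dx_value(line) != -1:
            yield line


def record(name, dx):
    """Build a toy unaligned SAM record carrying a dx tag."""
    return "\t".join(
        [name, "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "IIII", f"dx:i:{dx}"]
    )


records = [record("a", -1), record("b", -1), record("a;b", 1), record("c", 0)]
kept = list(drop_duplex_parents(records))
print([line.split("\t")[0] for line in kept])  # ['a;b', 'c']
```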
0.4.0 release completely misses out on simplex reads with duplex offsprings or are they somehow incorporated into one of the other two flags
It folds all simplex reads into dx:0 regardless of whether they have duplex offspring or not. No read data is thrown away, just the tags are incorrect.
@tijyojwad thanks for the clarification.
We are processing direct cDNA transcriptomic data, and using both dx:i:0 and dx:i:1 will lead to overrepresentation of simplex reads with duplex offspring. I have been waiting for this release for its ability to split concatenated reads, because we found a large number of concatenated reads in our dataset after running previous dorado releases. I will try dorado-0.4.1-rc1.
Thanks for the help.
Dorado v0.4.1 was just released with the bug fix.
Hi,
With 0.4.0 I no longer see any SAM records with the tag dx:i:-1. Is there any way to identify these simplex reads with duplex offspring? I would generally prefer to discard these and use only the duplex read.
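For output already produced with 0.4.0, one possible workaround is sketched below. It relies on two assumptions that have not been verified against real 0.4.0 output: that duplex read IDs have the form `read_1;read_2`, and that those two parts match the IDs of the parent simplex records in the same file. The function name `discard_parents` is hypothetical and the records are toy data.

```python
def discard_parents(sam_lines):
    """Return SAM lines without simplex records that are parents of a duplex read."""
    lines = list(sam_lines)

    # First pass: collect parent IDs from duplex read names ("read_1;read_2").
    parents = set()
    for line in lines:
        if line.startswith("@"):
            continue  # header line, no read name
        qname = line.split("\t", 1)[0]
        if ";" in qname:
            parents.update(qname.split(";"))

    # Second pass: keep headers and every record that is not a parent.
    return [
        line
        for line in lines
        if line.startswith("@") or line.split("\t", 1)[0] not in parents
    ]


def toy_record(name, dx):
    """Build a toy unaligned SAM record carrying a dx tag."""
    return "\t".join(
        [name, "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "IIII", f"dx:i:{dx}"]
    )


# All simplex reads carry dx:i:0 here, mimicking the incorrect 0.4.0 tags.
sam = [toy_record("a", 0), toy_record("b", 0), toy_record("a;b", 1), toy_record("c", 0)]
print([line.split("\t")[0] for line in discard_parents(sam)])  # ['a;b', 'c']
```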