Open olawa opened 5 months ago
Hi @olawa - what version of dorado are you using?
what(): Row in sample sheet file samplesheet.txt has incorrect number of entries
Hmm, I don't think this is the intended behavior. You should be able to specify only the desired barcodes. We will look into this.
demux to use the kit specified in sample sheet
yes we can look into this
only classify reads present in the run
I'm not sure I understand this... If you're finding read ids that aren't in the input pod5, that's due to read splitting. The parent read ids will be in the pi:Z tag of that read. If you want to filter on double-ended barcode hits, you can also run --barcode-both-ends.
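To trace split reads back to their parents, something along these lines may help. It is only a sketch: it assumes the records are available as SAM text (for example from samtools view), and the read ids in the example are made up. The only thing taken from the comment above is that the parent id sits in the pi:Z tag.

```python
def parent_read_id(sam_line):
    """Return the pi:Z value of one SAM record line, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional tags start at column 12.
    for tag in fields[11:]:
        if tag.startswith("pi:Z:"):
            return tag[len("pi:Z:"):]
    return None


def group_by_parent(sam_lines):
    """Map each parent read id to the read ids derived from it.

    Reads without a pi:Z tag are treated as their own parent.
    """
    groups = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        read_id = line.split("\t", 1)[0]
        parent = parent_read_id(line) or read_id
        groups.setdefault(parent, []).append(read_id)
    return groups


# Hypothetical records: two children of one split read, plus one unsplit read.
records = [
    "@HD\tVN:1.6",
    "child-1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tpi:Z:parent-A",
    "child-2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\t!!!!\tpi:Z:parent-A",
    "read-B\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\t!!!!",
]
print(group_by_parent(records))
```

Any read id that is missing from the input pod5 should then appear as a value under a parent id that is present.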
write reads with more than one of the specified barcodes to a separate file
Reads are split within dorado, which should catch most cases. Currently, if 2 different barcodes are detected on either end, we treat them as unclassified.
option to only split on barcode (if more than one flowcell was used)
dorado demux does exactly this, right? It'll output a BAM file per barcode. You can combine pod5s/bams from multiple runs and give it to dorado. From 0.6.0 onwards you can also give dorado demux a folder with multiple BAMs in it.
Hi @olawa,
That error indicates that one or more of the rows had a different number of entries to the number of column headings. Sample sheets should absolutely work with only a subset of the barcodes from the kit. Note that a sample sheet should be defined with comma-separated values, and empty columns must still be included. See the documentation here for more information.
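A quick way to find the offending row before running demux is to compare each row's field count against the header. The sketch below is not dorado's own parser; the column names and the kit value in the example are placeholders chosen for illustration, so check the documentation for the columns dorado actually expects.

```python
import csv
import io


def find_bad_rows(sheet_text):
    """Return (line_number, n_fields, n_expected) for every row whose
    field count differs from the header's."""
    reader = csv.reader(io.StringIO(sheet_text))
    header = next(reader)
    bad = []
    for row in reader:
        if not row:
            continue  # ignore completely blank lines
        if len(row) != len(header):
            bad.append((reader.line_num, len(row), len(header)))
    return bad


# Illustrative sheet: column names and values are placeholders, not a dorado spec.
example = (
    "experiment_id,kit,flow_cell_id,alias,barcode\n"
    "exp1,KIT,,sample_a,barcode01\n"   # empty flow_cell_id, but its comma is kept
    "exp1,KIT,sample_b,barcode02\n"    # empty column dropped entirely: flagged
)
print(find_bad_rows(example))
```

To check a real file, pass its contents, e.g. find_bad_rows(open("samplesheet.txt").read()). A stray trailing comma in the header adds an empty column heading, so every data row would then be reported as one entry short.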
Hi @malton-ont @tijyojwad, thanks for the clarification. I got it to work as intended with dorado 0.6 now; it could have been an extra comma in the header from converting tabs.
only classify reads present in the run
I meant barcodes. I am sequencing short reads with PBC96; they are amplified, so they should have barcodes on both ends. --barcode-both-ends gives a much lower classification rate. I am guessing it could be improved if one could exclude only reads where two different barcodes from the list are found, assuming the issue is either poor basecalling at the ends or chimeric ligation products.
Is the dorado split during basecall supposed to split on internal barcode/primers or is it just able to split Minknow chimeras?
One example here where a female sample has alignment to chrY from what appears to be a ligation concatemer. If I have to use guppy (or perhaps pychopper) to split on internal primers that is fine, but I can't find any documentation on it.
Is the dorado split during basecall supposed to split on internal barcode/primers or is it just able to split Minknow chimeras?
It doesn't split on internal barcodes, just on sequencing adapters. So it'll catch chimeras, but not ligation concatemers.
I believe guppy did have an option to split on internal barcodes. We can look into adding that to dorado in a subsequent release.
I am trying demux with sample sheet and get the following error when only the barcodes in use are included:
what(): Row in sample sheet file samplesheet.txt has incorrect number of entries
It took a while to figure out that all 12 barcodes need to be present in the sheet. Perhaps you could add a few example sheets to the repo.
What I would like to be able to do is:
Is any of this possible with the current demux?