lucyintheskyzzz closed 1 month ago
Hi @lucyintheskyzzz - what you're running is demuxing "in-line", since it's happening as part of the basecaller command. The subsequent demux --no-classify is simply splitting the BAM into per-barcode BAMs. If you were to run basecalling first, and then run the demux command to classify the reads with --kit-name, that would be demuxing as a separate step.
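To make the distinction concrete, here is a rough sketch of the two workflows side by side. It only uses the flags already mentioned in this thread (--kit-name, --no-classify, --no-trim, --recursive, --output-dir). The paths, the `run`/`run_to` helpers, and the DRY_RUN and MODE variables are hypothetical and just for illustration; by default the script prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: every path and name below is a placeholder, not the real setup.
set -eu

DORADO="${DORADO:-dorado}"                 # assumed: dorado binary on PATH
MODEL="${MODEL:-dna_r9.4.1_e8_hac@v3.3}"   # model used earlier in this thread
POD5_DIR="${POD5_DIR:-pod5}"               # hypothetical input directory
KIT="${KIT:-SQK-PBK004}"                   # kit used earlier in this thread
OUT="${OUT:-out}"                          # hypothetical output directory

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: same as run, but redirects stdout to a file.
run_to() {
  outfile=$1
  shift
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$* > $outfile"
  else
    "$@" > "$outfile"
  fi
}

# In-line demux: reads are classified during basecalling (--kit-name),
# and demux --no-classify only splits the BAM into per-barcode BAMs.
inline_demux() {
  run mkdir -p "$OUT"
  run_to "$OUT/calls.bam" "$DORADO" basecaller "$MODEL" "$POD5_DIR" --recursive --kit-name "$KIT"
  run "$DORADO" demux --output-dir "$OUT/barcodes" --no-classify "$OUT/calls.bam"
}

# Separate-step demux: basecall with --no-trim so barcodes stay intact,
# then let demux classify the reads with --kit-name.
separate_demux() {
  run mkdir -p "$OUT"
  run_to "$OUT/calls.bam" "$DORADO" basecaller "$MODEL" "$POD5_DIR" --recursive --no-trim
  run "$DORADO" demux --kit-name "$KIT" --output-dir "$OUT/barcodes" "$OUT/calls.bam"
}

# Pick a workflow with MODE=inline (default) or MODE=separate.
case "${MODE:-inline}" in
  inline) inline_demux ;;
  separate) separate_demux ;;
  *) echo "unknown MODE: ${MODE}" >&2; exit 1 ;;
esac
```

Running it with `MODE=separate` prints the second pair of commands; setting `DRY_RUN=0` would actually execute them, assuming dorado and the input files are where the variables point.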
Hi @tijyojwad, I ended up using --no-trim and it worked for the demux: I got way more reads and fewer unclassified. Now I am wondering what tool you recommend for post-demux processing to chop off the barcodes and adapters and filter out low-quality reads? Fastp? Porechop (discontinued since 2018)? Thanks!
If the option --no-trim is not specified in dorado demux, does it mean that barcodes and adapters will be automatically removed during classification (= demultiplexing)?
@ireneortega I was told that it does automatically chop barcodes and adapters during demux, but I definitely want to double-check. I ended up using fastp anyway to get rid of all the adapters and barcodes and any reads with qscore <15.
I am re-reading the dorado Github page and I noticed this:
"If adapter/primer trimming is done in-line with basecalling in combination with demultiplexing, then the software will automatically ensure that the trimming of adapters and primers does not interfere with the demultiplexing process. However, if you intend to do demultiplexing later as a separate step, then it is recommended that you disable adapter/primer trimming when basecalling with the --no-trim option, to ensure that any barcode sequences remain completely intact in the reads."
I am running this on the Loni HPC GPU cluster: 8 GPU compute nodes, each with two 24-core Intel Cascade Lake (Intel® Xeon® Platinum 8260 Processor) CPUs, 192 GB memory, a 600 GB HDD, and 2 NVIDIA Volta V100 GPUs.
Here is an example of how I am running my code:
ONR012021
/work/kvigil/Programs/dorado-0.6.1-linux-x64/bin/dorado basecaller /work/kvigil/Programs/dorado-0.6.1-linux-x64/bin/dna_r9.4.1_e8_hac@v3.3 --recursive /ddnB/work/kvigil/sandiego/ONR012021/ONR012021/pod5 --kit-name SQK-PBK004 > /ddnB/work/kvigil/sandiego/ONR012021/ONR012021/pod5/hac/ONR012021.calls.bam
demux
/work/kvigil/Programs/dorado-0.6.1-linux-x64/bin/dorado demux --output-dir /ddnB/work/kvigil/sandiego/ONR012021/ONR012021/pod5/hac/barcodes --no-classify /ddnB/work/kvigil/sandiego/ONR012021/ONR012021/pod5/hac/ONR012021.calls.bam
Is this considered "demultiplexing as a separate step"? Do I need to add "--no-trim" during my basecalling step so dorado will correctly demultiplex my .bam file into the correct barcode files, so I am not stuck with a lot of unclassified reads?
Thanks! Katie