nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/
Other

Question about "--no-trim" option for basecalling and demux #825

Closed lucyintheskyzzz closed 1 month ago

lucyintheskyzzz commented 1 month ago

I am re-reading the dorado GitHub page and I noticed this:

"If adapter/primer trimming is done in-line with basecalling in combination with demultiplexing, then the software will automatically ensure that the trimming of adapters and primers does not interfere with the demultiplexing process. However, if you intend to do demultiplexing later as a separate step, then it is recommended that you disable adapter/primer trimming when basecalling with the --no-trim option, to ensure that any barcode sequences remain completely intact in the reads."

I am running this on the Loni HPC GPU cluster: 8 GPU compute nodes, each with two 24-core Intel Cascade Lake (Intel® Xeon® Platinum 8260) CPUs, 192 GB memory, a 600 GB HDD, and 2 NVIDIA Volta V100 GPUs.

Here is an example of how I am running my code:

ONR012021

/work/kvigil/Programs/dorado-0.6.1-linux-x64/bin/dorado basecaller /work/kvigil/Programs/dorado-0.6.1-linux-x64/bin/dna_r9.4.1_e8_hac@v3.3 --recursive /ddnB/work/kvigil/sandiego/ONR012021/ONR012021/pod5 --kit-name SQK-PBK004 > /ddnB/work/kvigil/sandiego/ONR012021/ONR012021/pod5/hac/ONR012021.calls.bam

demux

/work/kvigil/Programs/dorado-0.6.1-linux-x64/bin/dorado demux --output-dir /ddnB/work/kvigil/sandiego/ONR012021/ONR012021/pod5/hac/barcodes --no-classify /ddnB/work/kvigil/sandiego/ONR012021/ONR012021/pod5/hac/ONR012021.calls.bam

Is this considered "demultiplexing as a separate step"? Do I need to add "--no-trim" to my basecalling step so that dorado correctly demultiplexes my .bam file into the right barcode files and I am not stuck with a lot of unclassified reads?

Thanks! Katie

tijyojwad commented 1 month ago

Hi @lucyintheskyzzz - what you're running is demuxing "in-line", since barcode classification happens as part of the basecaller command (you passed --kit-name to it). The subsequent demux --no-classify simply splits the BAM into per-barcode BAMs.

If you were to run basecalling first without --kit-name, and then run the demux command with --kit-name to classify the reads, that would be demuxing as a separate step.
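
For comparison, here is a rough sketch of what the "separate step" workflow could look like. It is only an illustration: the variable names, file names, and directories are placeholders (assumptions, not paths from this thread), and the model and kit are the ones from the original command. Step 1 basecalls without --kit-name and with --no-trim so the barcodes stay intact; step 2 passes --kit-name to demux so that classification happens there.

```shell
#!/bin/sh
# Sketch only: every path and variable name below is a placeholder.
set -eu

DORADO="${DORADO:-dorado}"                     # dorado binary, assumed to be on PATH
MODEL="${MODEL:-dna_r9.4.1_e8_hac@v3.3}"       # model directory used in the thread
POD5_DIR="${POD5_DIR:-pod5}"                   # placeholder input directory
KIT="${KIT:-SQK-PBK004}"                       # barcoding kit named in the thread
CALLS_BAM="${CALLS_BAM:-calls.untrimmed.bam}"  # placeholder output file
DEMUX_DIR="${DEMUX_DIR:-barcodes}"             # placeholder output directory

# Step 1: basecall only. No --kit-name here, and --no-trim keeps the
# barcode sequences intact for the later classification step.
basecall_untrimmed() {
  "$DORADO" basecaller "$MODEL" "$POD5_DIR" --recursive --no-trim > "$CALLS_BAM"
}

# Step 2: classify reads by barcode and split them into per-barcode BAMs.
demux_separately() {
  "$DORADO" demux --kit-name "$KIT" --output-dir "$DEMUX_DIR" "$CALLS_BAM"
}

# Set RUN=1 to actually execute; by default the script only defines the steps.
if [ "${RUN:-0}" = "1" ]; then
  basecall_untrimmed
  demux_separately
fi
```

Running it as `RUN=1 sh two_step_demux.sh` (the script name is hypothetical) would execute both steps in order. The key difference from the commands in the first post is where --kit-name appears: on demux rather than on basecaller.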

lucyintheskyzzz commented 3 weeks ago

Hi @tijyojwad, I ended up using --no-trim and it worked for the demux: I had way more reads and fewer unclassified ones. Now I am wondering what tool you recommend for post-demux processing to chop off the barcodes and adapters and filter out low-quality reads? Fastp? Porechop (discontinued since 2018)? Thanks!

ireneortega commented 3 days ago

If the option --no-trim is not specified in dorado demux, does it mean that barcodes and adapters will be automatically removed during classification (= demultiplexing)?

lucyintheskyzzz commented 3 days ago

@ireneortega I was told that it does automatically chop barcodes and adapters during demux, but I definitely want to double-check. I ended up using fastp anyway to get rid of all the adapters and barcodes and to filter out reads with qscore <15.