nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

DCS (DNA control sample) removal #717

Closed selmapichot closed 3 months ago

selmapichot commented 3 months ago

Hi, is there a way to remove the reads corresponding to the DCS during basecalling and/or alignment of POD5 files? Many thanks.

JWDebler commented 3 months ago

Not sure, but I use chopper as part of my assembly pipeline to remove it from the FASTQ files, just in case someone used DCS without telling me about it:

zcat reads.fastq.gz | chopper -t 16 --contam DCS.fasta -q 10 -l 1000 | pigz -9 > reads.nodcs.fastq.gz

DCS.fasta

MarkBicknellONT commented 3 months ago

Hi @selmapichot,

Dorado doesn't have a built-in filter for DCS or RCS strands, but you can use dorado aligner to align your reads to the calibration strand sequence and then filter out any hits with samtools by running samtools view --incl-flags 0x4 <bam file>. This keeps only the reads flagged as unmapped (0x4), that is, the reads that did not align to the calibration strand.
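The two steps above can be chained into a single pipeline. The following is a minimal sketch, not an official Dorado workflow: it assumes dorado and samtools are on the PATH, that the reads were basecalled to BAM, and that the function name remove_dcs and all file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: drop calibration-strand (DCS) reads by aligning to the DCS
# reference and keeping only the reads that did NOT map to it.
# Assumptions: dorado and samtools are installed; remove_dcs and the
# file names below are illustrative, not part of either tool.

remove_dcs() {
    local dcs_ref="$1"   # FASTA containing the calibration strand sequence
    local reads="$2"     # basecalled reads (e.g. BAM from dorado basecaller)
    local out="$3"       # output BAM without calibration-strand reads

    # dorado aligner writes BAM to stdout. samtools then keeps only the
    # records with the "read unmapped" flag (0x4) set, i.e. reads that
    # had no hit on the calibration strand.
    dorado aligner "$dcs_ref" "$reads" \
        | samtools view -b --incl-flags 0x4 -o "$out" -
}

# Usage (hypothetical file names):
#   remove_dcs DCS.fasta calls.bam calls.nodcs.bam
```

The resulting BAM holds unaligned records only, so it can be aligned to the real reference genome afterwards.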

The reference sequences for the DNA and RNA calibration strands are outlined here:
https://help.nanoporetech.com/en/articles/6632934-what-is-dna-cs-dcs
https://help.nanoporetech.com/en/articles/6632031-what-is-rna-cs-rcs

Kind regards, Mark

selmapichot commented 3 months ago

Many thanks, Mark, for your reply. Do I need to filter out the DCS reads at all? According to Nanopore, I just need to align the reads to the reference genome, and I can continue working with the resulting BAM as usual. Is this correct?

MarkBicknellONT commented 3 months ago

Hi @selmapichot,

Yes, that's correct. If you've aligned to a different reference, then the lambda DCS reads will end up unmapped, and you can simply ignore them.
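To check how many reads were left unmapped, or to drop them from the BAM explicitly, something like the following may help. It is only a sketch: the helper names count_unmapped and keep_mapped and the file names are hypothetical, and samtools is assumed to be on the PATH. Note that the unmapped set contains every read that failed to align to the reference, not only DCS reads.

```shell
#!/usr/bin/env bash
# Sketch: inspect or discard reads that did not align to the reference.
# Assumptions: samtools is installed; helper and file names are illustrative.

# Print the number of records with the "read unmapped" flag (0x4) set.
count_unmapped() {
    local bam="$1"
    samtools view -c -f 0x4 "$bam"
}

# Write a BAM containing only mapped records (flag 0x4 not set).
keep_mapped() {
    local bam="$1"
    local out="$2"
    samtools view -b -F 0x4 -o "$out" "$bam"
}

# Usage (hypothetical file names):
#   count_unmapped aligned.bam
#   keep_mapped aligned.bam aligned.mapped.bam
```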

Kind regards, Mark

selmapichot commented 3 months ago

Many thanks Mark for your reply.

All the best, Selma.