Closed FatYuanBao closed 1 month ago
Is there some other setting for contamination detection? I ran a dataset with known DCS contamination through chopper and then through minimap2 directly. For my particular dataset, chopper let almost 1000 reads through that still map to DCS (over the whole length), while minimap2 directly with the lr:hq preset only let a handful of short reads through. At this point it is still letting too many DCS reads pass for my comfort.
My commands respectively:
chopper -i Ar15CUR005-high-sugar.simplex.fastq.gz --contam DCS.fasta --threads 16 > chopper.fastq
minimap2 -d dcs.mmi DCS.fasta && minimap2 -t 16 -ax lr:hq dcs.mmi Ar15CUR005-high-sugar.simplex.fastq.gz | samtools view -@ 16 -b -f4 - | samtools fastq -@ 16 - > minimap.fastq
Chopper cleaned reads mapped to DCS:
Minimap2 + samtools:
Hi @JWDebler, I wonder which flowcell was used to sequence Ar15CUR005-high-sugar.simplex.fastq.gz? Is it R9 or R10? Also, for users working with PacBio CCS data, the lr:hq mode is not a good choice.
R10
Hi everyone,
Thanks for your feedback.
@FatYuanBao: yes, your concern is justified. However, I haven't evaluated how well alignment with this preset works across multiple chemistry versions and sequencers. It is not because it is not optimal that it doesn't work at all... but further testing would be good. Regardless of what the companies will tell you, the differences between R10 ONT and PacBio CCS are not huge anymore. Note that the lr:hq preset has only been in chopper since v0.8.0.
@JWDebler: chopper eliminates reads that are at least 90% aligned to the contaminating sequence (DCS in this case). Could that explain what you observe? I don't want to eliminate sequences that have only a short alignment between query and contaminant. I think false positives are worse than false negatives here, but that will depend on your application...
In general, while circumstances may warrant different parameters, I want to avoid creating a tool with too many options :-)
Best, Wouter
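For anyone who wants to experiment with the cutoff outside chopper, here is a rough sketch of the "at least 90% aligned" idea applied to minimap2 PAF output. It assumes the aligned fraction is the aligned span on the read divided by the read length (PAF columns 4 minus 3, over column 2); that is my reading of the rule above, not chopper's actual code. The function name contaminant_ids is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: print IDs of reads whose best alignment covers at least
# a given fraction of the read. Reads PAF on stdin.
# PAF columns used: 1 = read name, 2 = read length,
#                   3 = read start, 4 = read end.
# The 0.9 default mirrors the 90% rule from the thread (assumption).
contaminant_ids() {
    local min_frac="${1:-0.9}"
    awk -v min_frac="$min_frac" -F'\t' '
        $2 > 0 {
            # fraction of the read covered by this alignment
            frac = ($4 - $3) / $2
            if (frac > best[$1]) best[$1] = frac
        }
        END {
            for (id in best) if (best[id] >= min_frac) print id
        }
    ' | sort
}

# Example with a real PAF stream (not run here):
#   minimap2 -t 16 -x lr:hq DCS.fasta reads.fastq.gz | contaminant_ids 0.9 > dcs_read_ids.txt
```

Lowering the threshold (for example contaminant_ids 0.5) flags more partially aligned reads, which trades more false positives for fewer DCS reads slipping through.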
Yep, that probably explains it. I will stick with Minimap for removal then because I have seen DCS reads ending up as parts of contigs during assembly or as standalone contigs. Maybe you could expose this as another option, but it's not that urgent. I just pipe my decontaminated reads into chopper for quality and length filtering.
And you mean exposing the alignment percentage as an option, or the alignment preset?
It is not because it is not optimal that it doesn't work at all...
Yeah, that is really the point, but I still think the map-ont preset is the better choice for now, because lr:hq is still in development and may change later. map-ont works well on R9 nanopore data, and many papers even use map-ont for R10 nanopore data.
In my case the percentage; others might be interested in the preset. But I agree that one of the strengths of chopper is its limited set of features, and you can always just pipe your personally preferred minimap2 results straight into it.
I will just close this issue for now, since --contam is not the core of chopper.
Hi developers, I think there is a small problem with --contam. In main.rs, lines 235-242, minimap2 is used for alignment, but with the LrHq preset. The LrHq preset is specifically for ONT R10 Q20 data, which is not the case for every chopper user. Although R9 flowcells are now obsolete, the majority of existing ONT data was sequenced with R9 flowcells. I suggest letting the user specify R9 or R10 (maybe add a parameter --flowcell [R9]/[R10]) to determine whether the LrHq or map-ont preset is used.
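Until something like that exists, the same choice can be made outside chopper. Below is a minimal sketch of the proposed mapping, assuming R9 should get map-ont and R10 should get lr:hq as suggested above. The helper preset_for_flowcell is hypothetical and is not a chopper or minimap2 option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a flowcell label to a minimap2 preset name.
# R9 -> map-ont, R10 -> lr:hq (the mapping proposed in this issue).
preset_for_flowcell() {
    case "$1" in
        R9|r9)   echo "map-ont" ;;
        R10|r10) echo "lr:hq" ;;
        *)       echo "unknown flowcell: $1 (expected R9 or R10)" >&2
                 return 1 ;;
    esac
}

# Example (not run here): keep only reads that do not map to DCS.
#   preset=$(preset_for_flowcell R9) &&
#   minimap2 -t 16 -ax "$preset" DCS.fasta reads.fastq.gz \
#     | samtools view -b -f4 - \
#     | samtools fastq - > clean.fastq
```

The cleaned reads can then be passed to chopper for quality and length filtering as usual.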