qiime2 / q2-dada2

QIIME 2 plugin wrapping DADA2
BSD 3-Clause "New" or "Revised" License
19 stars 36 forks source link

denoise-*: expose `--collapse-no-mismatch` parameter #92

Open nbokulich opened 6 years ago

nbokulich commented 6 years ago

References forum xref

jakereps commented 6 years ago

I can take this one 👍

jakereps commented 6 years ago

How would we want to implement this? That doesn't appear to be a parameter, but a standalone function within the dada2 package.

collapseNoMismatch <- function(seqtab, minOverlap=20, orderBy="abundance", vec=TRUE, verbose=FALSE)

By expose is it adding a new method to q2-dada2, or having this flag run it using strictly the default parameter values?

ebolyen commented 6 years ago

I suspect the latter is what we're looking for, since it sounds like the forum user was interested in combining subsequences together. I'm a little curious as to what minOverlap is for, since it seems like we're dealing with +- a few nucleotides, so having a subsequence of only 20ish sounds pretty bad (and will probably bin into the most abundant read no matter what).

ebolyen commented 6 years ago

That being said, a new method would let us re-use this elsewhere...

nbokulich commented 6 years ago

that forum post seemed to suggest that this was a parameter, so that was my assumption when I opened this issue without further investigation.

I like the new method idea though.

jakereps commented 6 years ago

I'm not sure the best way to handle adding the function inline w/ denoise, as it'd require the flag to run it or not and also a flag for the overlap parameter - assuming all the other default values are fine to use. I'll start with just the function I suppose. Would joining them have to be a pipeline then? It could always be added inline to denoise as well and keep this as a separate function. ¯\_(ツ)_\/¯

thermokarst commented 6 years ago

I'm not sure the best way to handle adding the function inline w/ denoise, as it'd require the flag to run it or not and also a flag for the overlap parameter

I think this would be conditionally run in the R scripts, right (like the chimera checking section)?

benjjneb commented 6 years ago

To "qiime" in, if this is something that is going to be integrated into the workflow commands, this may be best left until we migrate the plugin to the version 1.8 of the dada2 R package, as I am expecting to overhaul the q2-dada2 R scripts a bit at that time.

p.s. we are waiting on bioconda updates for that migration to happen: https://github.com/qiime2/q2-dada2/issues/85#issuecomment-409260897

fconstancias commented 4 years ago

If I am not wrong, the 100% clustering of ASV can also be performed using q2-vsearch plugin ? So maybe there is no need to add a specific condition in q2-dada2 script? What do you think?

ebolyen commented 4 years ago

I'm a bit uncertain if our implementation of de-novo will cluster shorter sequences or not. Also, the IDs are something to consider, but I don't see why these couldn't be made to be very compatible.