qiime2 / q2-dada2

QIIME 2 plugin wrapping DADA2
BSD 3-Clause "New" or "Revised" License
19 stars 36 forks source link

denoise-paired: expose matchIDs parameter #118

Closed colinbrislawn closed 4 years ago

colinbrislawn commented 5 years ago

Improvement Description Expose the matchIDs argument of the filterAndTrim() function

Current Behavior The matchIDs argument of the filterAndTrim() function is False by default.

Proposed Behavior A flag would be added to qiime2 so users could optionally set the matchIDs argument of the filterAndTrim() to True

Questions Should we just change this to True by default and not expose the setting?

References User story: https://forum.qiime2.org/t/mismatched-forwward-and-reverse-sequence-files-sequences-imported-by-casava-1-8-paired-end-demultiplexed-fastq-format/10181/2

colinbrislawn commented 5 years ago

I'm gamely interested in working on this issue.

benjjneb commented 5 years ago

Should we just change this to True by default and not expose the setting?

I would not recommend this because the behavior is non-robust and will probably lead to errors if the id lines are in non-standard-illumina format, even slight modifications can break the parsing.

This is definitely a useful flag when paired Illumina fastqs have been inconsistently modified upstream, but it is a bit of a niche case. I wonder, is there a Q2 plugin focused on filtering fastq files at this point? Something like this functionality would fit there. But adding a flag to denoise-paired would probably be OK too.

ebolyen commented 4 years ago

We are currently thinking that https://github.com/qiime2/q2-types/issues/207 is perhaps a better way to address this issue in 99% of the cases where it will come up (we're not aware of anything that shuffles the reads, so we're assuming that it's a result of filtering in a non-paired-end aware way). These read mismatches cause issues elsewhere, so it would be best to just disallow them in the format.

Now that importing validates the entire contents of the archive, we think we can realistically prevent this situation.

Feel free to re-open if there's other reasons that matchIDs would be useful.