Closed colinbrislawn closed 4 years ago
I'm gamely interested in working on this issue.
Should we just change this to True by default and not expose the setting?
I would not recommend this because the behavior is non-robust and will probably lead to errors if the id lines are in non-standard-illumina format, even slight modifications can break the parsing.
This is definitely a useful flag when paired Illumina fastqs have been inconsistently modified upstream, but it is a bit of a niche case. I wonder, is there a Q2 plugin focused on filtering fastq files at this point? Something like this functionality would fit there. But adding a flag to denoise-paired
would probably be OK too.
We are currently thinking that https://github.com/qiime2/q2-types/issues/207 is perhaps a better way to address this issue in 99% of the cases where it will come up (we're not aware of anything that shuffles the reads, so we're assuming that it's a result of filtering in a non-paired-end aware way). These read mismatches cause issues elsewhere, so it would be best to just disallow them in the format.
Now that importing validates the entire contents of the archive, we think we can realistically prevent this situation.
Feel free to re-open if there's other reasons that matchIDs would be useful.
Improvement Description Expose the
matchIDs
argument of thefilterAndTrim()
functionCurrent Behavior The
matchIDs
argument of thefilterAndTrim()
function isFalse
by default.Proposed Behavior A flag would be added to qiime2 so users could optionally set the
matchIDs
argument of thefilterAndTrim()
toTrue
Questions Should we just change this to
True
by default and not expose the setting?References User story: https://forum.qiime2.org/t/mismatched-forwward-and-reverse-sequence-files-sequences-imported-by-casava-1-8-paired-end-demultiplexed-fastq-format/10181/2