Closed mikerobeson closed 2 years ago
Shall we also rename the semantic type JoinedSequencesWithQuality
?
Vsearch uses 'joined' to mean 'concatenated', while our semantic types means 'overlapped/merged'. We have an opportunity here to reduce ambiguity and use descriptive words, but that change may breaks things.
Do we want to support 'concatenated' reads as a new semantic type?
EDIT: While this type is defined here, it's used in many different repos. This may be hard to change. Also, there is precedent for calling Illumina paired-end overlapping 'joining' in Qiime1 and ea-utils.
What shall we name the new semantic type:
ConcatenatedSequencesWithQuality
DiscontiguousSequencesWithQuality
GappySequencesWithQuality
I would support changing the action name, but do not think that it would be a good idea to change the name name of the Semantic Type, as it would cause breaking changes to various repos and could lead to more confusion.
While I agree that consistency with vsearch would be nice, the semantic type is not explicitly tied to the vsearch action, and the terminology is not really standardized across different tools (just as merge and concatenate have ambiguous meanings, so are equally bad or worse to use here).
I am leaning towards @nbokulich's suggestions. That is, rename the action as I suggested.
For the Types, we can simply note in the documentation that JoinedSequencesWithQuality
is a catch-all for the Merged
, Concatenated
, Discontiguous
, and Gappy
types of data. If that ever comes up... Unless these are types that we need to explicitly deal with that I am unaware of through other plugins?
OK! So we keepJoinedSequencesWithQuality
for the overlapped Illumina reads we have now.
Do we we want to introduce a new type for Illumina reads that have been combined without overlap (with optional Ns padding the middle)?
Or do we want to use JoinedSequencesWithQuality
for all of these, including the output of this?
vsearch --fastq_join --join_padgapq NNN
Do we we want to introduce a new type for Illumina reads that have been combined without overlap (with optional Ns padding the middle)? Or do we want to use JoinedSequencesWithQuality for all of these, including the output of this? vsearch --fastq_join --join_padgapq NNN
I guess it depends on what this all relates to... Are there plans to generate all of these outputs (i.e. the padding example) in upcoming QIIME 2 tools that we should be keeping track of? If so, then it might be worth further discussion. Otherwise, I'd suggest keeping with JoinedSequencesWithQuality
as the default catch-all in the short term. 🤷♂️
EDIT: While this type is defined here, it's used in many different repos. This may be hard to change. Also, there is precedent for calling Illumina paired-end overlapping 'joining' in Qiime1 and ea-utils.
I named the script join_paired_ends.py
, based on the name of one of the wrapped tools: fastq-join
. I've regretted that ever since I learned there was a difference between merging and joining. 🤦♂️
Otherwise, I'd suggest keeping with JoinedSequencesWithQuality as the default catch-all in the short term. 🤷♂️
I like it! We can add more types when we have more functions.
If that's the plan, then this PR https://github.com/qiime2/q2-vsearch/pull/87 is ready for review.
Suggestion: Rename the q2-vsearch
join-pairs
action tomerge_pairs
, and update the associated descriptions to state something like: "Merge paired-end sequence reads..."). Merging reads is not the same as joining reads.Reason: Calling this QIIME 2 action
join-pairs
may be confusing for savvy vsearch users, as there is in fact a--fastq_join
command within vsearch. This command will stitch each read together end-to-end with a padding string, usually Ns. As we're actually calling the vsearch--fastq_mergepairs
, we should update the name of the action.