qiime2 / q2-vsearch

vsearch plugin for QIIME 2
BSD 3-Clause "New" or "Revised" License
6 stars 20 forks source link

Rename `join-pairs` to `merge-pairs` #84

Closed mikerobeson closed 2 years ago

mikerobeson commented 2 years ago

Suggestion: Rename the q2-vsearch join-pairs action to merge_pairs, and update the associated descriptions to state something like: "Merge paired-end sequence reads..."). Merging reads is not the same as joining reads.

Reason: Calling this QIIME 2 action join-pairs may be confusing for savvy vsearch users, as there is in fact a --fastq_join command within vsearch. This command will stitch each read together end-to-end with a padding string, usually Ns. As we're actually calling the vsearch --fastq_mergepairs, we should update the name of the action.

colinbrislawn commented 2 years ago

Shall we also rename the semantic type JoinedSequencesWithQuality?

Vsearch uses 'joined' to mean 'concatenated', while our semantic types means 'overlapped/merged'. We have an opportunity here to reduce ambiguity and use descriptive words, but that change may breaks things.

Do we want to support 'concatenated' reads as a new semantic type?

EDIT: While this type is defined here, it's used in many different repos. This may be hard to change. Also, there is precedent for calling Illumina paired-end overlapping 'joining' in Qiime1 and ea-utils.

colinbrislawn commented 2 years ago

What shall we name the new semantic type:

nbokulich commented 2 years ago

I would support changing the action name, but do not think that it would be a good idea to change the name name of the Semantic Type, as it would cause breaking changes to various repos and could lead to more confusion.

While I agree that consistency with vsearch would be nice, the semantic type is not explicitly tied to the vsearch action, and the terminology is not really standardized across different tools (just as merge and concatenate have ambiguous meanings, so are equally bad or worse to use here).

mikerobeson commented 2 years ago

I am leaning towards @nbokulich's suggestions. That is, rename the action as I suggested.

For the Types, we can simply note in the documentation that JoinedSequencesWithQuality is a catch-all for the Merged, Concatenated, Discontiguous, and Gappy types of data. If that ever comes up... Unless these are types that we need to explicitly deal with that I am unaware of through other plugins?

colinbrislawn commented 2 years ago

OK! So we keepJoinedSequencesWithQuality for the overlapped Illumina reads we have now.

Do we we want to introduce a new type for Illumina reads that have been combined without overlap (with optional Ns padding the middle)? Or do we want to use JoinedSequencesWithQuality for all of these, including the output of this? vsearch --fastq_join --join_padgapq NNN

mikerobeson commented 2 years ago

Do we we want to introduce a new type for Illumina reads that have been combined without overlap (with optional Ns padding the middle)? Or do we want to use JoinedSequencesWithQuality for all of these, including the output of this? vsearch --fastq_join --join_padgapq NNN

I guess it depends on what this all relates to... Are there plans to generate all of these outputs (i.e. the padding example) in upcoming QIIME 2 tools that we should be keeping track of? If so, then it might be worth further discussion. Otherwise, I'd suggest keeping with JoinedSequencesWithQuality as the default catch-all in the short term. 🤷‍♂️

EDIT: While this type is defined here, it's used in many different repos. This may be hard to change. Also, there is precedent for calling Illumina paired-end overlapping 'joining' in Qiime1 and ea-utils.

I named the script join_paired_ends.py, based on the name of one of the wrapped tools: fastq-join. I've regretted that ever since I learned there was a difference between merging and joining. 🤦‍♂️

colinbrislawn commented 2 years ago

Otherwise, I'd suggest keeping with JoinedSequencesWithQuality as the default catch-all in the short term. 🤷‍♂️

I like it! We can add more types when we have more functions.

If that's the plan, then this PR https://github.com/qiime2/q2-vsearch/pull/87 is ready for review.