andand closed this issue 1 year ago
Thanks for that info. I see annotations ending with "_sp." in PR2 v5.0.0 and v4.14.0. I could modify the assignSpecies input when using any PR2 version to not include any sequences ending with "_sp.". That would affect all versions. Let me know if you disagree.
edit: that could be done by adding | awk '!/ sp.\n/' RS=">" ORS=">" (which removes sequences whose names end with "sp.") in bin/taxref_reformat_pr2.sh.
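For readers who want to try the idea on a small file, here is a minimal line-based sketch of the same filter. It is not the code proposed above or merged later; the function name drop_sp_records is hypothetical, and it assumes headers in the "ID Genus species" form that assignSpecies expects, with a space before "sp.". Unlike the one-liner, it escapes the dot so only a literal "sp." at the end of a header matches, and it does not leave a dangling ">" at the end of the output.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline): reads FASTA on stdin and
# drops every record whose header line ends with " sp.".
drop_sp_records() {
  awk '
    # On each header line, decide whether the whole record is kept.
    /^>/ { keep = ($0 !~ / sp\.$/) }
    # Print the header and its sequence lines only while keep is true.
    keep
  '
}

# Usage example with two made-up records:
printf '>s1 Genus sp.\nACGT\n>s2 Genus species\nTTTT\n' | drop_sp_records
# prints:
# >s2 Genus species
# TTTT
```

Multi-line sequences are handled because the keep flag persists until the next header line.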
Sounds good!
@jtangrot Do you agree with removing all annotations ending with "sp." for assignSpecies? I am asking because it seems valid to me, but I am not really into taxonomic databases and would welcome another opinion.
I agree, but it should be noted that I work closely with Anders (andand), so my opinion is a bit biased in this case...
Thanks, I see :)
Would any of you like to review #599? It's just what we discussed here, a tiny change.
Merged, will be in next release!
Description of feature
According to this issue in PR2 regarding assigning species, sequences with annotations ending with "_sp." may actually belong to properly named species of the same genus (the data provider may simply have failed to define them at species level). If these are included when running assignSpecies, one may therefore get ASVs that seem to match multiple species, although they in fact match only one. It may thus be a good idea to remove reference sequences with annotations ending with "_sp." before running assignSpecies.
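To gauge how many reference sequences would be affected, one could count the matching headers first. The sketch below is only an illustration: the function name count_sp_headers is hypothetical, and it assumes FASTA headers whose species annotation sits at the end of the line, preceded by either "_" or a space.

```shell
#!/bin/sh
# Hypothetical helper: reads FASTA on stdin and prints the number of
# header lines that end with "_sp." or " sp.".
count_sp_headers() {
  awk '/^>/ && /[_ ]sp\.$/ { n++ } END { print n + 0 }'
}

# Usage example with three made-up records:
printf '>s1 Genus sp.\nACGT\n>s2 Genus species\nTTTT\n>s3 Other_sp.\nCCCC\n' | count_sp_headers
# prints: 2
```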