I use `vsearch --usearch_global` with `--self` a lot on a 16S database and it works great. But sometimes a query matches a 16S allele from the same genome as the query itself. A `--selfmap filename` argument would be great: it would group sequences by accession, genome, or whatever, so that sequences in the same group couldn't match each other. The file could be a headerless CSV with one `label,genomelabel` pair per line. One additional specification: a label could appear multiple times in the map file, assigned to multiple genomelabel groups. In other words, a sequence could be a member of multiple genomes, which is weird but also useful.

Thanks!
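
To make the requested behavior concrete, here is a minimal Python sketch of the grouping rule applied as a post-filter on hit rows. It is not how vsearch works today: `--selfmap` does not exist, and the function names (`load_selfmap`, `same_group`, `filter_hits`) and the sample labels are hypothetical. It assumes each hit is a tuple whose first two fields are the query and target labels, for example parsed from tabular search output.

```python
import csv
import io
from collections import defaultdict


def load_selfmap(handle):
    """Read headerless CSV rows of label,genomelabel into label -> set of groups."""
    groups = defaultdict(set)
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        label, genome = row[0].strip(), row[1].strip()
        # A label may appear on several rows, so it can belong to several groups.
        groups[label].add(genome)
    return groups


def same_group(groups, query, target):
    """True if the labels are identical or share at least one group."""
    if query == target:
        return True  # identical labels, the case --self already rejects
    return bool(groups.get(query, set()) & groups.get(target, set()))


def filter_hits(hits, groups):
    """Keep only hits whose query and target labels share no group."""
    return [hit for hit in hits if not same_group(groups, hit[0], hit[1])]


if __name__ == "__main__":
    # Hypothetical map file: seqB is a member of two genomes.
    selfmap = io.StringIO(
        "seqA,genome1\nseqB,genome1\nseqB,genome2\nseqC,genome2\nseqD,genome3\n"
    )
    groups = load_selfmap(selfmap)
    # Hypothetical hits as (query, target, percent identity).
    hits = [
        ("seqA", "seqB", 99.1),
        ("seqA", "seqC", 98.0),
        ("seqB", "seqC", 97.5),
        ("seqD", "seqA", 96.0),
    ]
    for hit in filter_hits(hits, groups):
        print(*hit, sep="\t")
```

A post-filter like this is only an approximation of a built-in option: hits rejected afterwards have already counted toward `--maxaccepts` during the search, so that limit would probably need to be raised to leave enough candidates after filtering.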