torognes / swarm

A robust and fast clustering method for amplicon-based studies
GNU Affero General Public License v3.0

Put the cluster seed identifier in the --seeds output #92

Closed a1an77 closed 7 years ago

a1an77 commented 7 years ago

I noticed that the output of the --seeds option does not include any sequence identifier for the cluster representatives. It would be very useful to have such an identifier for backtracking purposes and to link each sequence back to its "cluster representative" identifier.

example output:

_20 tacggg.... _10 ggcggg...

Desired output:

cluster_rep_idA_20 tacggg.... cluster_rep_idB_10 ggcggg...

In the meantime, I assume the only way to get the identifier is to rely on the seeds file having exactly the same order as the cluster file (main output), so that the two can be read pairwise, one record after the other. Is that assumption correct?
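For illustration, the pairwise reading described above could be sketched in Python as follows. This is only a rough sketch under several assumptions that the thread does not confirm: that both files really are in the same order, that the seeds file is FASTA-formatted, and that each line of the cluster file lists whitespace-separated identifiers with the representative first. The function names `read_fasta` and `pair_seeds_with_clusters` and the identifiers in the usage lines are hypothetical.

```python
from itertools import zip_longest


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pair_seeds_with_clusters(seed_lines, cluster_lines):
    """Pair the n-th seed record with the n-th cluster line.

    Assumption (not confirmed in this thread): both inputs are in the
    same order, and the first identifier on each cluster line is the
    cluster representative.
    """
    clusters = (line.split() for line in cluster_lines if line.strip())
    paired = []
    for record, members in zip_longest(read_fasta(seed_lines), clusters):
        if record is None or members is None:
            raise ValueError("seeds and cluster inputs differ in record count")
        header, sequence = record
        paired.append((members[0], header, sequence))
    return paired


# Hypothetical in-memory stand-ins for the two files.
seeds = [">_20\n", "tacggg\n", ">_10\n", "ggcggg\n"]
clusters = ["idA_20 idC_3\n", "idB_10\n"]
for rep_id, header, sequence in pair_seeds_with_clusters(seeds, clusters):
    print(rep_id, header, sequence)
```

With real files, the two list arguments would be replaced by open file handles. The length check is there so that a mismatch in record counts fails loudly instead of silently pairing the wrong records.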

frederic-mahe commented 7 years ago

Hi @a1an77

swarm normally outputs sequence identifiers for the cluster representatives. Maybe there is a problem with your identifiers? Could you please paste some of the identifiers you use?

Thanks,

frederic-mahe commented 7 years ago

I am going to assume this was a problem with @a1an77's data. I am closing this issue.