koh-joshua closed this issue 8 months ago.
Hello @koh-joshua,
It seems that swarm's default output already gives you what you need:
```shell
printf ">seq1_100\nAA\n>seq2_400\nAC\n>seq3_10\nGG\n" | \
    swarm 2> /dev/null
```

```
seq2_400 seq1_100
seq3_10
```
If you don't want clusters to be printed on your terminal, you can redirect standard output to a file, or name an output file with this command-line option:
-o, --output-file
output clustering results to filename. Results consist of a list of clusters, one cluster per line. A cluster is a list of amplicon headers separated by spaces. That output format can be modified by the option --mothur (-r). Default is to write to standard output.
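Because each line is one cluster and its members are separated by spaces, that output is easy to reshape with standard tools. As a minimal sketch (the function name `swarm_to_table` is hypothetical, not a swarm option, and the default non-mothur format is assumed), this awk one-liner turns it into a tab-separated table of cluster number and member header, numbering clusters in the order swarm wrote them:

```shell
#!/bin/sh
# Hypothetical helper (not part of swarm): convert swarm's default output
# (one cluster per line, member headers separated by spaces) into a
# tab-separated table with one row per member: cluster number <TAB> header.
swarm_to_table() {
    # NR is the current line number, used here as the cluster number
    # (1-based, in output order); NF is the number of headers on the line.
    awk 'BEGIN { OFS = "\t" } { for (i = 1; i <= NF; i++) print NR, $i }' "$@"
}

# Example with the two clusters shown above; with real data you would
# pass the file written by -o instead, e.g.: swarm_to_table clusters.txt
printf 'seq2_400 seq1_100\nseq3_10\n' | swarm_to_table
```

For the example clusters this prints three rows: `1 seq2_400`, `1 seq1_100` and `2 seq3_10` (tab-separated).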
Thank you so much! Thank you for VSEARCH and SWARM!
Thanks @koh-joshua
This was already covered indirectly by our test suite, but I've added three specific tests for completeness (https://github.com/frederic-mahe/swarm-tests/commit/f0b5b734abb2fa240e2fa17a286e7b1c69643339).
Is there an option to simply output the members (sequence ids) of each cluster to a file? At the moment, it looks like the only way to know which members/sequences belong to a cluster is to output all the FASTA sequences of that cluster.
For example, if seq1_100 and seq2_400 are both members of cluster 1, how can I retrieve their ids (seq1_100 and seq2_400) from the cluster? Mapping the sequence ids back to their FASTA sequences is then easy.
I believe this is similar to what VSEARCH generates as one of the outputs.
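For the question above: since swarm's default output already lists each cluster's member headers on a single line, the ids of one cluster can be extracted without going through the FASTA sequences. A minimal sketch, assuming the default (non-mothur) format, either saved with `-o` or piped in; `cluster_members` is a hypothetical helper name, and clusters are numbered by their line order in the output:

```shell
#!/bin/sh
# Hypothetical helper (not part of swarm): print the members of cluster N
# (1-based, in output order), one header per line, reading swarm's default
# output from the given file(s) or from standard input.
cluster_members() {
    n="$1"
    shift
    # Keep only line n, then put each space-separated header on its own line.
    sed -n "${n}p" "$@" | tr ' ' '\n'
}

# Members of cluster 1 in the example output above;
# with a file: cluster_members 1 clusters.txt
printf 'seq2_400 seq1_100\nseq3_10\n' | cluster_members 1
```

Here this prints `seq2_400` and `seq1_100`, one per line, which can then be used to look up the corresponding FASTA records.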