phglab / ALFATClust

Biological sequence clustering tool with dynamic threshold
GNU General Public License v3.0

Include Single-Member Clusters (and Representative Contigs) in Outputs (and Optional FASTA Extraction) #13

Open poursalavati opened 5 months ago

poursalavati commented 5 months ago

Thanks for the ALFATClust tool!

Currently, the CLUSTER_EVAL_CSV_FILE excludes information about clusters with only one member (single contigs). While the SEQ_CLUSTER_FILE includes all clusters, it's challenging to extract all representative or center contigs for all clusters, including those with single members.

It would be beneficial to have an additional output option that provides:

- All representative contig IDs (including single-member clusters)
- The option to include full cluster membership (for consistency)

FASTA Extraction: Ideally, the tool could offer the option to directly extract FASTA sequences of the representative contigs (including or excluding single-member clusters based on a user-defined flag). As an alternative, a separate utility tool could extract FASTA sequences for:

- All representative contigs (with the option to include/exclude single-member clusters).
- Each cluster along with its member contigs, in separate FASTA files.
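To make the request concrete, here is a rough sketch of what such a standalone utility might look like. It is not based on ALFATClust's code, and it rests on assumptions I have not verified: that the SEQ_CLUSTER_FILE starts each cluster with a header line beginning with `#`, lists one member ID per following line, and that the first listed member can stand in as the representative. The function names (`parse_clusters`, `read_fasta`, `representative_fasta`, `cluster_fasta`) and the `include_singletons` flag are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_clusters(lines: Iterable[str]) -> List[List[str]]:
    """Group member IDs under each '#'-prefixed header (ASSUMED file layout)."""
    clusters: List[List[str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            clusters.append([])  # a header line opens a new cluster
        elif clusters:
            clusters[-1].append(line.split()[0])  # keep the first token as the ID
    return [members for members in clusters if members]


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each sequence ID (first token of the '>' header) to its sequence."""
    chunks: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            chunks[current] = []
        elif line and current is not None:
            chunks[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def representative_fasta(
    clusters: List[List[str]],
    sequences: Dict[str, str],
    include_singletons: bool = True,
) -> str:
    """FASTA text with one record per cluster representative.

    ASSUMPTION: the first member of each cluster is used as the representative.
    """
    records = []
    for members in clusters:
        if len(members) == 1 and not include_singletons:
            continue
        rep = members[0]
        records.append(f">{rep}\n{sequences[rep]}\n")
    return "".join(records)


def cluster_fasta(members: List[str], sequences: Dict[str, str]) -> str:
    """FASTA text containing every member of a single cluster."""
    return "".join(f">{m}\n{sequences[m]}\n" for m in members)


if __name__ == "__main__":
    # Tiny in-memory example; real use would read the two files from disk.
    demo_clusters = parse_clusters(["#Cluster 1", "contig_1", "contig_2", "#Cluster 2", "contig_3"])
    demo_seqs = read_fasta([">contig_1", "ACGT", ">contig_2", "ACGA", ">contig_3", "GGCC"])
    print(representative_fasta(demo_clusters, demo_seqs, include_singletons=True), end="")
```

Writing one file per cluster would then just be a loop over the parsed clusters calling `cluster_fasta` for each. If the tool already records a true center sequence per cluster, that should replace the first-member shortcut used here.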

This would significantly improve downstream analyses by providing a complete picture of cluster membership and representatives, including single contigs.

All the best! np