iTaxoTools / TaxI2-legacy

Calculates genetic differences between DNA sequences
GNU General Public License v3.0

Additional clustering output #32

Closed mvences closed 3 years ago

mvences commented 3 years ago

If the original input file is a tab-delimited sequence file with species and genus information, the following should be added to the clustering output. If only a FASTA file is given as input, none of this output should be produced.

  1. Check which species (if any) are represented in more than one cluster and print a list following this model:

     "Comparison of cluster assignment with species assignment in the input file:
     Sequences of Mantella aurantiaca are included in 3 clusters: 1, 6, 7
     Sequences of Mantella crocea are included in 2 clusters: 1, 2
     Sequences of all other species are included in only one cluster, respectively."

If no species is included in more than 1 cluster, then simply print: "Comparison of cluster assignment with species assignment in the input file: Sequences of all species are included in only one cluster, respectively."

  2. Check which clusters (if any) contain representatives of more than one species, and print output following this model:

     "List of clusters containing sequences of more than one species (according to species assignment in the input file):
     Cluster 1 contains sequences of 2 species: Mantella aurantiaca, Mantella crocea
     Cluster 5 contains sequences of 2 species: Mantella aurantiaca, Mantella cowani
     All other clusters contain sequences of only a single species, respectively."

If no cluster contains sequences of more than 1 species, then simply print: "List of clusters containing sequences of more than one species (according to species assignment in the input file): All clusters contain sequences of only one species, respectively."
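
To make the two requested checks concrete, here is a minimal Python sketch of how they could be computed. It is not taken from the TaxI2 code base: the function names (`multi_groups`, `species_report`, `cluster_report`) and the input shape (one `(species, cluster_number)` pair per sequence) are assumptions made for illustration only. The caller is assumed to skip both reports entirely when the input was a plain FASTA file.

```python
from collections import defaultdict


def multi_groups(pairs):
    """Map each key to its sorted values, keeping only keys with more than one value."""
    groups = defaultdict(set)
    for key, value in pairs:
        groups[key].add(value)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


def species_report(assignments):
    """Check 1: species whose sequences fall into more than one cluster.

    `assignments` is a hypothetical list of (species, cluster_number) pairs.
    """
    lines = ["Comparison of cluster assignment with species assignment in the input file:"]
    split = multi_groups(assignments)
    for species in sorted(split):
        clusters = split[species]
        lines.append(
            f"Sequences of {species} are included in {len(clusters)} clusters: "
            + ", ".join(str(c) for c in clusters)
        )
    if split:
        lines.append("Sequences of all other species are included in only one cluster, respectively.")
    else:
        lines.append("Sequences of all species are included in only one cluster, respectively.")
    return "\n".join(lines)


def cluster_report(assignments):
    """Check 2: clusters that contain sequences of more than one species."""
    lines = [
        "List of clusters containing sequences of more than one species "
        "(according to species assignment in the input file):"
    ]
    # Swap each pair so that the cluster number becomes the grouping key.
    mixed = multi_groups((cluster, species) for species, cluster in assignments)
    for cluster in sorted(mixed):
        names = mixed[cluster]
        lines.append(
            f"Cluster {cluster} contains sequences of {len(names)} species: "
            + ", ".join(names)
        )
    if mixed:
        lines.append("All other clusters contain sequences of only a single species, respectively.")
    else:
        lines.append("All clusters contain sequences of only one species, respectively.")
    return "\n".join(lines)


# Made-up example data, loosely following the species names used above.
example = [
    ("Mantella aurantiaca", 1), ("Mantella aurantiaca", 6), ("Mantella aurantiaca", 7),
    ("Mantella crocea", 1), ("Mantella crocea", 2), ("Mantella cowani", 5),
]
print(species_report(example))
print(cluster_report(example))
```

Both reports reduce to the same grouping step with the pair order swapped, which is why the sketch uses a single helper. As written, the "all other" closing line is printed whenever at least one entry is listed, even if no other species or clusters remain; whether that edge case needs different wording is left open.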