iTaxoTools / TaxI2

Calculation and analysis of pairwise sequence distances
GNU General Public License v3.0

Add MDS scatterplot visualization of genetic distance / clusters / species #19

Open mvences opened 3 years ago

mvences commented 3 years ago

This is not a high-priority need for TaxI3 but would be a nice addition.

In general terms, I think we could add further graphical representations of the results, and this could be one of them.

The program calculates a distance matrix for the all-against-all comparison, based either on "true" genetic distances between aligned sequences or on the Alfpy alignment-free distance option. A Multi-Dimensional Scaling (MDS) analysis could transform such a matrix fairly easily into two new variables, which could then be shown as a scatterplot. We could, for example, give the option to color the dots according to the species column in the tabfile, or according to the clustering result if a clustering analysis is done in TaxI3. MDS should be rather straightforward to implement with sklearn.
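To make the idea concrete, here is a minimal sketch of the matrix-to-coordinates step. It uses classical (Torgerson) MDS written directly in NumPy rather than the sklearn estimator mentioned above, only to keep the example dependency-light; sklearn's `MDS` class could be swapped in. The function name `classical_mds`, the toy distance values and the `species` labels are all made up for illustration and are not part of TaxI3.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Embed a square distance matrix into n_components dimensions.

    Classical (Torgerson) MDS: double-center the squared distances and
    keep the eigenvectors with the largest eigenvalues.
    """
    d = np.asarray(distances, dtype=float)
    if d.ndim != 2 or d.shape[0] != d.shape[1]:
        raise ValueError("distances must be a square matrix")

    # Enforce symmetry and a zero diagonal, which MDS assumes.
    d = (d + d.T) / 2.0
    np.fill_diagonal(d, 0.0)

    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering

    # eigh returns eigenvalues in ascending order; take the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]

    # Non-Euclidean distances can give negative eigenvalues; clip them to 0.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical toy data: four sequences from two species.
toy = [
    [0.00, 0.02, 0.15, 0.16],
    [0.02, 0.00, 0.14, 0.15],
    [0.15, 0.14, 0.00, 0.03],
    [0.16, 0.15, 0.03, 0.00],
]
species = ["sp_A", "sp_A", "sp_B", "sp_B"]

coords = classical_mds(toy)  # one (x, y) row per sequence
for name, (x, y) in zip(species, coords):
    print(f"{name}\t{x:.4f}\t{y:.4f}")
```

The two columns of `coords` would be the x and y values of the scatterplot, with the species column (or the cluster assignment) mapped to dot color. Missing values in the distance matrix are not handled in this sketch.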

I am not assigning this task to anyone for now, to signal that it is not a high priority, but I wanted to write the idea down here so that it is not forgotten.