jeetsukumaran / DendroPy

A Python library for phylogenetic scripting, simulation, data processing and manipulation.
https://pypi.org/project/DendroPy/
BSD 3-Clause "New" or "Revised" License

PhylogeneticDistanceMatrix.distances: return symmetric distances #127

Closed nick-youngblut closed 4 months ago

nick-youngblut commented 3 years ago

It would be helpful if the user could call PhylogeneticDistanceMatrix.distances(full=True) to get back a vector or matrix of symmetric distances, instead of just the lower triangle (lacking the diagonal).

jeetsukumaran commented 3 years ago

Right now, this method returns a list of distances.

If this were implemented, would you want it to return the concatenation of this list and [0] * n, where n is the number of taxa?

nick-youngblut commented 3 years ago

I guess that the user can just make the symmetric matrix via:

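# assumes `import numpy as np`, that `t` is a dendropy Tree, and that
# `pdc` was presumably created via `pdc = t.phylogenetic_distance_matrix()`
taxa = t.taxon_namespace
np.array([pdc(t1, t2) for t2 in taxa for t1 in taxa]).reshape(len(taxa), len(taxa))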

...but it would be nice to have a simpler method. At least for me, I wanted a symmetric matrix (as shown above) that I could feed to scikit-learn for clustering.
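For anyone who wants the same result without the reshape step, here is a minimal sketch that builds the full symmetric matrix from any pairwise distance callable. It evaluates each unordered pair once and mirrors the value. The helper name `symmetric_distance_matrix` and the toy distances are hypothetical and not part of DendroPy; the only assumption carried over from the snippet above is that the distance object can be called with two items.

```python
from itertools import combinations


def symmetric_distance_matrix(items, dist):
    """Return a full symmetric matrix (list of lists) of pairwise distances.

    `dist` is any callable taking two items. Each unordered pair is
    evaluated once and mirrored; the diagonal is left at 0.0.
    """
    items = list(items)
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = dist(items[i], items[j])
        matrix[i][j] = d
        matrix[j][i] = d
    return matrix


# Toy stand-in for a patristic distance lookup (made-up values, not real data).
toy_distances = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 5.0,
    frozenset(("B", "C")): 3.0,
}


def toy_dist(a, b):
    return toy_distances[frozenset((a, b))]


matrix = symmetric_distance_matrix(["A", "B", "C"], toy_dist)
```

With the objects from the snippet above, this would presumably be called as symmetric_distance_matrix(t.taxon_namespace, pdc), and the nested list can be passed to np.array(...) if an array is needed for scikit-learn.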

jeetsukumaran commented 3 years ago

Fair enough.

But again, what would be the expected return value of this method with this option (given that DendroPy does not require or use NumPy)?

nick-youngblut commented 3 years ago

Hmm... without the NumPy requirement, the user would have to convert the result to an array themselves, such as via:

numpy.array([numpy.array(xi) for xi in x])

...which is nearly as much work as:

taxa = t.taxon_namespace
np.array([pdc(t1,t2) for t2 in taxa for t1 in taxa]).reshape(len(taxa), len(taxa))

...so maybe such a feature would not actually be that helpful.
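To make the return-value question concrete, here is a rough NumPy-free sketch of what a full=True style result could look like: a flat list of pairwise distances expanded into nested lists with a zero diagonal. The function name `expand_condensed` is hypothetical. It also assumes the flat list is ordered like itertools.combinations over a fixed taxon order, which this thread does not confirm for distances(), so the ordering should be checked before relying on it.

```python
from itertools import combinations


def expand_condensed(condensed, n):
    """Expand a flat list of n*(n-1)/2 pairwise distances into an n x n
    symmetric list of lists with zeros on the diagonal.

    Assumption: `condensed` follows the pair order produced by
    itertools.combinations(range(n), 2), i.e. (0,1), (0,2), ..., (n-2,n-1).
    """
    expected = n * (n - 1) // 2
    if len(condensed) != expected:
        raise ValueError(
            "expected %d distances for %d taxa, got %d" % (expected, n, len(condensed))
        )
    full = [[0.0] * n for _ in range(n)]
    for (i, j), d in zip(combinations(range(n), 2), condensed):
        full[i][j] = d
        full[j][i] = d
    return full


# Three taxa, distances given for the pairs (0,1), (0,2), (1,2).
full = expand_condensed([2.0, 5.0, 3.0], 3)
```

A nested list like this stays dependency-free and can still be handed to numpy.array(...) in one call by users who need an array.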