jolespin opened this issue 5 years ago
Ah, that's the multi-component spectral initialisation failing, because it doesn't support pre-computed metrics. I'm on vacation at the moment, but I can make a better error message when I get back.
It has been a while but this seems to be the cause: https://scikit-learn.org/stable/modules/clustering.html#spectral-clustering
This is SpectralClustering but the same goes for SpectralEmbedding which is used by UMAP. They both expect an affinity/similarity matrix and not a distance matrix.
This could probably be solved by using the solution provided in the link:
similarity = np.exp(-beta * distance / distance.std())
and then passing that similarity to SpectralEmbedding within UMAP.
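As a minimal sketch of that suggestion (the `beta` bandwidth is a tuning choice, not something prescribed by UMAP), the distance-to-affinity transform from the scikit-learn docs looks like this:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Toy data standing in for a precomputed distance matrix (an assumption
# for illustration; any symmetric distance matrix would do).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
distance = pairwise_distances(X)

beta = 1.0  # bandwidth of the Gaussian kernel; a tuning choice
similarity = np.exp(-beta * distance / distance.std())

# A valid affinity matrix for spectral methods is symmetric, non-negative,
# and maximal (here 1.0) where distance is 0, i.e. on the diagonal.
assert np.allclose(similarity, similarity.T)
assert np.allclose(np.diag(similarity), 1.0)
```

This is only the transform itself; wiring it into UMAP's spectral initialisation would happen inside the library, not in user code.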
I also ran into this problem. I computed 3 distance matrices with 3 different (custom) metrics, and only one of them failed. I am not sure whether this makes the other two results wrong, but judging from sleighsoft's explanation, they probably are? Yet the results do not look obviously wrong, which is kind of a dangerous thing. As a temporary workaround I now use init='random', which seems to work.
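For anyone hitting the same thing, a sketch of that workaround, using a hypothetical custom metric for illustration (the metric itself and the data are assumptions, not from the thread):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(42)
X = rng.normal(size=(30, 4))

def custom_metric(a, b):
    # Toy custom metric (an assumption): a Canberra-style weighted L1.
    return np.sum(np.abs(a - b) / (np.abs(a) + np.abs(b) + 1e-9))

# Precompute the full symmetric distance matrix.
D = squareform(pdist(X, metric=custom_metric))

# With umap-learn installed, the workaround would then be to bypass the
# spectral initialisation entirely:
#   import umap
#   emb = umap.UMAP(metric="precomputed", init="random").fit_transform(D)
assert np.allclose(D, D.T)
```

Random initialisation avoids the failing spectral step at the cost of a less informed starting layout, so results can vary more between runs.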
Hi, sorry, is this issue being looked into? Otherwise, could you suggest methods to recreate the original data points when you only have a distance matrix? N(dim) is unknown in my case, but I assume a perfect embedding is possible when choosing N(dim) = N(samples).
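One standard answer to that question is classical MDS: double-center the squared distance matrix and take an eigendecomposition. If the distances are Euclidean, this recovers coordinates exactly (up to rotation and translation), and at most N(samples) - 1 dimensions are ever needed. A self-contained sketch:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Toy Euclidean distance matrix to demonstrate the recovery (the data
# here is an assumption; in practice you would start from your D).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 6))
D = squareform(pdist(X))

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
eigvals, eigvecs = np.linalg.eigh(B)

# Keep strictly positive eigenvalues (numerical noise can make some
# tiny or slightly negative).
pos = eigvals > 1e-9
Y = eigvecs[:, pos] * np.sqrt(eigvals[pos])

# The recovered coordinates reproduce the original distances.
D_rec = squareform(pdist(Y))
assert np.allclose(D, D_rec, atol=1e-6)
```

If your custom metric is not Euclidean, B can have negative eigenvalues and the reconstruction is only approximate; that caveat applies to the matrices discussed above.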
Hello,
Is this issue being looked into at all? With the new HDBSCAN algorithm being implemented in scikit-learn, and its impending medoid/centroid features, I would hope somebody would help solve this issue.
I just wanted to bring this error message to your attention. I believe it is a little misleading, because the algorithm works for n_neighbors=15 but not for n_neighbors=3. Do you know what in the backend prevents it from working for n_neighbors=3 and throws the shape error?