Closed fsvbach closed 3 years ago
Actually -- it might already work with metric='precomputed', neighbors='exact'
using sklearn implementation. Have you tried?
I get the ValueError: Sklearn
does not support the precomputed
metric. Please choose one of the supported metrics: braycurtis, canberra, chebyshev, cityblock, dice, euclidean etc.
if I run
from openTSNE.sklearn import TSNE tsne=TSNE(metric='precomputed', neighbors='exact') tsne.fit(M)
M being a distance matrix.
Thanks for bringing this up. This feature has been requested multiple times, and I do think it would be nice to have. So, I'll try to implement this soon and get it in the next release.
I've been thinking about how to structure the API for this. To implement this, we'll have make another affinity class which we'll be able to pass to in the precomputed distances. This is easy. However, I do not like the API we currently have, where we pass in affinity objects to the TSNE constructor. So, what we currently do is
aff = Affinity(...)
embedding = TSNE(affinity=aff, ...).fit(x)
The .fit(x)
bothers me. Currently, we need it because we use it to calculate initialization. But even if we specify the initalization in the constructor, we still need to pass something there. E.g. for a graph with no features, we'd need to do something like
aff = Affinity(...)
init = np.random.random((N, 2))
embedding = TSNE(affinity=aff, init=init, ...).fit(np.ones(N))
I do not like this at all, and would like to change it to something like
embedding = TSNE().fit(affinity=aff, init=init)
This would be much cleaner IMO. So .fit(x)
would change to .fit(x=None, affinity=None, init=None)
.
Then, the standard t-SNE calls don't change at all. If we want a custom affinity model using PCA init, we could call
embedding = TSNE(init="pca").fit(x=x, affinity=aff)
And for any arbitrary graphs with no features, we could either 1) specify the .fit(init=...)
parameter, or 2) calculate the spectral initalization e.g.
embedding = TSNE(init="pca").fit(affinity=aff)
# raises warning "pca needs `x` to be specified, falling back to spectral init
@dkobak Do you have any thoughts on this?
ValueError: Sklearn does not support the precomputed metric.
But that's weird -- sklearn does support precomputed
metric: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html#sklearn.neighbors.NearestNeighbors
To implement this, we'll have make another affinity class which we'll be able to pass to in the precomputed distances.
Can't we keep the affinity classes as they are and let Sklearn find exact NNs using metric='precomputed'
if affinity object is called with metric='precomputed'
?
However, I do not like the API we currently have, where we pass in affinity objects to the TSNE constructor.
I think this is unrelated, isn't it? :-) But I fully agree with everything you wrote there, I think your suggested API is better.
Oh I see that precomputed
is not listed in the available sklearn metrics here: https://github.com/pavlin-policar/openTSNE/blob/master/openTSNE/nearest_neighbors.py#L96 -- hence the ValueError. What happens if we add it there?
Can't we keep the affinity classes as they are and let Sklearn find exact NNs using metric='precomputed' if affinity object is called with metric='precomputed'?
Hmm, theoretically, we could do that, but that would make this usage really convoluted. And then the PCA initialization might be calcualted by doing PCA on the distance matrix, which probably doesn't make much sense. I would prefer that there is one, clear way to do this.
I think this is unrelated, isn't it?
Yeah, kind of. But this is the way I propose we implement this, so maybe it's not that unrelated :)
Hmm, theoretically, we could do that, but that would make this usage really convoluted. And then the PCA initialization might be calcualted by doing PCA on the distance matrix, which probably doesn't make much sense. I would prefer that there is one, clear way to do this.
Not sure I understand. What is convoluted about this?
# D is the NxN distance matrix
embedding = TSNE(metric='precomputed', init='spectal').fit(D)
PCA initialization definitely does not make sense here, so when metric='precomputed'
, there should be a warning if init='pca'
and it should fall back to random or spectal.
Update: Actually PCA initialization could make sense because one can compute what is called "PCoA" -- eigenvectors after double-centering the distance matrix; but it's also fine to fall back to random or spectral, imho.
Not sure I understand. What is convoluted about this?
Nothing. That's exactly the interface I implemented in #173. But in the code, it's routed to a special PrecomputedDistanceMatrix
class, so that we can properly error if somebody later tries to call .transform
. I don't know what would happen in that case if we went through the existing Sklearn
neighbors class.
Anyhow, I'll merge #173 when the tests pass (unless either of you have anything against it). Then I would ask @fsvbach to pull the latest master and test if it works as you expected.
Yes, thank you very much @pavlin-policar . When I install the latest master it accepts precomputed matrices!
Hi developers,
firstly, thanks a lot for this nice package! I have a request for another feature:
It seems to me, that openTSNE doesn't support a precomputed distance matrix as given input. However, this is available on sklearn, so I wondered if it was planned to offer the feature with openTSNE as well?
Thanks a lot, Fynn