pavlin-policar / openTSNE

Extensible, parallel implementations of t-SNE
https://opentsne.rtfd.io
BSD 3-Clause "New" or "Revised" License

precomputed knn #214

Closed: jlmelville closed this issue 1 year ago

jlmelville commented 1 year ago

Hello, is there a way to use k-nearest neighbors data created externally? My current strategy is to create a dummy class of the form:

class PrecomputedKNNIndex:
    def __init__(self, indices, distances):
        self.indices = indices
        self.distances = distances
        self.k = indices.shape[1]

    def build(self):
        return self.indices, self.distances

    def query(self, query, k):
        raise NotImplementedError("No query with a pre-computed knn")

    def check_metric(self, metric):
        # The neighbors are already computed, so any metric
        # (string or callable) is passed through unchanged.
        return metric

and use it like:

import openTSNE

perplexity = 30
data = get_data_from_somewhere()

n_neighbors = min(data.shape[0] - 1, int(3 * perplexity))
# assume this doesn't return the "self" neighbor as the first item in the knn
indices, dists = get_nn_from_somewhere(data, n_neighbors)
knn = PrecomputedKNNIndex(indices, dists)

affinities = openTSNE.affinity.PerplexityBasedNN(
    perplexity=perplexity,
    knn_index=knn,
)
embedder = openTSNE.TSNE(n_components=2)
embedded = embedder.fit(data, affinities=affinities)

This seems to work perfectly well; I just wondered whether I am missing a more obvious approach.
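For illustration, here is one shape the placeholder get_nn_from_somewhere could take. This is only a sketch under stated assumptions: brute_force_knn is a hypothetical helper, not part of openTSNE, and it does an exact O(n^2) search with the standard library, so it is only practical for small inputs. It omits each point from its own neighbor list, matching the assumption in the snippet above. It returns plain lists, so they would need converting (e.g. with numpy.asarray) before being handed to the dummy class, which reads indices.shape[1].

```python
import math


def brute_force_knn(points, k):
    """Return (indices, distances) of the k nearest neighbors of every point.

    Hypothetical helper: exact O(n^2) search that never lists a point
    as its own neighbor.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    indices, distances = [], []
    for i, p in enumerate(points):
        # Distances from point i to every other point; j != i drops "self".
        candidates = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        nearest = candidates[:k]
        indices.append([j for _, j in nearest])
        distances.append([d for d, _ in nearest])
    return indices, distances


perplexity = 30
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
# Same rule as above: at most 3 * perplexity neighbors, never more than n - 1.
n_neighbors = min(len(points) - 1, int(3 * perplexity))
indices, distances = brute_force_knn(points, n_neighbors)
```

With only four points, the n - 1 cap applies, so every point gets all three other points as neighbors, sorted by increasing distance.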

pavlin-policar commented 1 year ago

Hey, I think this is currently the only approach that would work. Your dummy class is actually included here, and I think it has already been released (it's been a while since I looked at this).

It's a convoluted solution, I know, but currently the only supported one. I need to return to this and think about how I would allow something like the standard metric="precomputed" without cluttering the API further.

jlmelville commented 1 year ago

Oh yes, it looks like I missed the built-in precomputed class. Works for me.

Although I am sure you are not looking for API suggestions, maybe you could allow the neighbors parameter on the TSNE constructor to take a tuple containing the indices and distances, and then either create the affinities via the perplexity parameter or use the Uniform version if perplexity=None?
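The dispatch described in that suggestion could be sketched roughly as follows. Everything here is hypothetical: plan_affinities and the "uniform"/"perplexity" labels are made-up names for illustration, the function does not call openTSNE at all, and it only decides which kind of affinity model would be built from a (indices, distances) tuple. The one check that is not an assumption is the upper bound: a distribution over k neighbors can have a perplexity of at most k.

```python
def plan_affinities(neighbors, perplexity=30):
    """Decide which affinity model a (indices, distances) tuple would use.

    Hypothetical sketch of the suggested API; returns a (kind, value) pair
    instead of constructing any openTSNE object.
    """
    indices, distances = neighbors
    if len(indices) == 0 or len(indices) != len(distances):
        raise ValueError("indices and distances need the same, non-zero, row count")
    k = len(indices[0])
    rows = list(indices) + list(distances)
    if k == 0 or any(len(row) != k for row in rows):
        raise ValueError("every row must hold the same number k > 0 of neighbors")
    if perplexity is None:
        # No perplexity given: weight all k precomputed neighbors equally.
        return ("uniform", k)
    if not 0 < perplexity <= k:
        # Perplexity of a distribution over k neighbors cannot exceed k.
        raise ValueError(f"perplexity must be in (0, {k}] for {k} neighbors")
    return ("perplexity", perplexity)


# Toy input: three points, two precomputed neighbors each.
toy_indices = [[1, 2], [0, 2], [1, 0]]
toy_distances = [[1.0, 3.0], [1.0, 2.0], [2.0, 3.0]]
uniform_plan = plan_affinities((toy_indices, toy_distances), perplexity=None)
perplexity_plan = plan_affinities((toy_indices, toy_distances), perplexity=1.5)
```

Returning a plain description rather than an affinity object keeps the sketch independent of the library; a real implementation would construct the corresponding affinity model at that point.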

Anyway, thank you for the help.