pavlin-policar / openTSNE

Extensible, parallel implementations of t-SNE
https://opentsne.rtfd.io
BSD 3-Clause "New" or "Revised" License

transform() or prepare_partial() with precomputed distances? #202

Closed: juleskuehn closed this issue 2 years ago

juleskuehn commented 2 years ago
Expected behaviour

New points can be embedded into an existing embedding by passing their precomputed distances to each point in the original (training) data.
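To make the expected input concrete, here is a small sketch of the two matrices involved, assuming Euclidean distances on made-up feature vectors. The training embedding is fit on a square n_train x n_train matrix, while new points would be described by a rectangular n_new x n_train matrix of distances to the training points. `pairwise_euclidean`, `X_train` and `X_new` are illustrative names only, not part of openTSNE.

```python
import numpy as np

# Hypothetical feature vectors, used only to illustrate the expected shapes.
X_train = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
X_new = np.array([[3.0, 0.0]])


def pairwise_euclidean(A, B):
    """Return the (len(A), len(B)) matrix of Euclidean distances between rows."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Square, symmetric, zero-diagonal matrix used to fit the training embedding.
D_train = pairwise_euclidean(X_train, X_train)

# Rectangular query matrix: one row per new point, one column per training point.
D_new = pairwise_euclidean(X_new, X_train)
```

The key point is the shape of `D_new`: each row holds the distances from one new point to every training point, in the same column order as the training matrix.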

Actual behaviour

RuntimeError: Precomputed distance matrices cannot be queried

Steps to reproduce the behavior
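import numpy as np
from openTSNE import TSNEEmbedding
from openTSNE import affinity
from openTSNE import initialization

# Toy precomputed distance matrix for two training points
# (a distance matrix should be symmetric with zeros on the diagonal)
distance_matrix = np.array([
    [0.0, 0.5],
    [0.5, 0.0]
])

# Compute affinities directly from the precomputed distances
affinities_train = affinity.PerplexityBasedNN(
    distance_matrix,
    perplexity=30,
    metric="precomputed",
    n_jobs=8,
    random_state=42,
    verbose=True,
)

# PCA initialization, treating the rows of the distance matrix as features
init_train = initialization.pca(distance_matrix, random_state=42)

embedding_train = TSNEEmbedding(
    init_train,
    affinities_train,
    negative_gradient_method="fft",
    n_jobs=8,
    verbose=True,
)

# Early-exaggeration phase of the optimization
embedding_train_1 = embedding_train.optimize(n_iter=250, exaggeration=12, momentum=0.5)

# Add a new point, given its precomputed distances to the two training points
# (one row per new point, one column per training point).
# This call raises the RuntimeError shown above.
embedding_train_1.transform(np.array([[0.5, 0.5]]))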
pavlin-policar commented 2 years ago

This is a great point, and easy to implement; #208 should add support for it.

There is an easier way to use precomputed distances, as can be seen here. Admittedly, the documentation on this isn't the best.
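Until `transform` accepts precomputed distances, one rough workaround is to place each new point using only its query distances and the existing embedding coordinates. Below is a minimal sketch, assuming it is acceptable to put each new point at the coordinate-wise median of its k nearest training points in the embedding. `median_of_neighbors` is a hypothetical helper, not openTSNE's API. It only produces a starting position and skips the optimization that a proper `transform` would run afterwards.

```python
import numpy as np


def median_of_neighbors(embedding, query_distances, k=2):
    """Place each new point at the coordinate-wise median of its k nearest
    training points in the existing embedding.

    embedding: (n_train, n_dims) array of existing t-SNE coordinates.
    query_distances: (n_new, n_train) precomputed distances from each new
        point to every training point.
    """
    embedding = np.asarray(embedding, dtype=float)
    query_distances = np.asarray(query_distances, dtype=float)
    if query_distances.shape[1] != embedding.shape[0]:
        raise ValueError("query_distances needs one column per training point")

    k = min(k, embedding.shape[0])
    # Indices of the k smallest distances in each row (their order does not matter).
    nearest = np.argpartition(query_distances, k - 1, axis=1)[:, :k]
    # embedding[nearest] has shape (n_new, k, n_dims); take the median over k.
    return np.median(embedding[nearest], axis=1)


# Hypothetical 2-D embedding of three training points and one new point.
embedding = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]])
query_distances = np.array([[0.5, 0.5, 9.0]])
print(median_of_neighbors(embedding, query_distances, k=2))  # → [[1. 1.]]
```

A median is used rather than a mean so that a single distant neighbour does not drag the new point away from its cluster; the trade-off is that the result ignores how close each neighbour actually is.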