libscran / qdtsne

Quick-and-dirty t-SNE in C++
https://libscran.github.io/qdtsne/
MIT License
9 stars 2 forks source link

TSNE from distance matrix #5

Closed aa-samad closed 1 year ago

aa-samad commented 1 year ago

Hi, Thank you for creating this fantastic Repo. Can we calculate TSNE embedding from a pre-computed distance matrix? If yes, please elaborate on how to do it. Thanks in advance.

LTLA commented 1 year ago

Shouldn't be too hard. Iterate through the rows (or columns) of the distance matrix, and for each observation, just find the k closest neighbors (excluding self), where k is defined using the qdtsne::perplexity_to_k function. Then you can just construct a NeighborList object and feed that to Tsne::run.

Note that this will still apply the Barnes-Hut approximations. Technically, if the dataset is small enough to allow creation of a distance matrix, you could do an exact t-SNE, but this is not supported by the qdtsne library.

aa-samad commented 1 year ago

Thanks a lot for the comment. In case anyone else wonders what I did:


int N = 100;
auto Y = qdtsne::initialize_random(N); 
int K = tsne_perplexity * 3;

// ----- construct knn tree from distance matrix  
std::vector<std::vector<std::pair<int, double>>> knn_list(N);    
std::vector<std::pair<int, double>> k_least_idx(K);

for (int i=0; i < N; i++)
{
    k_least_idx.assign(k_least_idx.size(), std::make_pair(0, 100));
    for(int j=0; j < N; j++)
    {
            if (i == j)
            {
                continue;
            }
        for(int k=0; k < K; k++)
        {
        if (similarity_matrix[i*N + j] < k_least_idx[k].second)
        {
            k_least_idx[k].second = similarity_matrix[i*N + j];
            k_least_idx[k].first = j;
        }
        }
    }
    knn_list[i] = k_least_idx;
}

// ------------ solve the low dim embedd from KNN tree
qdtsne::Tsne tsne;
tsne.set_num_threads(16);
tsne.set_perplexity(tsne_perplexity);
tsne.set_max_iter(tsne_num_iter);
tsne.set_eta(tsne_learning_rate);
tsne.set_exaggeration_factor(tsne_early_exaggeration);

auto ref = tsne.run(knn_list, Y.data());