rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0
4.17k stars 528 forks source link

[BUG] TSNE: make knn_graph argument supersede X argument #4490

Open RichieHakim opened 2 years ago

RichieHakim commented 2 years ago

Describe the solution you'd like Currently, if a user wants to use a custom distance metric for TSNE or UMAP, they need to both input the original features (X), as well as the distance matrix (knn_graph). It should not be necessary to include X if knn_graph is given.

Users (myself included) use a custom distance matrix that no longer relates to a feature (X) matrix. Mine is calculated as the product of two different distance matrices, making it impossible for me to use cuML's TSNE with this custom distance matrix.

IDEAL SOLUTION: Follow the convention used in sklearn. If metric='precomputed', then X is assumed to be a distance matrix.

Describe alternatives you've considered sklearn's TSNE works nicely, it's just ~400x slower.

Additional context sklearn's TSNE implementation with a custom distance matrix:

tsne_cpu = sklearn.manifold.TSNE(
    n_components=2,
    perplexity=30.0,
    metric='precomputed',
    init='random',
    method='barnes_hut',
    n_jobs=-1,
)
embeddings = tsne_cpu.fit_transform(
    X=distances_conjunctive,
)
github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

RichieHakim commented 2 years ago

This issue still needs to be addressed.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

RichieHakim commented 2 years ago

This issue still needs to be addressed.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

RichieHakim commented 2 years ago

This issue still needs to be addressed.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

RichieHakim commented 2 years ago

This issue still needs to be addressed.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

RichieHakim commented 2 years ago

This issue still needs to be addressed.

RichieHakim commented 2 years ago

THIS ISSUE STILL NEEDS TO BE ADDRESSED!