High memory usage when pynndescent is not installed

lmcinnes / umap

Uniform Manifold Approximation and Projection

BSD 3-Clause "New" or "Revised" License


Using UMAP on a small dataset (20 Newsgroups), my machine ran out of memory (56GB of RAM). However, when I installed pynndescent, the issue went away. I had installed UMAP via

```shell
pip install umap-learn --pre
```

and the code to reproduce it is

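```python
import umap
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

# Load the full 20 Newsgroups dataset (train + test subsets)
dataset = fetch_20newsgroups(subset='all', shuffle=True, random_state=42)

# Sparse document-term count matrix; keep words that appear in at least 5 documents
vectorizer = CountVectorizer(min_df=5, stop_words='english')
word_doc_matrix = vectorizer.fit_transform(dataset.data)

# This fit exhausted 56GB of RAM when pynndescent was not installed.
# .fit() returns the fitted UMAP object; the 2-D coordinates are in mapper.embedding_
mapper = umap.UMAP(n_components=2, metric='hellinger').fit(word_doc_matrix)
```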
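Since installing pynndescent made the problem go away here, one possible safeguard is to check for it before launching a large fit. The sketch below is only an illustration: `pynndescent_available` and `warn_if_pynndescent_missing` are hypothetical helpers, not part of UMAP's API. It relies solely on the standard library's `importlib.util.find_spec`, which reports whether a package can be imported without actually importing it.

```python
import importlib.util
import warnings


def pynndescent_available() -> bool:
    """Hypothetical helper: True if the pynndescent package is importable."""
    return importlib.util.find_spec("pynndescent") is not None


def warn_if_pynndescent_missing() -> None:
    """Hypothetical guard: warn before an expensive UMAP fit if pynndescent is absent."""
    if not pynndescent_available():
        warnings.warn(
            "pynndescent is not installed; in this report UMAP ran out of memory "
            "without it. Consider `pip install pynndescent`.",
            RuntimeWarning,
        )


if __name__ == "__main__":
    print("pynndescent available:", pynndescent_available())
    warn_if_pynndescent_missing()
```

Calling `warn_if_pynndescent_missing()` just before `umap.UMAP(...).fit(...)` surfaces the missing dependency up front rather than after memory is exhausted.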

High memory usage when pynndescent is not installed #379