huidongchen opened this issue 3 years ago (status: Open)
The most likely cause is a large number of duplicate records, or records that are essentially identical up to floating-point precision.
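One way to test this hypothesis is sketched below. It assumes the data is a dense NumPy array `X` and counts rows that repeat an earlier row, either exactly or after rounding. The function name `count_duplicate_rows` and the `decimals` tolerance are illustrative assumptions. Rounding is only a crude proxy for "identical up to floating-point precision", because two values that straddle a rounding boundary still compare as different.

```python
import numpy as np

def count_duplicate_rows(X, decimals=None):
    """Return how many rows of X repeat an earlier row.

    decimals=None compares exact values. An integer rounds every entry to that
    many decimals first, as a rough check for rows that differ only by
    floating-point noise (hypothetical tolerance, tune it for your data).
    """
    X = np.asarray(X)
    key = X if decimals is None else np.round(X, decimals)
    n_unique = np.unique(key, axis=0).shape[0]  # number of distinct rows
    return X.shape[0] - n_unique
```

If either count is a sizeable fraction of the dataset, duplicates are a plausible explanation for the warning.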
Thanks so much for your quick reply! That’s good to know. If that’s the case, is there a way to get around this problem?
The only thing I can think of is to deduplicate the dataset by whatever means you have; at the very least, checking for duplicates would be a good start. If you have unlimited compute and enough memory, you could also compute the full distance matrix of the data and pass that in with metric="precomputed", which should at least run.
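Both workarounds can be sketched as follows. This is a minimal sketch that assumes a dense NumPy array `X`, Euclidean distance and the `umap-learn` package. The helper names (`deduplicate`, `pairwise_euclidean`, `embed_deduplicated`, `embed_precomputed`) are hypothetical. Mapping the embedding back onto the removed duplicates through the inverse index is one possible design choice and was not suggested in the thread.

```python
import numpy as np

def deduplicate(X, decimals=None):
    """Keep one representative per distinct (optionally rounded) row.

    Returns (X_unique, inverse) such that X_unique[inverse] has one row per
    original row of X, so results computed on X_unique can be mapped back.
    """
    X = np.asarray(X)
    key = X if decimals is None else np.round(X, decimals)
    _, first_idx, inverse = np.unique(
        key, axis=0, return_index=True, return_inverse=True
    )
    # reshape guards against NumPy versions that return a non-1-D inverse
    return X[first_idx], inverse.reshape(-1)

def pairwise_euclidean(X):
    """Full n x n Euclidean distance matrix; needs O(n^2) memory."""
    X = np.asarray(X, dtype=np.float64)
    sq = np.einsum("ij,ij->i", X, X)                 # squared row norms
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = 0.5 * (d2 + d2.T)                           # enforce exact symmetry
    np.maximum(d2, 0.0, out=d2)                      # clip round-off negatives
    np.fill_diagonal(d2, 0.0)
    return np.sqrt(d2)

def embed_deduplicated(X, **umap_kwargs):
    import umap  # umap-learn, imported lazily so the helpers above work without it
    X_unique, inverse = deduplicate(X)
    embedding = umap.UMAP(**umap_kwargs).fit_transform(X_unique)
    return embedding[inverse]  # duplicate rows share their representative's coordinates

def embed_precomputed(X, **umap_kwargs):
    import umap
    D = pairwise_euclidean(X)
    return umap.UMAP(metric="precomputed", **umap_kwargs).fit_transform(D)
```

For the memory estimate: with n = 10,000 points the distance matrix has 10^8 entries, about 800 MB in float64 (400 MB after `.astype(np.float32)`), and the `X @ X.T` temporary roughly doubles the peak.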
Hi,
Since UMAP v0.5, I have often been getting the following warning when working with large datasets (>10k points, ~100 features), and the run ends up getting stuck. (I am running it on a server, so I have almost unlimited computing resources.)
./myenv/lib/python3.7/site-packages/pynndescent/rp_trees.py:1005: UserWarning: Random Projection forest initialisation failed due to recursionlimit being reached. Something is a little strange with your graph_data, and this may take longer than normal to compute.
  "Random Projection forest initialisation failed due to recursion"
Any idea how to solve this issue? I would really appreciate your help. Thanks!