trivialfis opened 3 years ago
We synced offline about this, and both of us are able to produce the correct result and match the reference implementation by adjusting `n_epochs=200`. However, setting `n_epochs=0` (the default) should be getting a better result than it is; whether that means we need to increase the number of epochs for the default, or something else is going on, is still unclear.
Here's an image of the result when `n_epochs` is explicitly set to 200 (for `random_state=1994`):

And here's an image when it's set to the default:
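(For reference, a minimal sketch of the `n_epochs` workaround described above; `X` is a placeholder for the input dataset, not part of the original report.)

```python
import cuml

# Sketch of the workaround: raising n_epochs from the default to 200 matched
# the reference implementation in the tests described above. `X` is a
# placeholder for the input data (e.g. a NumPy array or cuDF DataFrame).
embedding = cuml.UMAP(n_epochs=200, random_state=1994).fit_transform(X)
```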
Actually, for both fit and transform, when the number of epochs is small, the CPU result is better than the GPU result.
Yep, I agree; I have seen slower convergence of the optimizer in the past with both training and inference. For example, setting `n_epochs=25` does tend to converge faster on CPU, but that value will often be set much higher internally by default in both implementations.
At one point I dug deeply into the cause of the slower convergence and traced it to consistency issues from data races during the optimization step. However, I would have expected that to go away by setting `random_state` to a nonzero value, so this very well may have a different cause.
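(A minimal sketch of the determinism check implied above, assuming `X` is already loaded; if the data races were fully eliminated by a fixed `random_state`, the two runs should match.)

```python
import numpy as np
import cuml

# Two fits with the same fixed seed; any mismatch would point at remaining
# nondeterminism (e.g. data races) rather than the seed itself.
e1 = cuml.UMAP(random_state=1994).fit_transform(X)
e2 = cuml.UMAP(random_state=1994).fit_transform(X)
print(np.allclose(e1, e2))
```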
I think initialization is part of the problem. The following pictures are plots after a single iteration of fitting on the shuttle dataset; you can see the CPU implementation already has well-separated clusters.

CPU:

GPU:
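(A sketch of that single-iteration probe, assuming the shuttle data is loaded as `X`; `n_epochs=1` is used here to approximate a single optimization step in both libraries, which mostly exposes the effect of initialization.)

```python
import cuml
import umap

# One optimization epoch leaves the embedding close to its initialization,
# which is the suspected source of the CPU/GPU difference above.
cpu_emb = umap.UMAP(n_epochs=1, random_state=1994).fit_transform(X)
gpu_emb = cuml.UMAP(n_epochs=1, random_state=1994).fit_transform(X)
```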
We're just performing a spectral embedding on the entire dataset, rather than the "multi-component layout" approach used in UMAP (which was considered experimental at the time). I would still have expected the resulting embedding to have more separation than this, though, and I'm wondering if the Lanczos solver might not be converging. What dataset is this? Can you try computing the spectral embedding using Scikit-learn's SpectralEmbedding and see if it improves the separation?
For example, here's a spectral embedding of the digits dataset from scikit-learn:
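(A minimal sketch of that check; note that `SpectralEmbedding` uses a nearest-neighbors affinity by default, which is close to, but not identical to, the graph UMAP embeds for `init="spectral"`.)

```python
from sklearn.datasets import load_digits
from sklearn.manifold import SpectralEmbedding

# Spectral embedding of the digits dataset, as suggested above, to check
# whether the initialization alone already separates the classes.
X, y = load_digits(return_X_y=True)
emb = SpectralEmbedding(n_components=2, random_state=1994).fit_transform(X)
```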
I suppose another possibility is that the connectivities graph could be incorrect; however, if that were the case, I'd think the UMAP solver would likely not converge at all.
It's the shuttle dataset from UCI. Thanks for the suggestions; I will try to pinpoint the issue.
This is the spectral embedding output from sklearn with the shuttle dataset:
Hi,

I'm getting problems with cuml `umap.transform` too. I'm an R developer trying to use reticulate/cuML to speed up my analysis.

Sample data: https://drive.google.com/file/d/1oJjmNAS_KAw4tH_DCNG4IN2UHg7tyEV1/view?usp=sharing
`data2` = `data` with the last row moved to the first position. In R: `data2 <- data[c(93061, 1:93060), ]`
Results from cuML and umap-learn:

```python
import cuml
import umap

res = cuml.UMAP(random_state=1994).fit(data)    # GPU fit on the original data
res2 = res.transform(data2)                     # GPU transform of the reordered copy
res3 = umap.UMAP(random_state=1994).fit(data)   # CPU (umap-learn) fit
res4 = res3.transform(data2)                    # CPU transform
```
I tried `init="spectral"` and `init="random"` with no improvement.
I have datasets with many millions of rows, and I think fitting a portion of the rows (due to the GPU memory limit) and transforming the rest is a better choice, but with this lower accuracy I can't use the GPU yet.

Can this be fixed? Or is this type of error inherent to GPU parallelism?

Thanks.
ENV: Debian 10, RTX 2060 Super, NVIDIA driver 460, CUDA 11.2
EDIT1: Fixed/improved with `n_epochs=500`. With the default `n_epochs=None`, transform uses `n_epochs=30`; with `n_epochs=500`, transform uses `n_epochs=166`. umap-learn with the default `n_epochs` (200 for fit, 30 for transform) is OK. 30 epochs for `cuml.UMAP.transform` is not enough. For my needs, `n_epochs=500` is OK! Thanks.
fit:

transform:
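(A minimal sketch combining the fit-on-a-subset strategy with the `n_epochs=500` workaround from EDIT1; `data` is the full dataset and the subset size is illustrative, not from the original report.)

```python
import numpy as np
import cuml

# Fit on a random subset that fits in GPU memory, then transform the
# remaining rows with the fitted model. n_epochs=500 is the value reported
# above to fix transform quality (166 internal transform epochs instead of 30).
rng = np.random.default_rng(1994)
idx = rng.choice(len(data), size=50_000, replace=False)  # illustrative size
mask = np.ones(len(data), dtype=bool)
mask[idx] = False

reducer = cuml.UMAP(n_epochs=500, random_state=1994).fit(data[idx])
embedding_rest = reducer.transform(data[mask])
```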
So far I've narrowed it down to the initialization step. Will update once I have something new.
I tried to compare the results between CPU UMAP and GPU UMAP on the Fashion-MNIST dataset, and the CPU implementation seems more accurate from a visualization standpoint. The comparison is between branch-0.20 of cuml and 0.5.1 of CPU UMAP (umap-learn), both run with a fixed seed:
Sample code:
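(The original snippet wasn't preserved here; below is a minimal sketch of the kind of comparison described, assuming Fashion-MNIST is fetched from OpenML and with an illustrative seed value.)

```python
import cuml
import umap
from sklearn.datasets import fetch_openml

# Fetch Fashion-MNIST and embed it with both implementations using the same
# fixed seed, to compare the resulting visualizations.
X, y = fetch_openml("Fashion-MNIST", version=1, return_X_y=True, as_frame=False)

gpu_emb = cuml.UMAP(random_state=42).fit_transform(X)   # branch-0.20 of cuml
cpu_emb = umap.UMAP(random_state=42).fit_transform(X)   # umap-learn 0.5.1
```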