tkonopka / umap

Uniform Manifold Approximation and Projection - R package

Is there a Spark implementation? #16

Closed HCMY closed 2 years ago

HCMY commented 2 years ago

Hey guys, is there a Spark implementation of UMAP?

tkonopka commented 2 years ago

Hi @HCMY. Just to get on the same page: this package provides an implementation of the UMAP algorithm in R/Rcpp and an implementation that launches the Python-based 'umap-learn' (the original UMAP).
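To make the two options concrete, here is a minimal sketch of how each implementation might be selected through the `method` argument of `umap()`. The toy matrix `x` is made up for illustration, and the second call assumes that `reticulate` and a Python environment with `umap-learn` are available, so it is guarded by a check.

```r
library(umap)

# Toy input (an assumption for illustration): 50 rows, 4 numeric columns.
set.seed(1)
x <- matrix(rnorm(200), ncol = 4)

# R/Rcpp implementation; "naive" is the package's default method.
emb_r <- umap(x, method = "naive")

# Python-backed implementation. It is only attempted when reticulate
# can find a Python module named 'umap' (installed by 'umap-learn').
if (requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_module_available("umap")) {
  emb_py <- umap(x, method = "umap-learn")
}

# The embedding coordinates are stored in the $layout component.
head(emb_r$layout)
```

Both calls take the same in-memory matrix, so switching methods does not change how the data has to be prepared.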

The R/Rcpp implementation in this repo requires the dataset to be loaded as a matrix in memory. If you can coerce your data from whatever source into a matrix, then all is OK. But if you are asking about processing data as a stream, or data that is larger than memory, then that is not supported.
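As a rough sketch of the "coerce into a matrix" step, the example below uses the built-in `iris` data frame as a stand-in for data pulled from some other source; the variable names are illustrative only. The key assumption is that the collected data fits in memory.

```r
library(umap)

# Stand-in for data collected from an external source into a data frame.
df <- iris

# Keep only the numeric columns and coerce them to an in-memory matrix.
x <- as.matrix(df[, sapply(df, is.numeric)])

# Run the default (R/Rcpp) implementation on the matrix.
emb <- umap(x)

# One row of embedding coordinates per input row.
dim(emb$layout)
```

With data held in Spark, the same idea would mean collecting the rows into the R session first, which only works for datasets small enough to fit in memory.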

My impression is that some users of the umap-learn package have mentioned Spark, but I have not used it myself. You could try asking there for help. Also, keep in mind that umap-learn's advanced capabilities might not be compatible with the R-python interfacing here, so they might not work through this package. If you have success with this, please share! Cheers.

HCMY commented 2 years ago

Thanks for your reply, I'm working on it.