Closed HCMY closed 2 years ago
Hi @HCMY. Just to get on the same page: this package provides two things, an implementation of the UMAP algorithm in R/Rcpp and an interface that launches the Python-based 'umap-learn' (the original UMAP).
The R/Rcpp implementation in this repo requires the dataset to be loaded as a matrix in memory. If you can coerce your data from whatever source into a matrix, then all is OK. But if you are asking about processing data as a stream, or data that is larger than memory, that is not supported.
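As a rough illustration of the in-memory route, here is a minimal sketch. It assumes your data has already been pulled into an ordinary R data frame; the built-in `iris` data stands in for it here, and the variable names are just for the example.

```r
library(umap)

# Stand-in for data fetched from some other source: the four numeric
# columns of the built-in iris data frame (an assumption for this sketch).
df <- iris[, 1:4]

# Coerce to a plain numeric matrix held in memory.
m <- as.matrix(df)

# Run the default R/Rcpp implementation. Passing method = "umap-learn"
# would call the Python package instead, which must be installed separately.
embedding <- umap(m)

# The low-dimensional coordinates are in the `layout` component,
# one row per observation.
dim(embedding$layout)
```

The only real requirement is that `m` fits in memory as a matrix; how you get it there depends on your source.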
My impression is that some users of the umap-learn package have mentioned Spark, but I have not used it myself. You could ask there for help. Also, keep in mind that umap-learn's advanced capabilities might not be compatible with the R-Python interfacing here, so they might not work through this package. If you have success with this, please share! Cheers.
Thanks for your reply, I'm working on it.
Hey guys, is there any Spark implementation of UMAP?