ing-bank / sparse_dot_topn

Python package to accelerate sparse matrix multiplication and top-n similarity selection
Apache License 2.0

Possibility for GPU support? #37

Open rohanrajpal opened 4 years ago

rohanrajpal commented 4 years ago

Thanks a lot for sharing this library.

I was wondering whether we could have GPU support. I'm not sure how tough it would be, but I'd be glad to help! If anyone knows of resources on how to go about this, please share.

ymwdalex commented 4 years ago

@rohanrajpal thanks for the message.

Years ago, when we built this package, we investigated a GPU solution, but at that time we couldn't find a CUDA implementation of sparse-matrix-by-sparse-matrix multiplication.

Maybe things have changed. Any suggestions/ideas are welcome!

aerdem4 commented 4 years ago

@rohanrajpal RAPIDS cuML has a NearestNeighbors implementation. It currently supports Euclidean distance for dense matrices on the GPU, but they are working on cosine similarity for sparse matrices in future releases (probably 0.16 or 0.17). https://docs.rapids.ai/api/cuml/stable/api.html?highlight=neighbors#cuml.neighbors.NearestNeighbors
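
A rough sketch of that route might look like the following (assuming a working RAPIDS installation with cuML and CuPy; the shapes and parameters here are purely illustrative):

```python
# Sketch of the cuML NearestNeighbors route (dense matrices only at the
# time of writing). Assumes a RAPIDS install with cuml and cupy available.
import cupy as cp
from cuml.neighbors import NearestNeighbors

# Illustrative dense feature matrix, already on the GPU.
X = cp.random.rand(10_000, 256, dtype=cp.float32)

nn = NearestNeighbors(n_neighbors=10, metric="euclidean")
nn.fit(X)
distances, indices = nn.kneighbors(X)  # top-10 neighbours per row
```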

As a temporary solution, you can use CuPy sparse matrices and a dot product if you don't have memory limitations; see the sketch below.
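
A minimal sketch of that workaround (assuming CuPy with a CUDA-capable GPU; `gpu_sparse_topn` is an illustrative helper name, not part of this package, and the product is densified, so it only works when it fits in GPU memory):

```python
import cupy as cp
import cupyx.scipy.sparse as cusp

def gpu_sparse_topn(a_csr, b_csr, topn=10):
    """Multiply two SciPy CSR matrices on the GPU and keep the top-n
    entries per row of the product (illustrative helper only)."""
    a_gpu = cusp.csr_matrix(a_csr)        # copy host -> device
    b_gpu = cusp.csr_matrix(b_csr)
    c = (a_gpu @ b_gpu).toarray()         # cuSPARSE SpGEMM, then densify
    # argpartition per row to grab the top-n column indices and their values
    idx = cp.argpartition(-c, topn, axis=1)[:, :topn]
    vals = cp.take_along_axis(c, idx, axis=1)
    return cp.asnumpy(idx), cp.asnumpy(vals)
```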

Considering availability of these tools, it may be out of scope for this package.

rohanrajpal commented 4 years ago

Thanks for sharing. I'll have a look into it.

sarimak commented 3 years ago

I would be very skeptical about the benefit of using a GPU for sparse matrix multiplication. Have a look at what various sources say about the poor cache locality and poor suitability for the long vector operations that GPU acceleration relies on for its speedups, plus the overhead of copying data from RAM to GPU RAM and the results back. Especially with large matrices that don't fit into GPU RAM at once, I would not expect any speedup. Dense matrices, especially if they fit into GPU RAM, are a completely different story though...
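
For what it's worth, a rough way to check this on your own data is to time both paths, including the host-to-device transfers (a sketch, assuming CuPy is installed; the shapes and density are arbitrary and results will vary a lot with hardware):

```python
import time
import numpy as np
import scipy.sparse as sp
import cupy as cp
import cupyx.scipy.sparse as cusp

# Arbitrary test matrices; adjust shape/density to match your workload.
a = sp.random(100_000, 50_000, density=1e-4, format="csr", dtype=np.float32)
b = sp.random(50_000, 100_000, density=1e-4, format="csr", dtype=np.float32)

t0 = time.perf_counter()
c_cpu = a @ b                         # SciPy SpGEMM on the CPU
t_cpu = time.perf_counter() - t0

t0 = time.perf_counter()
a_gpu = cusp.csr_matrix(a)            # host -> device copies count too
b_gpu = cusp.csr_matrix(b)
c_gpu = a_gpu @ b_gpu                 # cuSPARSE SpGEMM
cp.cuda.Stream.null.synchronize()     # wait for the GPU to finish
t_gpu = time.perf_counter() - t0

print(f"CPU: {t_cpu:.2f}s  GPU (incl. transfers): {t_gpu:.2f}s")
```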