Closed cjnolet closed 6 months ago
I recall looking at some Scanpy workflows, and @davidsebfischer pointed out that many of them rely on sparse data types. Is that still the case?
Yes, this is still the case.
@quasiben @davidsebfischer, I've been working on using RAPIDS/CuPy to implement a Seurat / Scanpy single-cell RNA workflow. Specifically, I've been finding it quite challenging to do with CuPy sparse arrays because of the following two issues:
https://github.com/cupy/cupy/issues/2360 https://github.com/cupy/cupy/issues/3178
Currently, I'm having to convert to scipy.sparse to implement filtering.
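For context, here is a minimal sketch of what such a CPU-side filtering step might look like, assuming a cells-by-genes CSR matrix. The function name `filter_cells` and the `min_genes` threshold are hypothetical and not taken from the workflow discussed here; the comments note where a CuPy round trip would fit.

```python
import numpy as np
import scipy.sparse as sp


def filter_cells(X, min_genes=1):
    """Keep rows (cells) with at least `min_genes` stored non-zero entries.

    Hypothetical helper for illustration; not part of Scanpy or CuPy.
    """
    X = sp.csr_matrix(X)
    # For CSR, the number of stored entries per row is a diff of indptr.
    genes_per_cell = np.diff(X.indptr)
    keep = genes_per_cell >= min_genes
    # Boolean-mask row selection on a sparse matrix, done here with SciPy.
    return X[keep], keep


# Toy count matrix: 3 cells x 3 genes, the middle cell is empty.
counts = sp.csr_matrix(
    np.array([[0, 2, 0], [0, 0, 0], [1, 0, 3]], dtype=np.float32)
)
filtered, keep = filter_cells(counts, min_genes=1)

# Assumed GPU round trip (requires CuPy and a GPU, so it is not executed here):
#   X_cpu = X_gpu.get()                                   # CuPy sparse -> SciPy
#   X_cpu, _ = filter_cells(X_cpu)
#   X_gpu = cupyx.scipy.sparse.csr_matrix(X_cpu)          # SciPy -> CuPy sparse
```

The host/device copies in the commented round trip are the cost that native sparse indexing in CuPy would avoid.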
Do you know how hard it would be to add cuSPARSE to CuPy for more sparse support?
@quasiben As far as I know, cuSPARSE is currently being used under CuPy for a lot of the operations.
I’m not quite sure why those slicing strategies aren’t supported yet. I just figured maybe they were less trivial than the others and weren’t immediately needed so they were pushed off to future feature requests.
I can’t imagine issue #2360 is too hard: an output array the size of the selection list could be allocated, and a CUDA kernel scheduled to write the selected entries in parallel.
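One way to picture that idea is the NumPy sketch below, which selects rows of a CSR matrix by a list of row indices. It is an illustration under assumptions, not CuPy's implementation: the function name `select_rows_csr` is hypothetical, and the prefix-sum step used to size the output is an added detail. Each output entry computes its own source position, which is what would let a CUDA kernel assign one thread per entry.

```python
import numpy as np


def select_rows_csr(data, indices, indptr, rows):
    """Gather the given rows of a CSR matrix into a new CSR triple.

    Hypothetical helper; a CPU stand-in for a data-parallel GPU kernel.
    """
    rows = np.asarray(rows, dtype=np.int64)
    # Pass 1: length of each selected row; a prefix sum yields the output
    # indptr and the total number of entries to allocate up front.
    lengths = indptr[rows + 1] - indptr[rows]
    out_indptr = np.concatenate(([0], np.cumsum(lengths)))
    nnz = int(out_indptr[-1])
    # Pass 2: every output entry finds its source index independently,
    # so all writes could happen in parallel.
    out_row = np.repeat(np.arange(len(rows)), lengths)
    offset = np.arange(nnz) - out_indptr[out_row]
    src = indptr[rows][out_row] + offset
    return data[src], indices[src], out_indptr


# CSR form of [[0, 2, 0], [0, 0, 0], [1, 0, 3]]
data = np.array([2.0, 1.0, 3.0])
indices = np.array([1, 0, 2])
indptr = np.array([0, 1, 1, 3])

# Select row 2 followed by row 0.
sel_data, sel_indices, sel_indptr = select_rows_csr(data, indices, indptr, [2, 0])
```

The same two-pass shape (count, scan, scatter) is a common pattern for variable-length output on GPUs, since the output size is not known until the row lengths are summed.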
I’m not as sure about the other issue, but what Dask is trying to do seems more like an API compatibility issue than one of performance/compute.
What features in cuSPARSE would be useful for slicing?
@jakirkham, I’m not sure about slicing. @quasiben’s question seems to imply the use of cuSPARSE in CuPy for general sparse operations. Please correct me if I misunderstood.
I believe once the two issues above are resolved, much of the scipy.sparse functionality for the preprocessing in Scanpy should be able to be swapped with cupy.sparse.
The ML stuff is a little bit different, and I’ve created a separate issue to track that discussion.
I think our initially identified bottleneck with using sparse arrays was this one: https://github.com/cupy/cupy/issues/2359
The analysis workflows usually have very clear computational bottlenecks, so the translation to GPU should take this into consideration: is it feasible, in terms of available code, to keep the array on the GPU and actually perform all operations there, or will this stay a CPU-centric library that deploys particular steps to the GPU? In batchglm / diffxpy we took the first approach: we build on top of (a CPU-centric Scanpy and) deployed GLM fitting to GPU via TensorFlow 2. We also use estimation code in Dask in the same package that we could in principle use with CuPy; right now this just sits on top of NumPy.
Happy to be involved with this stuff, I spent some time thinking about this with @quasiben already. I think it is really crucial to figure out where it makes sense to invest time to build pipelines that can be executed end-to-end on GPU: because of the large number of tools, this will not be the entire Scanpy tool environment for a long time, so mixed workflows will be necessary.
sc.tl for now, because I think this contains most of the potential bottlenecks that are frequently used. "End-to-end" doesn't need to go all the way up to analysis graph leaves, such as plotting, in my opinion, as there is little performance gain there.
https://github.com/scverse/rapids_singlecell is the solution! 🚀
It would be very useful for the GPU data science and research community if Scanpy were able to perform end-to-end workflows on the GPU, using either CuPy, cuDF, or both.
An initial iteration of this feature could include simply swapping out the NumPy imports for CuPy.
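As a rough illustration of what swapping the array module could look like for dense data, the sketch below passes the module in as a parameter. The function `log1p_normalize` and its `target_sum` and `xp` arguments are hypothetical and only loosely modeled on a typical normalization step; this is not Scanpy's code, and it does not address the sparse-array gaps discussed in this thread.

```python
import numpy as np


def log1p_normalize(counts, target_sum=1e4, xp=np):
    """Scale each row to `target_sum` total counts, then apply log1p.

    `xp` is the array module: NumPy by default, or CuPy if passed in.
    """
    counts = xp.asarray(counts, dtype=xp.float32)
    totals = counts.sum(axis=1, keepdims=True)
    # Guard against empty rows to avoid division by zero.
    totals = xp.where(totals == 0, 1, totals)
    return xp.log1p(counts / totals * target_sum)


dense = np.array([[1, 1, 2], [0, 0, 0]])
normalized = log1p_normalize(dense)  # runs on the CPU with NumPy

# With CuPy installed and a GPU available, the same function could be called as:
#   import cupy as cp
#   normalized_gpu = log1p_normalize(cp.asarray(dense), xp=cp)
```

Passing the module explicitly keeps a single code path for both backends, at the price of only working where the two libraries expose the same functions.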
sc.tools? sc.pl? sc.external.*? ...