scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Datashader as plotting backend #2656

Open grst opened 1 year ago

grst commented 1 year ago

What kind of feature would you like to request?

Other?

Please describe your wishes

When dealing with millions of cells, plotting embeddings becomes annoyingly slow. Datashader aggregates data points before plotting, which is much faster than just making a scatterplot in matplotlib.
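A minimal sketch of what "aggregate before plotting" means, written in plain NumPy rather than datashader. datashader's `Canvas.points(df, 'x', 'y', agg=ds.count())` does the same binning, but with compiled, out-of-core aggregation and many more reductions. `aggregate_counts` is a hypothetical helper, and the random points stand in for a real embedding.

```python
import numpy as np

def aggregate_counts(x, y, width=400, height=400):
    """Bin points into a (height, width) grid of per-pixel counts.

    Once the points are binned, drawing cost depends on the number of
    pixels rather than on the number of cells.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # rows follow y and columns follow x, so the grid can be shown as an image
    counts, y_edges, x_edges = np.histogram2d(y, x, bins=[height, width])
    return counts, x_edges, y_edges

rng = np.random.default_rng(0)
xy = rng.normal(size=(1_000_000, 2))  # stand-in for adata.obsm["X_umap"]
counts, x_edges, y_edges = aggregate_counts(xy[:, 0], xy[:, 1])
print(counts.shape, int(counts.sum()))  # (400, 400) 1000000
```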

For instance, a multi-panel UMAP plot of 2M cells that takes 1 min 15 s with sc.pl.umap takes 7 s with datashader + matplotlib.

I know datashader has come up before in different contexts (e.g. https://github.com/scverse/scanpy/issues/1263), but here I mainly suggest it for speed.


FWIW, I made a prototype implementation of sc.pl.embedding with datashader. It's not feature-complete but covers some common use-cases: https://gist.github.com/grst/424e3e24bf244820000c33a823a47ec1
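Coloring by a categorical key (e.g. clusters) needs a bit more than counting. Below is one possible approach, sketched in NumPy and not taken from the gist above. It assumes categories arrive as integer codes with one RGB color each. It counts points per category per pixel, which is what datashader's `ds.count_cat` reduction does, and blends the category colors weighted by those counts, roughly what `tf.shade` does with a `color_key`. `shade_categorical` is a hypothetical name, and alpha here is simply 0 or 1.

```python
import numpy as np

def shade_categorical(x, y, codes, palette, width=300, height=300):
    """Blend category colors per pixel, weighted by per-category counts.

    codes: integer category codes in 0..k-1 (hypothetical input format)
    palette: (k, 3) RGB floats in [0, 1], one row per category
    Returns an RGBA float image of shape (height, width, 4).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    codes = np.asarray(codes)
    palette = np.asarray(palette, dtype=float)
    k = palette.shape[0]
    # shared pixel edges so every category lands on the same grid
    x_edges = np.linspace(x.min(), x.max(), width + 1)
    y_edges = np.linspace(y.min(), y.max(), height + 1)
    counts = np.zeros((k, height, width))
    for c in range(k):
        mask = codes == c
        counts[c], _, _ = np.histogram2d(y[mask], x[mask], bins=[y_edges, x_edges])
    total = counts.sum(axis=0)
    # count-weighted sum of colors, then divide by the total count per pixel
    rgb = np.einsum("khw,kc->hwc", counts, palette)
    filled = total > 0
    rgb[filled] /= total[filled][:, None]
    alpha = filled.astype(float)  # empty pixels stay transparent
    return np.dstack([rgb, alpha])

rng = np.random.default_rng(0)
xy = rng.normal(size=(10_000, 2))
codes = rng.integers(0, 3, size=10_000)  # stand-in for adata.obs["leiden"].cat.codes
palette = np.array([[0.9, 0.1, 0.1], [0.1, 0.6, 0.1], [0.1, 0.2, 0.9]])
img = shade_categorical(xy[:, 0], xy[:, 1], codes, palette, width=200, height=150)
print(img.shape)  # (150, 200, 4)
```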

ivirshup commented 1 year ago

See also:

ivirshup commented 1 year ago

How would you suggest doing the API for this? Another kwarg for backend?

The additional dependencies aren't so bad: xarray, dask, and pillow. Still, I probably wouldn't be up for datashader as a required dependency.
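A sketch of the "another kwarg for backend?" idea that keeps datashader optional. The package is only looked up when it is explicitly requested, so the default matplotlib path never imports it. `resolve_backend` and its accepted values are hypothetical and are not existing scanpy API.

```python
import importlib.util

SUPPORTED_BACKENDS = ("matplotlib", "datashader")

def resolve_backend(backend="matplotlib"):
    """Validate a hypothetical `backend=` kwarg for embedding plots.

    datashader (and the xarray/dask/pillow it pulls in) is only checked for
    when requested, so it can stay an optional dependency.
    """
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"backend must be one of {SUPPORTED_BACKENDS}, got {backend!r}")
    if backend == "datashader" and importlib.util.find_spec("datashader") is None:
        raise ImportError(
            "backend='datashader' needs the optional dependency datashader "
            "(e.g. `pip install datashader`)"
        )
    return backend

print(resolve_backend())  # matplotlib
```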

grst commented 1 year ago

I have just been exploring the holoviz ecosystem a bit and wasn't aware of how nice it is! Ideally we could use something like hvPlot and leave it to the user to select a backend.

The problem is that the scanpy plotting functions have way too many parameters. Supporting all of them in different backends sounds daunting if not impossible.

ivirshup commented 1 year ago

I think datashader would only work in scanpy via the matplotlib rendering backend. I think interactive plotting is definitely out of scope for scanpy. There'd likely be even more options that are interactive-specific.
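On keeping this inside matplotlib: the aggregated image is just an array, so it can be drawn with `Axes.imshow` on an ordinary Axes, and titles, colorbars and multi-panel layouts keep working as usual. datashader ships a helper for this (`datashader.mpl_ext.dsshow`). The sketch below shows only the underlying idea with NumPy and `imshow`. `raster_panel` is a hypothetical helper, and log scaling is an arbitrary choice here; datashader's `tf.shade` defaults to histogram equalization (`how='eq_hist'`).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

def raster_panel(ax, x, y, width=300, height=300, cmap="viridis"):
    """Aggregate points to a count image and draw it on an existing matplotlib Axes."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    counts, y_edges, x_edges = np.histogram2d(y, x, bins=[height, width])
    # log scaling keeps dense regions from washing out sparse ones;
    # empty pixels are masked so the Axes background shows through
    img = np.ma.masked_where(counts == 0, np.log1p(counts))
    return ax.imshow(
        img,
        origin="lower",  # row 0 holds the smallest y values
        extent=(x_edges[0], x_edges[-1], y_edges[0], y_edges[-1]),
        aspect="auto",
        interpolation="nearest",
        cmap=cmap,
    )

rng = np.random.default_rng(0)
xy = rng.normal(size=(500_000, 2))  # stand-in for a UMAP embedding
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, title in zip(axes, ["panel A", "panel B"]):
    raster_panel(ax, xy[:, 0], xy[:, 1])
    ax.set_title(title)  # ordinary matplotlib decorations still apply
```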

Even then, I'm still not 100% sure this should be in scanpy and not separate.

grst commented 1 year ago

It should probably be discussed in the context of whatever the plotting plans are for scanpy 2.0. Maybe worth dedicating a community meeting to that?

Intron7 commented 1 year ago

@grst would you be open to putting this into rsc? There we could even use cudf and GPU plotting for the dataframe.

ivirshup commented 1 year ago

I would like this to be somewhere where it'd also work on CPU. I think we can implement a __dataframe__ interface that passes either GPU or CPU memory to datashader, then let datashader handle the rest.
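A rough sketch of the CPU/GPU point, under stated assumptions. Instead of the `__dataframe__` interchange protocol suggested above, it uses a simpler dispatch on where the embedding lives. Arrays exposing `__cuda_array_interface__` (e.g. CuPy arrays) go into a cudf DataFrame, and everything else goes into pandas. datashader can aggregate both. `embedding_frame` and the `x`/`y` column names are hypothetical.

```python
import numpy as np
import pandas as pd

def embedding_frame(coords, x="x", y="y"):
    """Wrap an (n_obs, >=2) embedding as a DataFrame for datashader.

    GPU arrays (anything exposing __cuda_array_interface__) go to cudf so
    aggregation can stay on the GPU; everything else becomes pandas.
    """
    on_gpu = hasattr(coords, "__cuda_array_interface__")
    if not on_gpu:
        coords = np.asarray(coords)
    if coords.ndim != 2 or coords.shape[1] < 2:
        raise ValueError("expected an (n_obs, >=2) embedding, e.g. adata.obsm['X_umap']")
    cols = {x: coords[:, 0], y: coords[:, 1]}
    if on_gpu:
        import cudf  # optional; only imported for GPU input
        return cudf.DataFrame(cols)
    return pd.DataFrame(cols)

coords = np.random.default_rng(0).normal(size=(1000, 2))  # stand-in for adata.obsm["X_umap"]
df = embedding_frame(coords)
print(type(df).__name__, df.shape)  # DataFrame (1000, 2)
```

With datashader installed, the result can go straight into `ds.Canvas(plot_width=400, plot_height=400).points(df, "x", "y")`, leaving the choice of CPU or GPU aggregation to datashader.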