grst opened this issue 3 years ago (status: open)
I think there's definitely room for more plotting libraries in the ecosystem, but I have some doubts about whether one library can meet all needs. I personally use seaborn/matplotlib, bokeh, datashader, and altair for different cases. I also think making a good plotting API is exceedingly difficult, especially if you target both high- and low-level use cases. I would note that the plotting code in scanpy feels like some of the most maintenance-intensive code in the library.
> provides helper functions for handling colors, saving figures, etc.
We can do a bit more of this here. But of course, much of it would end up being matplotlib-specific.
> encourages a consistent plotting API (e.g. by defining abstract base classes)
I'd be interested in hearing specific thoughts on this. I've personally been thinking it would be nice to lean more heavily on seaborn plotting classes here, potentially contributing features upstream. Here's one example (https://github.com/mwaskom/seaborn/issues/2487) of a feature which could fit the AnnData data model nicely.
> there is quite some duplicated code in the plotting section
We'd definitely like to reduce the amount of duplicated code, which is what drove the addition of sc.get. This seems to be working out internally, if slowly.
> All the scanpy helper functions for plotting (e.g. savefig_or_show, _set_color_for_categorical_obs etc.) are private scanpy functions
I'd like to move towards stabilizing this. I'm not sure how much plotting-library-specific code we'd want to provide vs. more generic helpers. Right now the most obvious addition is _set_color_for_categorical_obs, which I'd also like to make accessible through sc.get. Adding groupby support to anndata would help a lot here too (https://github.com/theislab/anndata/issues/556).
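A minimal sketch of what a stable, library-agnostic color helper could look like. This is not scanpy's private _set_color_for_categorical_obs: the name get_category_colors is hypothetical and a plain dict stands in for adata.uns. The only detail borrowed from scanpy is the convention of storing category colors under adata.uns[f"{key}_colors"].

```python
from typing import Dict, MutableMapping, Sequence

# First four colors of matplotlib's default "tab10" cycle.
DEFAULT_PALETTE = ("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728")


def get_category_colors(
    uns: MutableMapping[str, list],
    key: str,
    categories: Sequence[str],
    palette: Sequence[str] = DEFAULT_PALETTE,
) -> Dict[str, str]:
    """Hypothetical helper: return a category -> color mapping and store
    the colors in ``uns[f"{key}_colors"]`` if they are missing or stale."""
    colors_key = f"{key}_colors"
    stored = uns.get(colors_key)
    if stored is None or len(stored) != len(categories):
        # Assign colors in category order, cycling through the palette
        # when there are more categories than colors.
        stored = [palette[i % len(palette)] for i in range(len(categories))]
        uns[colors_key] = stored
    return dict(zip(categories, stored))


uns: Dict[str, list] = {}  # stands in for adata.uns
colors = get_category_colors(uns, "leiden", ["0", "1", "2", "3", "4"])
print(colors["4"])  # #1f77b4 (the palette wraps around)
```

Because existing colors are reused whenever their number matches the categories, every package calling such a helper would draw the same cluster in the same color.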
savefig_or_show is something that I don't think we should export, and it may need a rework (#1508).
Hi @ivirshup,
Thanks for your response! I agree that this can quickly get out of hand, so I'd suggest restricting the scope to matplotlib/seaborn (as this is what scanpy and, afaik, most of the ecosystem projects are using) and including, in brief, all that is required to implement a plotting API that behaves like scanpy's.
> I'd be interested in hearing specific thoughts on this. I've personally been thinking it would be nice to lean on seaborn plotting classes more heavily here, potentially contributing features upstream. Here's one example mwaskom/seaborn#2487 of a feature which could fit the AnnData data model nicely.
I was mostly referring to @fidelram's idea of how to make plot styling more "modular" instead of having a vast number of arguments for a single plotting function (#956). If this idea were to be implemented for all scanpy plotting functions, I thought that an abstract base class could provide the method signatures to ensure consistency within scanpy and ecosystem packages. Even with the current "keyword approach", it would be great if there were some way to ensure that common keywords are always named consistently.
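To make the abstract-base-class idea concrete, here is a minimal sketch, assuming a method-chaining style along the lines of #956. All names (BasePlot, style, legend, render, CompositionPlot) are hypothetical and not scanpy API; render returns a dict instead of drawing, so the example runs without matplotlib.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class BasePlot(ABC):
    """Hypothetical base class: styling and legend methods with fixed,
    shared keyword names; subclasses only implement ``render``."""

    def __init__(self, title: Optional[str] = None) -> None:
        self.title = title
        self.style_kwargs: Dict[str, Any] = {}
        self.legend_kwargs: Dict[str, Any] = {"show": True, "title": None}

    def style(self, *, cmap: Optional[str] = None,
              dot_size: Optional[float] = None) -> "BasePlot":
        # Keyword-only arguments with fixed names keep styling consistent
        # across every subclass (and across packages that subclass it).
        if cmap is not None:
            self.style_kwargs["cmap"] = cmap
        if dot_size is not None:
            self.style_kwargs["dot_size"] = dot_size
        return self  # returning self enables method chaining

    def legend(self, *, show: bool = True,
               title: Optional[str] = None) -> "BasePlot":
        self.legend_kwargs = {"show": show, "title": title}
        return self

    @abstractmethod
    def render(self) -> Dict[str, Any]:
        """Draw the plot; in this sketch it returns a description instead."""


class CompositionPlot(BasePlot):
    """Toy subclass: only the plot-specific logic lives here."""

    def __init__(self, counts: Dict[str, int],
                 title: Optional[str] = None) -> None:
        super().__init__(title)
        self.counts = counts

    def render(self) -> Dict[str, Any]:
        total = sum(self.counts.values())
        return {
            "title": self.title,
            "fractions": {k: v / total for k, v in self.counts.items()},
            "style": dict(self.style_kwargs),
            "legend": dict(self.legend_kwargs),
        }


spec = (
    CompositionPlot({"B cell": 3, "T cell": 1}, title="composition")
    .style(cmap="viridis")
    .legend(show=False)
    .render()
)
print(spec["fractions"])  # {'B cell': 0.75, 'T cell': 0.25}
```

The design choice here is that the base class, not each plotting function, owns the keyword names, so a package subclassing it cannot silently rename cmap or title.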
What would be an example of a plot object you would like to "move" to seaborn? Something like a multi-panel UMAP plot?
> I'd like to move towards stabilizing this. I'm not sure how much we'd want to provide plotting library specific code, vs. more generic helpers. Right now the most obvious addition is _set_color_for_categorical_obs, which I'd also like to make accessible through sc.get. Adding groupby support to anndata would help a lot here too (theislab/anndata#556).
That sounds great!
Finally, in terms of "reusable building blocks", I was thinking of, for instance, the "dot size legend" and setting up axes for a scatter plot together with the appropriate legend (continuous color bar or categorical legend).
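Both building blocks could be shared as small, library-agnostic helpers. A minimal sketch under our own assumptions: the power-law size scaling, the default areas and all function names are hypothetical and not taken from scanpy's dotplot code. The helpers only compute what to draw, so any matplotlib-based package could reuse them.

```python
from numbers import Real
from typing import Iterable, List, Tuple


def dot_size(fraction: float, smallest: float = 0.0,
             largest: float = 200.0, exponent: float = 1.5) -> float:
    """Map a fraction in [0, 1] to a marker area (hypothetical scaling)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return smallest + fraction ** exponent * (largest - smallest)


def dot_size_legend_entries(max_fraction: float = 1.0, n: int = 5,
                            **scale: float) -> List[Tuple[str, float]]:
    """(label, area) pairs for a size legend. With matplotlib, each pair
    could be drawn as a proxy artist: ax.scatter([], [], s=area, label=label)."""
    step = max_fraction / n
    fractions = [step * (i + 1) for i in range(n)]
    return [(f"{round(100 * f)}%", dot_size(f, **scale)) for f in fractions]


def legend_kind(values: Iterable[object]) -> str:
    """Numeric values get a color bar, anything else a categorical legend."""
    # bool is a subclass of int, so exclude it explicitly.
    numeric = all(isinstance(v, Real) and not isinstance(v, bool)
                  for v in values)
    return "colorbar" if numeric else "categorical"


print(dot_size_legend_entries(1.0, 4, exponent=1.0))
# [('25%', 50.0), ('50%', 100.0), ('75%', 150.0), ('100%', 200.0)]
```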
Ping @WeilerP @adamgayoso, since you've both raised this idea today
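I was wondering if plotting could be facilitated and made more consistent across the Scanpy ecosystem. I envisage a library ("scanpyplot" or whatever) that provides helper functions for handling colors, saving figures, etc., and encourages a consistent plotting API (e.g. by defining abstract base classes).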
Motivation:
When implementing the scirpy.pl.clonotype_network function, I found myself copying over a lot of code from the scanpy.pl.paga and scanpy.pl.dotplot functions.
There is quite some duplicated code in the plotting section (… scatter …).
All the scanpy helper functions for plotting (e.g. savefig_or_show, _set_color_for_categorical_obs etc.) are private scanpy functions. Implementing plotting functions with consistent behaviour requires either duplicating a lot of code or relying on a potentially unstable API. In fact, scvelo, for example, has duplicates of most of these scanpy helper functions. Squidpy has similar functions, too.
… AnnData.obs (dandelion.pl.stackedbarplot by @zktuong, sc_toolbox.api.plot.cluster_composition_stacked_barplot by @Zethson, and scirpy.pl.group_abundance by myself). Maybe such more general plots based on AnnData could be part of a central library as well.
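The common step behind these stacked bar plots is a per-group composition table computed from two AnnData.obs columns. A minimal sketch of that shared step, assuming plain Python sequences in place of obs columns; group_composition is a hypothetical name and not the API of dandelion, sc_toolbox or scirpy. With pandas, pd.crosstab(obs[group_key], obs[cluster_key], normalize="index") yields the same table.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Sequence


def group_composition(groups: Sequence[str],
                      clusters: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Fraction of each cluster within each group, i.e. the bar heights of
    a stacked bar plot. Both inputs are parallel per-cell columns."""
    if len(groups) != len(clusters):
        raise ValueError("groups and clusters must have the same length")
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for group, cluster in zip(groups, clusters):
        counts[group][cluster] += 1
    result: Dict[str, Dict[str, float]] = {}
    for group, cnt in counts.items():
        total = sum(cnt.values())
        # Normalize per group so each stacked bar sums to 1.
        result[group] = {c: n / total for c, n in sorted(cnt.items())}
    return result


print(group_composition(["ctrl", "ctrl", "ctrl", "stim"], ["A", "A", "B", "B"]))
# {'ctrl': {'A': 0.6666666666666666, 'B': 0.3333333333333333}, 'stim': {'B': 1.0}}
```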