scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Support obsm key to color UMAP #1500

Open picciama opened 3 years ago

picciama commented 3 years ago

Quite often I need to color UMAPs by features that are stored in adata.obsm rather than adata.X because they are special, e.g. KO data with gRNAs versus endogenes/target genes, or viral genes versus endogenes.

Example use case:

Clustering must not include these viral genes, so they must be excluded from X. I don't want to store that many additional columns in obs, and I need these features kept in their own matrix for downstream analysis, which is why I want to use obsm.

Can we have something like this:

sc.pl.umap(adata, color='viral_genes')  # adata.obsm['viral_genes'] is a pandas.DataFrame ?

It shouldn't be too complicated, I think, since it only involves one additional check: if an element of the color argument is found neither in obs.columns nor in var_names, check the keys in obsm and use the entire DataFrame behind that key.
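
To make the proposed lookup order concrete, here is a rough sketch in plain Python. It is not scanpy's implementation: the function name resolve_color_keys and the (source, key, column) tuples are hypothetical, and it assumes gene names are matched against var_names and that the obsm entry is a DataFrame whose columns each become one panel.

```python
from typing import Iterable, List, Mapping, Optional, Sequence, Tuple

# (source, key, column): column is only set for entries expanded from obsm.
Panel = Tuple[str, str, Optional[str]]


def resolve_color_keys(
    color: Iterable[str],
    obs_columns: Sequence[str],
    var_names: Sequence[str],
    obsm_columns: Mapping[str, Sequence[str]],
) -> List[Panel]:
    """Hypothetical helper: decide where each `color` key would be read from.

    Lookup order: obs columns first, then var_names, and only then obsm keys.
    An obsm key is expanded into one panel per column of its DataFrame.
    """
    panels: List[Panel] = []
    for key in color:
        if key in obs_columns:
            panels.append(("obs", key, None))
        elif key in var_names:
            panels.append(("var", key, None))
        elif key in obsm_columns:
            # The additional check: use the entire DataFrame behind this key.
            for column in obsm_columns[key]:
                panels.append(("obsm", key, column))
        else:
            raise KeyError(
                f"{key!r} not found in obs columns, var_names, or obsm keys"
            )
    return panels


# Made-up names, for illustration only.
panels = resolve_color_keys(
    ["louvain", "CD3E", "viral_genes"],
    obs_columns=["louvain", "n_counts"],
    var_names=["CD3E", "MS4A1"],
    obsm_columns={"viral_genes": ["vgene_a", "vgene_b"]},
)
for panel in panels:
    print(panel)
```

With a real AnnData object, the inputs could presumably be taken from adata.obs.columns, adata.var_names, and {k: list(v.columns) for k, v in adata.obsm.items() if hasattr(v, "columns")}, so that plain array entries in obsm (such as X_umap) are skipped.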

ivirshup commented 3 years ago

This has been worked on here: https://github.com/theislab/anndata/pull/342

The idea is to allow any vector from the anndata object to be used for coloring, but that PR seems a bit stalled at the moment. This would also be useful for providing parameters in other places, like regress_out.