scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

add restrict_to to sc.pl.umap function? #759

Open zzwch opened 5 years ago

zzwch commented 5 years ago

Hello,

In some cases, I need to visualize clustering results (categorical) on UMAP for each batch.

I know sc.pl.umap(adata[adata.obs['batch'] == 'batch1'], color='louvain') is one solution. However, the other cells are then missing from the plot. I think showing the other cells in grey as a background would be better.

I notice that sc.pl.umap(adata, color='batch', groups=['batch1']) keeps the other cells in grey, though the highlighted cells are sometimes buried under the other points (I used reorder_categories to work around this). But color and groups must correspond: groups can only select categories of the variable passed to color!

Is there a way to do this in Scanpy that I have missed? If not, could the authors add a parameter, such as restrict_to in sc.tl.louvain, to implement this: ① decouple color from groups, and ② add support for ordering the categorical variable.
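To pin down what the two requests mean, here is a minimal sketch in plain Python, independent of scanpy. The helper names restrict_labels and draw_order are hypothetical and are not part of any scanpy API; the label and batch values are made-up toy data.

```python
from typing import Hashable, Iterable, List, Optional, Sequence


def restrict_labels(
    labels: Sequence[Hashable],
    groups: Sequence[Hashable],
    keep: Iterable[Hashable],
) -> List[Optional[Hashable]]:
    """Keep the label of cells whose group is in `keep`; mask the rest as None.

    `labels` is the variable to colour by (e.g. louvain clusters) and
    `groups` is a different variable used only for selection (e.g. batch),
    so the two no longer have to be the same column.
    """
    if len(labels) != len(groups):
        raise ValueError("labels and groups must have one entry per cell")
    keep_set = set(keep)
    return [lab if grp in keep_set else None for lab, grp in zip(labels, groups)]


def draw_order(masked: Sequence[Optional[Hashable]]) -> List[int]:
    """Return cell indices with masked (background) cells first.

    Plotting in this order draws the highlighted cells last, so they end up
    on top of the grey background instead of underneath it.
    """
    # sorted() is stable: False (masked) sorts before True (highlighted).
    return sorted(range(len(masked)), key=lambda i: masked[i] is not None)


# Toy data (hypothetical): five cells, three clusters, two batches.
louvain = ["0", "1", "0", "2", "1"]
batch = ["batch1", "batch2", "batch1", "batch2", "batch1"]

masked = restrict_labels(louvain, batch, keep=["batch1"])
order = draw_order(masked)
print(masked)  # cluster labels for batch1 cells, None elsewhere
print(order)   # background cells first, highlighted cells last
```

In practice the masked labels would be stored as a new categorical column and the cells plotted in the computed order; how missing values are rendered depends on the plotting function and version, so treat that part as an assumption to check.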

ivirshup commented 5 years ago

Regarding 1, I think this is a good idea. I've thought this would be a better solution to #709. What do you think the right API would be for this? Ideally, I think this should be a single argument.

As a work-around, you could do something like this:

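import scanpy as sc

pbmc = sc.datasets.pbmc68k_reduced()

# Draw all cells first as an uncoloured background and keep the axes.
ax = sc.pl.umap(pbmc, size=100, show=False)
# Overlay only the cells of interest on the same axes.
sc.pl.umap(
    pbmc[pbmc.obs["bulk_labels"] == "Dendritic"],
    size=100,
    color="n_genes",
    ax=ax,
)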

[Image: Figure_1]

Note that you will have to explicitly pass the size argument, as the size of each point is determined by the total number of points.
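A rough sketch of why the explicit size matters. It assumes the default marker size is a fixed total divided by the number of plotted cells; the constant 120000 is what I believe scanpy uses, but treat it as an assumption and check your installed version. The function name default_point_size and the cell counts are hypothetical.

```python
def default_point_size(n_cells: int, total: float = 120_000.0) -> float:
    """Approximate default marker size for a scatter plot of n_cells points.

    Assumption: size = total / n_cells, with total = 120000 taken from
    memory of scanpy's plotting code rather than from this thread.
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return total / n_cells


# Hypothetical counts: the full dataset versus a ten-times smaller subset.
full_size = default_point_size(1000)
subset_size = default_point_size(100)

print(full_size, subset_size)
# Under this assumption the subset's points come out ten times larger than
# the background's, which is why both calls above pass the same size.
```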

zzwch commented 5 years ago

Thank you very much for the alternative method.

I'm obviously not very familiar with Python and matplotlib yet. It is cool that this can be achieved by passing the axes.

Beyond this issue, I also found that some scanpy functions do not take batches into account. For example, tl.rank_genes_groups cannot identify DEGs between batches within the same louvain cluster, so they cannot then be plotted with pl.rank_genes_groups.

Let's not complicate things for the moment.

For this issue, I would prefer adding a groupby argument to subset groups, which may be a neat way to do it.

Thanks again.