scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Failed violins #3005

Open apodtele opened 5 months ago

apodtele commented 5 months ago


What happened?

[Screenshot: the failed violin plot of total_counts]

This was supposed to be a violin plot of total_counts. Notice that some cell categories have no data. This is by design: some categories are defined but not assigned to any samples; they are assigned and used elsewhere. This completely breaks the violin plots, which only work if every category has at least some data. I like that the empty categories are still shown, but I would like to see the non-empty violins.

Minimal code sample

The following code adds extra, unassigned categories:

import pandas as pd

# Some of these categories are defined but not assigned to any cell.
# Named `order` rather than `ord` to avoid shadowing Python's built-in ord().
order = ['B', 'B_mz', 'B_gro', 'B_pls', 'B_mem',
         'Th', 'Th_reg', 'Th_mem', 'Tc', 'Tc_act', 'Tc_mem',
         'NKT', 'NK_0', 'NK_1', 'NK_2',
         'ncMo', 'cMo', 'DC_1', 'DC_2', 'MΦ_1', 'MΦ_2',
         'Ne', 'RBC', 'PLT', 'HSC', 'Whatever', 'Whatnot', 'Unassigned', 'Huh?', 'What?']

adata.obs['cell_type'] = pd.Categorical(values=adata.obs.cell_type, categories=order, ordered=True)
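To illustrate what this recategorization does on its own, here is a pandas-only sketch with made-up labels standing in for `adata.obs.cell_type`: every name in the list becomes a category, the extra ones simply get zero counts, and any existing label that is missing from the list silently becomes NaN.

```python
import pandas as pd

# Made-up labels standing in for adata.obs.cell_type
cell_type = pd.Series(["B", "Th", "B", "Tc", "Ne"], dtype="category")

# "Ne" is deliberately left out; "Whatever" and "Whatnot" get no cells
order = ["B", "Th", "Tc", "Whatever", "Whatnot"]
recat = pd.Categorical(values=cell_type, categories=order, ordered=True)

# value_counts on a categorical lists every category, including empty ones
counts = pd.Series(recat).value_counts(sort=False)
print(counts.to_dict())            # {'B': 2, 'Th': 1, 'Tc': 1, 'Whatever': 0, 'Whatnot': 0}
print(int(pd.isna(recat).sum()))   # 1, the "Ne" cell lost its label
```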


Error output

No response

Versions

scanpy==1.10.1 anndata==0.10.7 umap==0.5.5 numpy==1.26.4 scipy==1.13.0 pandas==2.2.2 scikit-learn==1.4.2 statsmodels==0.14.1 igraph==0.10.3 pynndescent==0.5.12

eroell commented 4 months ago

Hey, thanks for the request.

To be able to reproduce this and help, it is a great aid to us if you can supply a code sample that we can run: that is, one with some dummy data (the datasets scanpy readily supplies are great for that), plus the error or unexpected behaviour you get.

In your case, I think this would be something like:

import scanpy as sc

adata = sc.datasets.pbmc68k_reduced()
# Add "11" as an extra category that no cell is assigned to
adata.obs["louvain"] = adata.obs["louvain"].cat.set_categories(new_categories=["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"])
# One violin of n_counts per louvain category
sc.pl.violin(adata, keys='n_counts', groupby='louvain')

This yields:

ValueError: The palette dictionary is missing keys: {'11'}

Is that the issue you are facing?
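In the meantime, a possible workaround, assuming you only need the non-empty violins in a given plot, is to plot from a copy in which the unused categories have been dropped, so the original `adata.obs` keeps its full category list. The sketch below covers only the pandas part; `drop_empty_categories` is a hypothetical helper (not part of scanpy), and the scanpy usage in the trailing comment is an untested assumption.

```python
import pandas as pd


def drop_empty_categories(obs: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return a copy of `obs` in which column `key` keeps only categories that have data.

    Hypothetical helper: the input frame is left untouched, so the empty
    categories remain defined wherever the original is used.
    """
    out = obs.copy()
    # remove_unused_categories() keeps the remaining categories in their original order
    out[key] = out[key].cat.remove_unused_categories()
    return out


# Toy stand-in for adata.obs: category "11" is defined but has no cells
obs = pd.DataFrame(
    {"louvain": pd.Categorical(["0", "1", "0", "2"], categories=["0", "1", "2", "11"])}
)
trimmed = drop_empty_categories(obs, "louvain")
print(list(trimmed["louvain"].cat.categories))  # ['0', '1', '2']
print(list(obs["louvain"].cat.categories))      # ['0', '1', '2', '11']

# Untested idea for the scanpy side (assumption, not verified in this thread):
#   plot_adata = adata.copy()
#   plot_adata.obs = drop_empty_categories(plot_adata.obs, "louvain")
#   sc.pl.violin(plot_adata, keys="n_counts", groupby="louvain")
```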

apodtele commented 4 months ago

I do not know why set_categories fails to add the new ones for you. Perhaps you need to add ordered=True. Note that in my example I use a different method of adding categories, which works:

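import pandas as pd

# Named `order` rather than `ord` to avoid shadowing Python's built-in ord()
order = ['1', '2', '3', 'Whatever', 'Whatnot', 'Huh?', 'What?']

adata.obs['cell_type'] = pd.Categorical(values=adata.obs.cell_type, categories=order, ordered=True)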
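For comparison, here is a pandas-only check, with made-up labels, of the two ways of adding categories discussed in this thread: `Series.cat.set_categories` and the `pd.Categorical(..., categories=...)` constructor. In pandas, both should end up with the same categories and the same codes.

```python
import pandas as pd

labels = pd.Series(["1", "2", "3", "2"], dtype="category")
new_categories = ["1", "2", "3", "Whatever", "Whatnot", "Huh?", "What?"]

# Route 1: the .cat accessor, as in the pbmc68k_reduced example
via_accessor = labels.cat.set_categories(new_categories, ordered=True)

# Route 2: rebuilding the column with the pd.Categorical constructor
via_constructor = pd.Series(
    pd.Categorical(values=labels, categories=new_categories, ordered=True)
)

# Same category list, and every cell maps to the same category code
print(list(via_accessor.cat.categories) == list(via_constructor.cat.categories))  # True
print((via_accessor.cat.codes == via_constructor.cat.codes).all())                # True
```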

Then try to draw any violin plot.

eroell commented 4 months ago

> To be able to reproduce and help, it is a big aid for us if you can supply a code sample that we can run: that is, with some dummy data (the datasets scanpy readily supplies are great for that), and the error/unexpected behaviour you get.

Can you show such an example, with data? It is not immediately clear to me what specifically you are trying to add or construct. I'm not sure whether the dataframe gets corrupted by the operation you intend to perform, or whether the violin plot itself is failing (if the dataframe is broken, that is what would need fixing).
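One way to tell those two cases apart is to compare the column before and after recategorizing: if no label became NaN and the remaining labels are unchanged, the dataframe is intact and the problem lies in the plotting. A minimal pandas-only sketch; `check_recategorization` and its return format are hypothetical, made up for this illustration.

```python
import pandas as pd


def check_recategorization(before: pd.Series, after: pd.Series) -> dict:
    """Hypothetical sanity check: did recategorizing drop or alter any labels?"""
    # Cells that had a label before but are NaN afterwards
    lost = before.notna() & after.isna()
    # Among cells labelled in both, the label text should be identical
    both = before.notna() & after.notna()
    unchanged = (before[both].astype(str) == after[both].astype(str)).all()
    # Categories that exist in `after` but have no cells
    counts = after.value_counts(sort=False)
    empty = [cat for cat, n in counts.items() if n == 0]
    return {
        "n_lost": int(lost.sum()),
        "labels_unchanged": bool(unchanged),
        "empty_categories": empty,
    }


before = pd.Series(["B", "Th", "Ne", "B"], dtype="category")
# "Ne" is missing from the new list, and "Whatever" has no cells
after = pd.Series(pd.Categorical(before, categories=["B", "Th", "Whatever"], ordered=True))
print(check_recategorization(before, after))
# {'n_lost': 1, 'labels_unchanged': True, 'empty_categories': ['Whatever']}
```

A result with `n_lost` of 0 would suggest the extra categories were added without damaging the data.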