scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Function `plot_scatter` is breaking with matplotlib #286

Open dilawar opened 5 years ago

dilawar commented 5 years ago

Update: Not sure if this is a problem with our script. However, if a color is not found, could the function fall back to the default color map and still let us plot after raising a warning?

The log follows. ~Working on a PR.~ This issue is for reference. I could not upload the source file since it depends on data files that are huge.

(Py36) pragati@wasabi-simons ~/Work/scanpy_exp $ python planaria.py 
scanpy==1.3.2+4.g7c9fb1a anndata==0.6.11 numpy==1.14.6 scipy==1.1.0 pandas==0.23.4 scikit-learn==0.20.0 statsmodels==0.9.0 
... storing 'clusters' as categorical
computing tSNE
    using data matrix X directly
    using the 'MulticoreTSNE' package by Ulyanov (2017)
    finished (0:02:39.15)
Traceback (most recent call last):
  File "/home/pragati/Py36/lib/python3.6/site-packages/matplotlib/colors.py", line 158, in to_rgba
    rgba = _colors_full_map.cache[c, alpha]
KeyError: ('grey80', None)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pragati/Py36/lib/python3.6/site-packages/matplotlib/axes/_axes.py", line 4210, in scatter
    colors = mcolors.to_rgba_array(c)
  File "/home/pragati/Py36/lib/python3.6/site-packages/matplotlib/colors.py", line 259, in to_rgba_array
    result[i] = to_rgba(cc, alpha)
  File "/home/pragati/Py36/lib/python3.6/site-packages/matplotlib/colors.py", line 160, in to_rgba
    rgba = _to_rgba_no_colorcycle(c, alpha)
  File "/home/pragati/Py36/lib/python3.6/site-packages/matplotlib/colors.py", line 204, in _to_rgba_no_colorcycle
    raise ValueError("Invalid RGBA argument: {!r}".format(orig_c))
ValueError: Invalid RGBA argument: 'grey80'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "planaria.py", line 47, in <module>
    sc.pl.tsne(adata, color='clusters', legend_loc='on data', legend_fontsize=5, save='_full')
  File "/home/pragati/Py36/lib/python3.6/site-packages/scanpy/plotting/tools/scatterplots.py", line 47, in tsne
    return plot_scatter(adata, basis='tsne', **kwargs)
  File "/home/pragati/Py36/lib/python3.6/site-packages/scanpy/plotting/tools/scatterplots.py", line 301, in plot_scatter
    **kwargs)
  File "/home/pragati/Py36/lib/python3.6/site-packages/matplotlib/__init__.py", line 1785, in inner
    return func(ax, *args, **kwargs)
  File "/home/pragati/Py36/lib/python3.6/site-packages/matplotlib/axes/_axes.py", line 4231, in scatter
    .format(c)
ValueError: 'c' argument must either be valid as mpl color(s) or as numbers to be mapped to colors. Here c = ['orange', 'orange', 'orange', 'deeppink', '#6BAED6', 'deeppink', 'grey80', 'grey80', 'firebrick', '#6BAED6', '#9ECAE1', 'grey80', 'grey80', 'grey80', 'grey80', 'grey80', '#CBC9E2', 'grey80', 'grey80', '#2171B5', 'grey80', 'grey80', 'grey80', 'grey80', 'grey80', '#9ECAE1', 'grey80', 'grey80', 'mediumorchid1', 'grey80', '#2171B5', 'grey80', 'orange', 'grey80', '#9ECAE1', 'firebrick', 'grey80', 'grey80', 'grey80', 'grey80', 'firebrick', 'grey80', '#6BAED6', 'grey80', 'grey80', 'grey80', '#6BAED6', 'grey80', 'grey80', 'grey80', '#4292C6', '#2171B5', 'dodgerblue', 'forestgreen', '#9ECAE1', 'grey80', 'grey80', 'grey80', '#6BAED6', 'grey80', 'limegreen', 'grey80', 'grey80', '#9ECAE1', 'grey80', '#4292C6', 'grey80', '#4292C6', 'grey80', 'limegreen', 'grey80', 'grey80', 'grey80', 'grey80', 'violetred', 'grey80', '#4292C6', '#9ECAE1', 'grey80', 'grey80', '#CBC9E2', 'grey80', 'grey80', '#9ECAE1', '#2171B5', 'grey80', 'grey80', 'grey80', 'hotpink', 'grey80' ..... 
fidelram commented 5 years ago

grey80 is not a valid color name, at least in matplotlib 3, although it is referenced internally. I will do a quick PR to change this. Meanwhile, you can work around the problem by passing a palette to the call that fails:

sc.pl.tsne(adata, color='clusters', legend_loc='on data', legend_fontsize=5, save='_full', palette='Set2')
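For reference, R-style grey names such as grey80 encode a brightness percentage, so they can be converted to hex strings that matplotlib accepts. The following is a minimal sketch, not scanpy's actual fix; the helper name `r_grey_to_hex` is hypothetical, and it assumes the level is simply the percentage scaled to 0..255.

```python
import re

# Matches R-style names such as 'grey80' or 'gray5' (both spellings).
_GREY_PATTERN = re.compile(r"^gr[ae]y(\d{1,3})$")


def r_grey_to_hex(name):
    """Return a '#rrggbb' string for an R-style greyN name, else None.

    Hypothetical helper for illustration. N is read as a percentage of
    full brightness; at exact halves the rounding here may differ by one
    from R's own color table.
    """
    match = _GREY_PATTERN.match(name.lower())
    if match is None:
        return None
    percent = int(match.group(1))
    if percent > 100:
        return None
    level = round(percent * 255 / 100)
    return "#{0:02x}{0:02x}{0:02x}".format(level)


print(r_grey_to_hex("grey80"))
print(r_grey_to_hex("orange"))
```

Names with numeric suffixes that are not greys (for example mediumorchid1) do not follow this pattern and would need a lookup table instead.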
fidelram commented 5 years ago

Could it be that you are using an AnnData object that you saved with a previous version of scanpy, which may include non-standard colors?

dilawar commented 5 years ago

It's quite possible. I'll double check.

dilawar commented 5 years ago

I am attaching reduced files. I could reproduce the error with this dataset. The color names are in the colors_dataset.txt file. Note that the Python script is renamed to .py.txt. There was an error in a PAGA-related plotting function as well.

R_pca_seurat.txt R_annotation.txt colors_dataset.txt planaria.py.txt

falexwolf commented 5 years ago

@fidelram It might be that the new plotting backend doesn't support the "additional colors" (here) anymore. These are colors that are standard in R and used for the Planaria example. We should try to integrate them for the sake of easily moving between python and R.

fidelram commented 5 years ago

I will check that once I get some time.

fidelram commented 5 years ago

@dilawar Can you confirm that the changes solved your problem?

dilawar commented 5 years ago

No. There is still some issue with colors. Note that now I am on python3.7 (which is default on ArchLinux).

$ pip install git+https://github.com/theislab/scanpy --upgrade --user
$ python planaria.py 
/home1/dilawars/.local/lib/python3.7/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
scanpy==1.3.2+19.g94c3dc5 anndata==0.6.10 numpy==1.15.2 scipy==1.1.0 pandas==0.23.4 scikit-learn==0.20.0 statsmodels==0.9.0 python-igraph==0.7.1 
... storing 'clusters' as categorical
computing tSNE
    using data matrix X directly
    using the 'MulticoreTSNE' package by Ulyanov (2017)
    finished (0:01:09.28)
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/matplotlib/colors.py", line 166, in to_rgba
    rgba = _colors_full_map.cache[c, alpha]
KeyError: ('mediumpurple3', None)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/matplotlib/axes/_axes.py", line 4288, in scatter
    colors = mcolors.to_rgba_array(c)
  File "/usr/lib/python3.7/site-packages/matplotlib/colors.py", line 267, in to_rgba_array
    result[i] = to_rgba(cc, alpha)
  File "/usr/lib/python3.7/site-packages/matplotlib/colors.py", line 168, in to_rgba
    rgba = _to_rgba_no_colorcycle(c, alpha)
  File "/usr/lib/python3.7/site-packages/matplotlib/colors.py", line 212, in _to_rgba_no_colorcycle
    raise ValueError("Invalid RGBA argument: {!r}".format(orig_c))
ValueError: Invalid RGBA argument: 'mediumpurple3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "planaria.py", line 47, in <module>
    sc.pl.tsne(adata, color='clusters', legend_loc='on data', legend_fontsize=5, save='_full')
  File "/home1/dilawars/.local/lib/python3.7/site-packages/scanpy/plotting/tools/scatterplots.py", line 47, in tsne
    return plot_scatter(adata, basis='tsne', **kwargs)
  File "/home1/dilawars/.local/lib/python3.7/site-packages/scanpy/plotting/tools/scatterplots.py", line 301, in plot_scatter
    **kwargs)
  File "/usr/lib/python3.7/site-packages/matplotlib/__init__.py", line 1867, in inner
    return func(ax, *args, **kwargs)
  File "/usr/lib/python3.7/site-packages/matplotlib/axes/_axes.py", line 4293, in scatter
    .format(c.shape, x.size, y.size))
AttributeError: 'list' object has no attribute 'shape'
[dilawars@chamcham scanpy_exp]$ 
fidelram commented 5 years ago

I think the problem is that some of the color lists in adata.uns contain these color names, which are invalid in matplotlib. The current code translates those colors before setting adata.uns but not after. I will add a check for that.
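A check of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the code that went into scanpy: `sanitize_colors`, its `translations` table, and the `fallback` color are hypothetical names. Only `matplotlib.colors.is_color_like` is an existing matplotlib function. Invalid names are replaced with a fallback after a warning, which is also the behavior requested at the top of this issue.

```python
import warnings

from matplotlib.colors import is_color_like


def sanitize_colors(colors, translations=None, fallback="lightgray"):
    """Return a copy of `colors` in which every entry is valid for matplotlib.

    Hypothetical helper: `translations` maps known non-matplotlib names to
    valid colors; anything still invalid is replaced by `fallback` after
    a warning instead of raising.
    """
    translations = translations or {}
    cleaned = []
    for name in colors:
        candidate = translations.get(name, name)
        if not is_color_like(candidate):
            warnings.warn(
                f"{name!r} is not a valid matplotlib color; using {fallback!r}"
            )
            candidate = fallback
        cleaned.append(candidate)
    return cleaned


stored = ["orange", "#6BAED6", "grey80", "mediumpurple3"]
print(sanitize_colors(stored, translations={"grey80": "#cccccc"}))
```

Running the same check on a color list read back from a saved object, and not only when the list is first created, would cover files written by older versions.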

davidhbrann commented 5 years ago

I'm also currently having trouble with sc.pl.scatter. The palette keyword doesn't seem to be working? The colors always seem to be magma no matter what I set as the palette.

fidelram commented 5 years ago

Can you post the command you are using? Or better, set up a test case. See the example in #293; maybe you can reproduce your problem with that setup.

davidhbrann commented 5 years ago

Hi, sorry for not giving more of a description of the issue I was having.

I tried to recreate a minimal example today using the PBMC_68k dataset and the cmap argument seemed to be working fine when using a gene as the color, but I'm still having problems with categorical variables like louvain clusters or user-defined cluster names.

fig, ax = plt.subplots(2,2,figsize=(12,8))
sc.pl.umap(adata, color='louvain', ax = ax[0,0], show=False)
sc.pl.umap(adata, color='louvain', ax = ax[0,1], cmap="tab10", show=False)
ax[1,0].scatter(adata.obsm['X_umap'][:,0], adata.obsm['X_umap'][:,1],
            c=adata.obs['louvain'], cmap="tab10", s=0.1)
ax[1,1].scatter(adata.obsm['X_umap'][:,0], adata.obsm['X_umap'][:,1],
            c=adata.obs['louvain'], cmap="tab20b", s=0.1)

image

fig, ax = plt.subplots(2,2,figsize=(12,8))
sc.pl.umap(adata, color=["CD74"], ax=ax[0,0], show=False)
sc.pl.umap(adata, color=["CD74"], cmap="viridis", ax=ax[0,1], show=False)
ax[1,0].scatter(adata.obsm['X_umap'][:,0], adata.obsm['X_umap'][:,1],
            c=adata.X[:,adata.var_names=="CD74"].flatten(), cmap="magma", s=0.1)
ax[1,1].scatter(adata.obsm['X_umap'][:,0], adata.obsm['X_umap'][:,1],
            c=adata.X[:,adata.var_names=="CD74"].flatten(), cmap="viridis",
                s=0.1, vmin=-0.6, vmax=3.5)

image

These are the versions I'm using: scanpy==1.3.2 anndata==0.6.11 numpy==1.14.6 scipy==1.1.0 pandas==0.23.4 scikit-learn==0.20.0 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1 My matplotlib version is 3.0.0.

falexwolf commented 5 years ago

Can you try using the palette argument? After 1.3.1, Scanpy's plotting underwent some fundamental changes thanks to @fidelram. The code base improved a lot, though there might still be a few small issues.

I had both cmap and palette as arguments because I wanted users to be able to choose a default for both continuous and categorical annotation. So if someone passes a cmap, this only affects the continuous annotation, while for categoricals the rcParams default is used. Does this make sense? It might stop making sense when you provide lists to cmap and/or palette in order to plot two different categoricals with two different palettes (which should be the default behavior at some point).

Happy to discuss whether we should deprecate the palette argument and make the default accessible via cmap, which could then be provided as a list.
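The split between cmap (continuous values) and palette (categorical values) described above can be illustrated outside of scanpy. This is only a sketch of the idea, assuming pandas and matplotlib 3.5 or later; `colors_for` is a hypothetical function, it does not reproduce scanpy's internals, and it does not handle missing values.

```python
import matplotlib
import pandas as pd
from matplotlib.colors import Normalize, to_hex


def colors_for(values, cmap="viridis", palette="tab10"):
    """Return one hex color per entry of a pandas Series.

    Hypothetical helper: categorical data takes discrete colors from
    `palette`, anything else is scaled to [0, 1] and mapped through `cmap`.
    """
    if isinstance(values.dtype, pd.CategoricalDtype):
        # One swatch per category, cycling if there are more categories
        # than swatches in the qualitative palette.
        swatches = matplotlib.colormaps[palette].colors
        codes = values.cat.codes.to_numpy()
        return [to_hex(swatches[code % len(swatches)]) for code in codes]
    # Continuous case: normalize to the data range, then apply the colormap.
    norm = Normalize(vmin=values.min(), vmax=values.max())
    rgba = matplotlib.colormaps[cmap](norm(values.to_numpy()))
    return [to_hex(row) for row in rgba]


clusters = pd.Series(["0", "1", "0", "2"], dtype="category")
expression = pd.Series([0.0, 1.2, 0.4, 3.5])
print(colors_for(clusters))    # discrete colors, repeated per category
print(colors_for(expression))  # colors along the continuous colormap
```

With this kind of dispatch, passing only a cmap leaves categorical columns on the default palette, which matches the behavior reported in this thread.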

davidhbrann commented 5 years ago

I'm also seeing the same error when using sc.pl.scatter: sc.pl.scatter(adata, color='louvain', basis="umap", palette="tab20")

image

fidelram commented 5 years ago

I cannot reproduce the problem. Maybe a lot of clusters are needed to reproduce it?

image

falexwolf commented 5 years ago

Maybe it's release 1.3.2 - I wouldn't have made that release if I hadn't been asked to, I expected the new plotting backend to still have several bugs. The current master has several fixes. Do you think we should move forward with another release, @fidelram, @ivirshup; or are there still a few striking bugs in the scatter plots that I'm not aware of? It seems like a lot has been fixed in the past week.

pati-ni commented 5 years ago

I was experiencing the same issues, and what worked for me in v1.4.3 was casting the observations to categorical.

adata.obs['sample'] = adata.obs['sample'].astype('category')

In previous versions of scanpy I got a bunch of warnings when saving an h5ad with non-categorical data, and now, as I can see, there are problems with the plotting.
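As a small illustration of what the cast changes, the sketch below uses a plain pandas DataFrame as a stand-in for adata.obs; the column name and values are made up. After the cast, the column exposes a fixed list of categories and integer codes, which is the kind of information a categorical color assignment can rely on.

```python
import pandas as pd

# Stand-in for adata.obs with a string-valued annotation column.
obs = pd.DataFrame({"sample": ["s1", "s2", "s1", "s3"]})

# Cast to categorical, as in the workaround above.
obs["sample"] = obs["sample"].astype("category")

categories = list(obs["sample"].cat.categories)  # sorted unique labels
codes = obs["sample"].cat.codes.tolist()         # integer index per row

print(categories)
print(codes)
```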