scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Change dotplot labels from EnsemblID to geneID #1412

Open llumdi opened 4 years ago

llumdi commented 4 years ago

I have an AnnData object:

print(adata)

AnnData object with n_obs × n_vars = 77430 × 1988
    obs: 'CONDITION', 'input.path', 'experiment', 'Sample type', 'BiOmics Sample Name', 'PatientID', 'SampleID', 'Response', 'Respond', 'Response2', 'Adjuvant', 'CIT', 'CIT2', 'Lesion2', 'Lesion', 'Stage', 'Fresh', 'CD3IHC', 'CD3IHC_RICZ', 'Mutation2', 'Mutation', 'Site', 'Age', 'Gender', 'PBMCs', 'PBMCs2', 'Seq samples', 'Quality', 'n_counts', 'n_genes', 'percent_mito', 'n_cPg', 'n_cPg2', 'batch', 'louvain'
    var: 'symbol', 'n_cells'
    uns: 'louvain', 'louvain_colors', 'neighbors', 'pca'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'

To label the dotplot with gene symbols instead of Ensembl IDs (the index column), I use the gene_symbols parameter:

sc.pl.dotplot(adata=adata, var_names = ['ENSG00000104814','ENSG00000043462'], gene_symbols='symbol')

But I get the following error:

Error: Gene symbol 'ENSG00000104814' not found in given gene_symbols column: 'symbol'

TypeError                                 Traceback (most recent call last)
in
      4     sc.pl.dotplot(adata, myg, groupby=condition,dot_min=0,dot_max=0.2,vmin=0,vmax=0.2, save=title+'_'+myg_geneID+'.png')
      5 if type(myg_geneID_orig) == list:
----> 6     sc.pl.dotplot(adata, myg, groupby=condition,dot_min=0,dot_max=0.2,vmin=0,vmax=0.2, gene_symbols='symbol', save=title+'_multiple_genes'+'.png')

/pstore/apps/bioinfo/scseq/modules/software/Scanpy/1.4.1-foss-2018b-Python-3.7.1-2018.12/lib/python3.7/site-packages/scanpy-1.4.1-py3.7.egg/scanpy/plotting/_anndata.py in dotplot(adata, var_names, groupby, use_raw, log, num_categories, expression_cutoff, mean_only_expressed, color_map, dot_max, dot_min, figsize, dendrogram, gene_symbols, var_group_positions, standard_scale, smallest_dot, var_group_labels, var_group_rotation, layer, show, save, **kwds)
   1383         var_names = [var_names]
   1384     categories, obs_tidy = _prepare_dataframe(adata, var_names, groupby, use_raw, log, num_categories,
-> 1385                                                   layer=layer, gene_symbols=gene_symbols)
   1386
   1387     # for if category defined by groupby (if any) compute for each var_name

TypeError: cannot unpack non-iterable NoneType object

My understanding is that it should search for 'ENSG00000104814' in the index column and return the corresponding value in 'symbols', but it seems that is directly searching 'ENSG00000104814' in the 'symbols' column.

Thanks for helping, find below the version I am using:

#### Versions
scanpy==1.4.1 anndata==0.6.22.post1 numpy==1.15.4 scipy==1.1.0 pandas==0.25.2 scikit-learn==0.20.1 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1

fidelram commented 4 years ago

My understanding is that it should search for 'ENSG00000104814' in the index column and return the corresponding value in 'symbols', but it seems that is directly searching 'ENSG00000104814' in the 'symbols' column.

If you pass the gene_symbols parameter, scanpy looks up the var_names in that column instead of in the adata.var index. It does not map between the index and the gene_symbols column.

If you want to map the labels, you can do something like the following:

# set show=False to get the axes dictionary.
ax_dict = sc.pl.dotplot(adata, myg, groupby=condition, show=False)

# get ensembl ids and map them to gene symbol. Although you can directly map `myg`,
# the following method will work in any case, including `sc.pl.rank_genes_groups_dotplot`
# This method also works for `sc.pl.matrixplot`
ticklabels = [adata.var.loc[x]['gene_symbol'] for x in ax_dict['mainplot_ax'].get_xticklabels()]

# replace ensembls ids by gene symbol in plot
_ = ax_dict['mainplot_ax'].set_xticklabels(ticklabels)
llumdi commented 4 years ago

Thanks for your answer and the code suggestion. When I do:

ax_dict = sc.pl.dotplot(adata, myg, groupby=condition, show=False)

the plot is still shown and ax_dict is:

GridSpec(2, 5, height_ratios=[0, 10.5], width_ratios=[0.7, 0, 0.2, 0.5, 0.25])

How do I have to set show=False to get the axes dictionary?

Thanks

fidelram commented 4 years ago

Please update to the latest scanpy release (1.6).

zhjilin commented 3 years ago

Hello, I am seeing the same issue with version 1.7.2.

I also could not work around it with the suggestion above, because get_xticklabels() returns a list of Text objects:

[Text(0, 0.5, 'ENSMUSG00000021565'), Text(0, 1.5, 'ENSMUSG00000097971'), Text(0, 2.5, 'ENSMUSG00000048905')]

Because of this, I could not convert the gene IDs; the lookup fails with: 'Text' object is not iterable.

Can someone fix the problem or provide a feasible solution?

Thanks!


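Edit: adding my solution. get_xticklabels returns matplotlib.text.Text objects, so the label strings have to be extracted with get_text() first:

# read the current tick labels (Text objects) and extract their strings
t = g['mainplot_ax'].get_xticklabels()
tx = [x.get_text() for x in t]
# look up the gene symbol for each Ensembl ID and replace the labels
ticklabels = [adata.var.loc[x]['gene_symbols'] for x in tx]
_ = g['mainplot_ax'].set_xticklabels(ticklabels)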
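
The workaround above can be condensed into a small helper. This is only a sketch under stated assumptions, not part of scanpy: `relabel_ticks` is a hypothetical name, the mapping is assumed to be a plain dict built from the symbol column of `adata.var` (called 'symbol' in the original post), and `sc.pl.dotplot(..., show=False)` is assumed to return a dict of axes containing 'mainplot_ax', as reported earlier in the thread. IDs without a known symbol keep their original label.

```python
def relabel_ticks(ax, id_to_symbol, axis="x"):
    """Replace tick labels on a matplotlib Axes using an ID -> symbol mapping.

    Hypothetical helper, not part of scanpy. Labels whose text is not a key
    in `id_to_symbol` are left unchanged.
    """
    if axis not in ("x", "y"):
        raise ValueError("axis must be 'x' or 'y'")
    getter = ax.get_xticklabels if axis == "x" else ax.get_yticklabels
    setter = ax.set_xticklabels if axis == "x" else ax.set_yticklabels

    # Tick labels are matplotlib Text objects; get_text() yields the string.
    ids = [label.get_text() for label in getter()]
    new_labels = [id_to_symbol.get(gene_id, gene_id) for gene_id in ids]
    setter(new_labels)
    return new_labels


# Usage with scanpy (assumes adata.var is indexed by Ensembl ID and has a
# 'symbol' column; `myg` and `condition` are the variables from the thread):
#
#   axes = sc.pl.dotplot(adata, myg, groupby=condition, show=False)
#   relabel_ticks(axes["mainplot_ax"], adata.var["symbol"].to_dict())
```

If the genes end up on the y axis instead (for example when the plot axes are swapped), passing `axis="y"` relabels those ticks in the same way.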