pinellolab / STREAM

STREAM: Single-cell Trajectories Reconstruction, Exploration And Mapping of single-cell data
http://stream.pinellolab.org
GNU Affero General Public License v3.0

st.add_cell_labels error: ValueError: Length mismatch #128

Open XYZuo opened 2 years ago

XYZuo commented 2 years ago

Hi, I'm using a loom file saved from a Seurat object. My adata looks like this:

AnnData object with n_obs × n_vars = 56109 × 22965
    obs: 'ClusterID', 'ClusterName', 'DF_classification', 'RNA_snn_res_1_5', 'cell_types', 'gender', 'group', 'nCount_RNA', 'nFeature_RNA', 'orig_ident', 'percent_hsp', 'percent_mt', 'percent_rb', 'seurat_clusters', 'label_color'
    var: 'Selected', 'vst_mean', 'vst_variable', 'vst_variance', 'vst_variance_expected', 'vst_variance_standardized'
    uns: 'label_color', 'workdir'
    obsm: 'harmony_cell_embeddings', 'pca_cell_embeddings', 'umap_cell_embeddings'
    varm: 'harmony_feature_loadings_projected', 'pca_feature_loadings'
    layers: 'norm_data', 'scale_data'

I extracted my cell labels with: adata.obs['cell_types'].to_csv('labels.tsv',sep='\t',header=0)

But when I try to add them to my object with: st.add_cell_labels(adata, file_name = 'labels.tsv')

I get this error: ValueError: Length mismatch: Expected axis has 56110 elements, new values have 56109 elements
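When a label file and the AnnData object disagree in length, it can help to read the file back yourself, check it, and align it by cell ID before assigning it to `adata.obs['label']`. Below is a minimal pandas sketch, assuming `labels.tsv` is the headerless two-column TSV (cell ID, label) written by the `to_csv` call above. `read_label_file` is a hypothetical helper, not part of STREAM, and a three-row DataFrame stands in for `adata.obs`.

```python
import os
import tempfile

import pandas as pd


def read_label_file(path, obs_names):
    """Hypothetical helper: read a headerless 2-column TSV (cell ID, label)
    and return the labels aligned to obs_names, failing loudly on mismatch."""
    labels = pd.read_csv(path, sep="\t", header=None, index_col=0,
                         names=["CellID", "label"], dtype=str)
    obs_names = pd.Index(obs_names)
    if len(labels) != len(obs_names):
        raise ValueError(f"{path} has {len(labels)} rows, "
                         f"but the AnnData object has {len(obs_names)} cells")
    if labels.index.duplicated().any():
        raise ValueError(f"{path} contains duplicated cell IDs")
    missing = obs_names.difference(labels.index)
    if len(missing) > 0:
        raise ValueError(f"{len(missing)} cell IDs are missing from {path}, "
                         f"e.g. {missing[0]!r}")
    # Align by cell ID rather than by row position.
    return labels["label"].reindex(obs_names)


# Small stand-in for adata.obs (the real object has 56109 cells).
obs = pd.DataFrame(
    {"cell_types": ["EryP", "preB2", "GMP"]},
    index=pd.Index(["HC_1_AAACCCAAGACAGTCG-1", "HC_1_AAACCCAAGAGCCTGA-1",
                    "HC_1_AAACCCAAGGTCGCCT-1"], name="CellID"),
)
path = os.path.join(tempfile.mkdtemp(), "labels.tsv")
obs["cell_types"].to_csv(path, sep="\t", header=False)  # no header row

obs["label"] = read_label_file(path, obs.index)
print(obs["label"].tolist())  # ['EryP', 'preB2', 'GMP']
```

If the row count reported by the check differs from the number of cells, the file being read is not the one that matches the object, which narrows down where the extra entry comes from.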

I checked my adata, and it seems to contain 56109 cells with no problem:

adata.obs.index
Index(['HC_1_AAACCCAAGACAGTCG-1', 'HC_1_AAACCCAAGAGCCTGA-1', 'HC_1_AAACCCAAGGTCGCCT-1', 'HC_1_AAACCCACAGGTATGG-1', 'HC_1_AAACCCAGTCAATGGG-1', 'HC_1_AAACCCAGTGTTACAC-1', 'HC_1_AAACCCATCGTTTACT-1', 'HC_1_AAACCCATCTAACGGT-1', 'HC_1_AAACCCATCTGGTTGA-1', 'HC_1_AAACGAAAGAATTCAG-1',
       ...
       'ITP_5_TTTGTTGAGACTTGTC-1', 'ITP_5_TTTGTTGAGGACAACC-1', 'ITP_5_TTTGTTGCAAACTAGA-1', 'ITP_5_TTTGTTGCACTTCATT-1', 'ITP_5_TTTGTTGGTAGCTTAC-1', 'ITP_5_TTTGTTGGTCATCGGC-1', 'ITP_5_TTTGTTGGTGCATCTA-1', 'ITP_5_TTTGTTGTCGCGCTGA-1', 'ITP_5_TTTGTTGTCTAAGGAA-1', 'ITP_5_TTTGTTGTCTGTAAGC-1'],
      dtype='object', name='CellID', length=56109)

adata.obs['cell_types']
CellID
HC_1_AAACCCAAGACAGTCG-1     EryP
HC_1_AAACCCAAGAGCCTGA-1    preB2
HC_1_AAACCCAAGGTCGCCT-1      GMP
HC_1_AAACCCACAGGTATGG-1      MPP
HC_1_AAACCCAGTCAATGGG-1    preB1
                           ...
ITP_5_TTTGTTGGTCATCGGC-1     MPP
ITP_5_TTTGTTGGTGCATCTA-1     GMP
ITP_5_TTTGTTGTCGCGCTGA-1     MPP
ITP_5_TTTGTTGTCTAAGGAA-1     MDP
ITP_5_TTTGTTGTCTGTAAGC-1     MPP
Name: cell_types, Length: 56109, dtype: object

Could you please help me? I can't figure it out.

huidongchen commented 2 years ago

Hi,

Thanks for the feedback. Unfortunately I was not able to reproduce the error with the example data. I am happy to take a closer look if you can share the file 'labels.tsv' with me.

But in your case, you can actually skip st.add_cell_labels(). It is equivalent to adata.obs['label'] = adata.obs['cell_types'].copy()

Let me know if this works for you.

XYZuo commented 2 years ago

Thank you for your help! I skipped st.add_cell_labels() and it worked. But the resulting plot was not consistent with the cell types I annotated: I want the HSC group to be at the starting position. Can I set the root myself? I guess the init_nodes_pos parameter of st.seed_elastic_principal_graph could do this, but I don't know how to set it.

huidongchen commented 2 years ago

Yes, you can. Once the tree structure is learnt, pseudotime is computed with each node as the root, and the results are all stored in adata.obs.

So you can simply replace 'S4' with the root node you want. For example, in your case you can replace S4_pseudotime with S5_pseudotime to use the HSC cells as the root. (I'm not 100% sure about the colors here, but it seems that the HSCs all gather around the S5 node.)
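Which node to use as the root can also be checked numerically instead of by eye. A minimal sketch, assuming adata.obs holds one `<node>_pseudotime` column per node (as with S4_pseudotime and S5_pseudotime above) plus a `label` column. `suggest_root` is a hypothetical helper, and choosing the node with the lowest median pseudotime for the HSC cells is a heuristic of this sketch, not STREAM's own method.

```python
import pandas as pd


def suggest_root(obs, target, label_col="label", suffix="_pseudotime"):
    """Hypothetical heuristic: rank candidate roots by the median pseudotime
    of the target cells; the best root is where those cells sit earliest."""
    cols = [c for c in obs.columns if c.endswith(suffix)]
    if not cols:
        raise ValueError(f"no '*{suffix}' columns in obs")
    mask = obs[label_col] == target
    if not mask.any():
        raise ValueError(f"no cells labelled {target!r}")
    # Median pseudotime of the target cells under each candidate root, smallest first.
    medians = obs.loc[mask, cols].median().sort_values()
    return medians.index[0][: -len(suffix)], medians


# Toy stand-in for adata.obs after the tree has been learnt.
obs = pd.DataFrame({
    "label": ["HSC", "HSC", "GMP", "EryP"],
    "S4_pseudotime": [3.0, 3.2, 1.0, 0.5],
    "S5_pseudotime": [0.1, 0.2, 2.5, 3.0],
})
root, medians = suggest_root(obs, target="HSC")
print(root)  # S5
```

The returned node name is what would go into root='S5' for st.plot_stream_sc and st.plot_stream, with f"{root}_pseudotime" as a color in st.plot_flat_tree.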

XYZuo commented 2 years ago

Thanks for your advice! It worked. However, I can't use my own annotation colors unless I add them with st.add_cell_colors as in the STREAM tutorial. My colors are stored in adata.obs.label_color and match the cluster labels in adata.obs.label. How can I use these colors when plotting the stream?

huidongchen commented 2 years ago

For now I guess it has to be done in a hacky way.

You can add your own colors with: adata.uns['label_color'] = pd.Series(data=adata.obs['label_color'].tolist(),index=adata.obs['label'].tolist()).to_dict()

But this is something that will certainly be addressed in our stream v2.
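The one-liner above keeps the last color seen for each label, so a conflicting or malformed entry in label_color passes through silently. A slightly more defensive sketch, assuming the colors are stored as #RRGGBB hex strings such as '#EDE574'. `build_label_color_map` is a hypothetical helper; matplotlib also accepts named colors, so the hex check here is stricter than necessary.

```python
import re

import pandas as pd

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def build_label_color_map(labels, colors):
    """Hypothetical helper: map each label to its single color, refusing
    labels with conflicting colors and values that are not #RRGGBB strings."""
    pairs = pd.DataFrame({"label": list(labels), "color": list(colors)}).drop_duplicates()
    # After dropping exact duplicates, a repeated label means it has 2+ colors.
    conflicting = sorted(set(pairs.loc[pairs["label"].duplicated(keep=False), "label"]))
    if conflicting:
        raise ValueError(f"labels with more than one color: {conflicting}")
    bad = sorted({str(c) for c in pairs["color"]
                  if not (isinstance(c, str) and HEX_COLOR.fullmatch(c))})
    if bad:
        raise ValueError(f"values that are not #RRGGBB colors: {bad}")
    return dict(zip(pairs["label"], pairs["color"]))


# Toy stand-in for adata.obs['label'] and adata.obs['label_color'].
labels = ["HSC", "GMP", "HSC", "EryP"]
colors = ["#EDE574", "#99CCFF", "#EDE574", "#D9534F"]
print(build_label_color_map(labels, colors))
# {'HSC': '#EDE574', 'GMP': '#99CCFF', 'EryP': '#D9534F'}
```

On the real object this would be used as adata.uns['label_color'] = build_label_color_map(adata.obs['label'], adata.obs['label_color']).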

XYZuo commented 2 years ago

Thank you so much. Unfortunately, it raised an error when I ran st.plot_stream_sc:

ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not ['#EDE574' '#99CCFF' '#99CCFF' ... '#D071A9' '#CBE86B' '#D9534F']

I also ran into an error similar to issue 115 (https://github.com/pinellolab/STREAM/issues/115) after running st.plot_stream(adata,root='S5',color=['label'],save_fig=True, fig_format='pdf'):

Traceback (most recent call last):
  File "", line 1, in
  File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/stream/core.py", line 3131, in plot_stream
    log_scale=log_scale,factor_zoomin=factor_zoomin)
  File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/stream/extra.py", line 933, in cal_stream_polygon_string
    df_stream.loc[df_stream.index[id_cells],'edge'] = [x]
  File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/pandas/core/indexing.py", line 723, in __setitem__
    if not is_list_like_indexer(new_ix):
  File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/pandas/core/indexing.py", line 1730, in _setitem_with_indexer
    # a) avoid getting things via sections and (to minimize dtype changes)
  File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/pandas/core/indexing.py", line 1769, in _setitem_with_indexer_split_path
    key = tuple([key] + [slice(None)] * (len(labels.levels) - 1))
  File "/home/cchu5/miniconda3/envs/zxy-stream/lib/python3.7/site-packages/pandas/core/indexing.py", line 1830, in _setitem_with_indexer_2d_value
    ):
ValueError: Must have equal len keys and value when setting with an ndarray

I tried downgrading to pandas==1.0 and several other versions, but it didn't help.

Sorry to take up so much of your time. I am also looking forward to the release of stream v2.

huidongchen commented 2 years ago

I am sorry that you have to go through these tricky steps to use stream.

Unfortunately I am not sure how to address this issue as I have not run into it or been able to reproduce it myself.

If you can share with me a script and a dummy dataset to reproduce the error, I am more than happy to take a closer look.

XYZuo commented 2 years ago

Hi, thank you for your patience! Strangely, after I added label_color to the Seurat object and then converted it to a loom file, the color error disappeared. However, the second error from st.plot_stream still occurs. I put a test loom file here: https://github.com/ZxyChopcat/STREAMtest/blob/master/STREAMtest.zip And this is my script:

import stream as st
st.__version__
import pandas as pd
import numpy as np
import anndata as ad
import matplotlib
matplotlib.use('pdf')
import matplotlib.pyplot as plt

adata = ad.read_loom("/zxy/STREAM/itp.data1.2.STREAM.loom", sparse=True, cleanup=False, X_name='spliced', obs_names='CellID', var_names='Gene', dtype='float32')
st.set_workdir(adata,'/data/tmp_data/zxy/STREAM')
adata.var_names_make_unique()
adata.obsm['top_pcs'] = adata.obsm['pca_cell_embeddings']
adata.obsm['X_dr'] = adata.obsm['umap_cell_embeddings']
adata.obsm['X_vis_umap'] = adata.obsm['umap_cell_embeddings'][:,:2]
adata.uns['label_color'] = pd.Series(data=adata.obs['label_color'].tolist(),index=adata.obs['label'].tolist()).to_dict()

st.plot_visualization_2D(adata,method='umap',n_neighbors=50,color=['label'],use_precomputed=True,save_fig=True, fig_name='visualization_2D.pdf')
st.seed_elastic_principal_graph(adata,n_clusters=10,use_vis=True)
st.elastic_principal_graph(adata,epg_alpha=0.01,epg_mu=0.05,epg_lambda=0.05,save_fig=True, fig_name='ElPiGraph_analysis.pdf')
st.plot_dimension_reduction(adata,color=['label'],n_components=2,show_graph=True,show_text=False,save_fig=True, fig_name='dimension_reduction.pdf')
st.plot_branches(adata,show_text=True,save_fig=True, fig_name='branches.pdf')
st.plot_flat_tree(adata,color=['label','branch_id_alias','S5_pseudotime'],dist_scale=0.5,show_graph=True,show_text=True,save_fig=True,fig_name='flat_tree.pdf')
st.plot_stream_sc(adata,root='S5',color=['label','GATA1'],dist_scale=0.5,show_graph=True,show_text=False,save_fig=True, fig_format='pdf',fig_size=(14,9))
st.plot_stream(adata,root='S5',color=['label','GATA1'],save_fig=True, fig_format='pdf')

huidongchen commented 2 years ago

hmmm, that is very strange.

I just tested your script and I was able to run it without any errors.

I am attaching the notebook I was using here. test_stream.html.zip

XYZuo commented 2 years ago

So the problem is probably in my environment. I created the conda environment with 'create -n stream python=3.7 stream=1.0 jupyter'. Here is my pip list:

Package Version
anndata 0.7.3
argcomplete 1.12.3
argon2-cffi 20.1.0
async-generator 1.10
attrs 21.2.0
backcall 0.2.0
bleach 4.0.0
Bottleneck 1.3.2
cached-property 1.5.2
certifi 2021.5.30
cffi 1.14.6
click 8.0.2
cycler 0.10.0
debugpy 1.4.1
decorator 5.1.0
defusedxml 0.7.1
entrypoints 0.3
fonttools 4.25.0
gunicorn 20.1.0
h5py 3.2.1
importlib-metadata 4.8.1
ipykernel 6.2.0
ipython 7.27.0
ipython-genutils 0.2.0
ipywidgets 7.6.4
jedi 0.18.0
Jinja2 3.0.1
joblib 1.0.1
jsonschema 3.2.0
jupyter 1.0.0
jupyter-client 7.0.1
jupyter-console 6.4.0
jupyter-core 4.7.1
jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.0
kiwisolver 1.3.1
llvmlite 0.36.0
loompy 3.0.6
MarkupSafe 2.0.1
matplotlib 3.2.2
matplotlib-inline 0.1.2
mistune 0.8.4
mkl-fft 1.3.0
mkl-random 1.2.2
mkl-service 2.4.0
munkres 1.1.4
natsort 7.1.1
nbclient 0.5.3
nbconvert 6.1.0
nbformat 5.1.3
nest-asyncio 1.5.1
networkx 2.1
notebook 6.4.3
numba 0.53.1
numexpr 2.7.3
numpy 1.17.5
numpy-groupies 0.9.14
olefile 0.46
packaging 21.0
pandas 1.0.5
pandocfilters 1.4.3
parso 0.8.2
patsy 0.5.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 8.3.1
pip 21.2.2
plotly 5.1.0
prometheus-client 0.11.0
prompt-toolkit 3.0.20
ptyprocess 0.7.0
pycparser 2.20
Pygments 2.10.0
pynndescent 0.5.4
pyparsing 2.4.7
pyrsistent 0.17.3
python-dateutil 2.8.2
python-slugify 5.0.2
pytz 2021.1
pyzmq 22.2.1
qtconsole 5.1.1
QtPy 1.10.0
rpy2 2.9.4
scikit-learn 0.24.2
scipy 1.7.1
seaborn 0.11.2
Send2Trash 1.8.0
setuptools 58.0.4
Shapely 1.7.1
simplegeneric 0.8.1
six 1.15.0
statsmodels 0.12.2
stream 1.0
tenacity 8.0.1
terminado 0.9.4
testpath 0.5.0
text-unidecode 1.3
threadpoolctl 2.2.0
tornado 6.1
traitlets 5.1.0
typing-extensions 3.10.0.2
tzlocal 2.1
umap-learn 0.5.1
Unidecode 1.2.0
wcwidth 0.2.5
webencodings 0.5.1
wheel 0.37.0
widgetsnbextension 3.5.1
zipp 3.5.0

Can you find anything wrong?
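Since the same script runs on the maintainer's machine but fails here, one way to narrow this down is to compare the versions of the key packages in both environments. A minimal sketch using the standard importlib.metadata module, falling back to the importlib-metadata backport on Python 3.7 (which appears in the pip list above). The package selection and the "other machine" versions are illustrative assumptions, not the maintainer's actual environment.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: importlib-metadata backport
    from importlib_metadata import PackageNotFoundError, version

# Packages whose versions plausibly matter for this error (an assumption).
KEY_PACKAGES = ["stream", "anndata", "pandas", "numpy", "matplotlib",
                "scipy", "networkx", "loompy"]


def installed_versions(packages=KEY_PACKAGES):
    """Return {package: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def diff_versions(mine, theirs):
    """Return {package: (mine, theirs)} for every package whose versions differ."""
    names = sorted(set(mine) | set(theirs))
    return {n: (mine.get(n), theirs.get(n))
            for n in names if mine.get(n) != theirs.get(n)}


print(installed_versions())

# Versions taken from the pip list above vs. hypothetical ones from another machine.
mine = {"stream": "1.0", "pandas": "1.0.5", "numpy": "1.17.5"}
theirs = {"stream": "1.0", "pandas": "1.1.5", "numpy": "1.17.5"}  # hypothetical
print(diff_versions(mine, theirs))
```

Only the packages that differ are reported, which keeps the comparison short even for a long pip list.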
