pinellolab / STREAM

STREAM: Single-cell Trajectories Reconstruction, Exploration And Mapping of single-cell data
http://stream.pinellolab.org
GNU Affero General Public License v3.0
173 stars 48 forks source link

Seurat object #100

Open wajm opened 3 years ago

wajm commented 3 years ago

Hi, I have precomputed seurat objects. Can I use these objects? or any example?

huidongchen commented 3 years ago

Hi thanks for trying STREAM. To use Seurat object,

1)save seurat object as a loom file in R.

your_object.loom <- as.loom(your_object, filename = "./your_object.loom", verbose = FALSE)
pbmc.loom$close_all()

2)set necessary variables in STREAM :

import anndata as ad
import stream as st

adata = ad.read_loom('"./your_object.loom")

st.set_workdir(adata,'./stream_result')
st.add_cell_labels(adata)
st.add_cell_colors(adata)
adata.obsm['top_pcs'] = adata.obsm['pca_cell_embeddings']
adata.obsm['X_vis_umap'] = adata.obsm['umap_cell_embeddings']

then you can either learn graphical structure directly on UMAP from seurat:

st.plot_visualization_2D(adata,method='umap',n_neighbors=50,color=['label'])
st.seed_elastic_principal_graph(adata,n_clusters=10,use_vis=True)
st.elastic_principal_graph(adata,epg_alpha=0.01,epg_mu=0.05,epg_lambda=0.05)
st.plot_dimension_reduction(adata,color=['label'],n_components=2,show_graph=True,show_text=False)
st.plot_branches(adata,show_text=False)

or start from the step of dimension reduction as in STREAM tutorial

st.dimension_reduction(adata,method='se',feature='top_pcs',n_components=2,n_neighbors=15,n_jobs=4)
st.plot_dimension_reduction(adata,color=['label'],n_components=2,show_graph=False,show_text=False)
crazyhottommy commented 3 years ago

this worked for me after changing adata.obsm['X_vis_umap'] to adata.obsm['X_dr']. Thanks! also the docker version lacks loompy installation.

huidongchen commented 3 years ago

Thanks very much for the feedbacks, Ming! I will certainly keep them in mind.

Sorry about the issue related to adata.obsm['X_vis_umap']. I made a mistake in the above code snippet.

Changing adata.obsm['X_vis_umap'] to adata.obsm['X_dr'] can be a potential solution. But if you want to infer trajectories directly on UMAP from seurat, you might need to do the following:

import anndata as ad
import stream as st

adata = ad.read_loom('"./your_object.loom")

###set necessary variables in STREAM :
st.set_workdir(adata,'./stream_result')
st.add_cell_labels(adata)
st.add_cell_colors(adata)
adata.obsm['top_pcs'] = adata.obsm['pca_cell_embeddings']
adata.obsm['X_dr'] = adata.obsm['umap_cell_embeddings']
adata.obsm['X_vis_umap'] = adata.obsm['umap_cell_embeddings'][:,:2]

###learn trajectories directly on UMAP from seurat
st.plot_visualization_2D(adata,method='umap',n_neighbors=50,color=['label'],use_precomputed=True)
st.seed_elastic_principal_graph(adata,n_clusters=10,use_vis=True)
st.elastic_principal_graph(adata,epg_alpha=0.01,epg_mu=0.05,epg_lambda=0.05)
st.plot_dimension_reduction(adata,color=['label'],n_components=2,show_graph=True,show_text=False)
st.plot_branches(adata,show_text=False)
crazyhottommy commented 3 years ago

Thanks @huidongchen.

adata.obsm['X_dr'] = adata.obsm['umap_cell_embeddings']
adata.obsm['X_vis_umap'] = adata.obsm['umap_cell_embeddings'][:,:2]

X_dr is used for trajectory inference, X_vis_umap is for visualization right? I guessed from the names of the variables.

huidongchen commented 3 years ago

X_dr indicates for the space after dimension reduction (the number of dimensions can be greater than 2) and X_vis indicates the 2D plane for UMAP/tSNE visualization.

But trajectory inference can be done in either space.

When you enabled use_vis=True in st.seed_elastic_principal_graph(), it will learn the trajectories using X_vis (in your case, it's the UMAP from Seurat. adata.obsm['X_dr'] will be ignored but to avoid the error from the step st.plot_visualization_2D(), you will need to specify it )

When use_vis=False(by default) in st.seed_elastic_principal_graph(), it will learn the trajectories using X_dr (adata.obsm['X_dr'] is obtained from the step st.dimension_reduction())

I hope the above is useful to you.

wajm commented 3 years ago

thank you for kind information. btw, my seurat integration object make some problem. after making loom format,

adata.obsm['top_pcs'] = adata.obsm['pca_cell_embeddings'] adata.obsm['X_dr'] = adata.obsm['umap_cell_embeddings'] adata.obsm['X_vis_umap'] = adata.obsm['umap_cell_embeddings'][:,:2]

these are not working.

they cannot find both 'pca_cell_embeddings' and 'umap_cell_embeddings'. What are they?

huidongchen commented 3 years ago

I just took a look. if you save the object coembed in R as follows:

coembed.loom <- as.loom(coembed, filename = "./coembed.loom", verbose = FALSE)
coembed.loom$close_all()

Both pca_cell_embeddings and umap_cell_embeddings should still be there and the same procedure above should still work.

wajm commented 3 years ago

hi AnnData object with n_obs × n_vars = 51813 × 30644 obs: 'label', 'label_color' var: 'Selected', 'vst_mean', 'vst_variable', 'vst_variance', 'vst_variance_expected', 'vst_variance_standardized' uns: 'workdir', 'label_color' layers: 'norm_data' but, where are they?

huidongchen commented 3 years ago

I assume the loom file you saved does not contain all the information needed.

Did you follow through the Seurat integration tutorial? If you did, when you save coembed to loom file, it should contain the fields pca_cell_embeddings and umap_cell_embeddings.

I tested it myself. It works for me.

wajm commented 3 years ago

Sorry, I did reciprocal PCA integration. https://satijalab.org/seurat/v3.2/integration.html

wajm commented 3 years ago

okay, I solve the problem. my integration object was changed. DefaultAssay(patient) was RNA. So I return DefaultAssay to "integrated". So sorry and Thank you very much.

huidongchen commented 3 years ago

okay, I solve the problem. my integration object was changed. DefaultAssay(patient) was RNA. So I return DefaultAssay to "integrated". So sorry and Thank you very much.

No worries at all! Thanks for letting us know. I'm glad you figured it out.

crazyhottommy commented 3 years ago

Hi Huidong, I went on to do marker detection at each leaf:

st.detect_leaf_markers(adata,cutoff_zscore=1.0,cutoff_pvalue=0.01,
                       root='S3',n_jobs=4)

I got this error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/anaconda3/envs/stream/lib/python3.6/site-packages/pandas/core/internals/managers.py in create_block_manager_from_blocks(blocks, axes)
   1670                 blocks = [
-> 1671                     make_block(values=blocks[0], placement=slice(0, len(axes[0])))
   1672                 ]

~/anaconda3/envs/stream/lib/python3.6/site-packages/pandas/core/internals/blocks.py in make_block(values, placement, klass, ndim, dtype)
   2743 
-> 2744     return klass(values, ndim=ndim, placement=placement)
   2745 

~/anaconda3/envs/stream/lib/python3.6/site-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
   2399 
-> 2400         super().__init__(values, ndim=ndim, placement=placement)
   2401 

~/anaconda3/envs/stream/lib/python3.6/site-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
    130             raise ValueError(
--> 131                 f"Wrong number of items passed {len(self.values)}, "
    132                 f"placement implies {len(self.mgr_locs)}"

ValueError: Wrong number of items passed 1, placement implies 33538

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-177-48ce301bf716> in <module>
      1 st.detect_leaf_markers(adata,cutoff_zscore=1.0,cutoff_pvalue=0.01,
----> 2                        root='S3',n_jobs=4)

~/anaconda3/envs/stream/lib/python3.6/site-packages/stream/core.py in detect_leaf_markers(adata, marker_list, cutoff_zscore, cutoff_pvalue, percentile_expr, n_jobs, min_num_cells, use_precomputed, root, preference)
   3796         df_sc = pd.DataFrame(index= adata.obs_names.tolist(),
   3797                              data = adata[:,input_markers].X,
-> 3798                              columns=input_markers)
   3799         #exclude markers that are expressed in fewer than min_num_cells cells
   3800         print("Filtering out markers that are expressed in less than " + str(min_num_cells) + " cells ...")

~/anaconda3/envs/stream/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    521                     mgr = arrays_to_mgr(arrays, columns, index, columns, dtype=dtype)
    522                 else:
--> 523                     mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
    524             else:
    525                 mgr = init_dict({}, index, columns, dtype=dtype)

~/anaconda3/envs/stream/lib/python3.6/site-packages/pandas/core/internals/construction.py in init_ndarray(values, index, columns, dtype, copy)
    232         block_values = [values]
    233 
--> 234     return create_block_manager_from_blocks(block_values, [columns, index])
    235 
    236 

~/anaconda3/envs/stream/lib/python3.6/site-packages/pandas/core/internals/managers.py in create_block_manager_from_blocks(blocks, axes)
   1679         blocks = [getattr(b, "values", b) for b in blocks]
   1680         tot_items = sum(b.shape[0] for b in blocks)
-> 1681         raise construction_error(tot_items, blocks[0].shape[1:], axes, e)
   1682 
   1683 

ValueError: Shape of passed values is (2000, 1), indices imply (2000, 33538)

It seems it can not find the data matrix. How can I add that into adata?

wajm commented 3 years ago

one more thing, in case of Importing precomputed seurat object, is it possible to make 3d plot? at the moment I can't.

huidongchen commented 3 years ago

one more thing, in case of Importing precomputed seurat object, is it possible to make 3d plot? at the moment I can't.

To use the 3d umap result from Seurat, you simply need to set use_vis=False:

import anndata as ad
import stream as st

adata = ad.read_loom('"./your_object.loom")

###set necessary variables in STREAM :
st.set_workdir(adata,'./stream_result')
st.add_cell_labels(adata)
st.add_cell_colors(adata)
adata.obsm['top_pcs'] = adata.obsm['pca_cell_embeddings']
adata.obsm['X_dr'] = adata.obsm['umap_cell_embeddings']
adata.obsm['X_vis_umap'] = adata.obsm['umap_cell_embeddings'][:,:2]

###learn trajectories directly on 3D UMAP from seurat
st.seed_elastic_principal_graph(adata,n_clusters=10,use_vis=False)
st.elastic_principal_graph(adata,epg_alpha=0.01,epg_mu=0.05,epg_lambda=0.05)
st.plot_dimension_reduction(adata,color=['label'],n_components=3,show_graph=True,show_text=False,plotly=False)
st.plot_branches(adata,show_text=False)
huidongchen commented 3 years ago

Hi Huidong, I went on to do marker detection at each leaf:

st.detect_leaf_markers(adata,cutoff_zscore=1.0,cutoff_pvalue=0.01,
                       root='S3',n_jobs=4)

I got this error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/anaconda3/envs/stream/lib/python3.6/site-packages/pandas/core/internals/managers.py in create_block_manager_from_blocks(blocks, axes)
   1670                 blocks = [
-> 1671                     make_block(values=blocks[0], placement=slice(0, len(axes[0])))
   1672                 ]

~/anaconda3/envs/stream/lib/python3.6/site-packages/pandas/core/internals/blocks.py in make_block(values, placement, klass, ndim, dtype)
   2743 
-> 2744     return klass(values, ndim=ndim, placement=placement)
   2745 

~/anaconda3/envs/stream/lib/python3.6/site-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
   2399 
-> 2400         super().__init__(values, ndim=ndim, placement=placement)
   2401 

~/anaconda3/envs/stream/lib/python3.6/site-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
    130             raise ValueError(
--> 131                 f"Wrong number of items passed {len(self.values)}, "
    132                 f"placement implies {len(self.mgr_locs)}"

ValueError: Wrong number of items passed 1, placement implies 33538

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-177-48ce301bf716> in <module>
      1 st.detect_leaf_markers(adata,cutoff_zscore=1.0,cutoff_pvalue=0.01,
----> 2                        root='S3',n_jobs=4)

~/anaconda3/envs/stream/lib/python3.6/site-packages/stream/core.py in detect_leaf_markers(adata, marker_list, cutoff_zscore, cutoff_pvalue, percentile_expr, n_jobs, min_num_cells, use_precomputed, root, preference)
   3796         df_sc = pd.DataFrame(index= adata.obs_names.tolist(),
   3797                              data = adata[:,input_markers].X,
-> 3798                              columns=input_markers)
   3799         #exclude markers that are expressed in fewer than min_num_cells cells
   3800         print("Filtering out markers that are expressed in less than " + str(min_num_cells) + " cells ...")

~/anaconda3/envs/stream/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    521                     mgr = arrays_to_mgr(arrays, columns, index, columns, dtype=dtype)
    522                 else:
--> 523                     mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
    524             else:
    525                 mgr = init_dict({}, index, columns, dtype=dtype)

~/anaconda3/envs/stream/lib/python3.6/site-packages/pandas/core/internals/construction.py in init_ndarray(values, index, columns, dtype, copy)
    232         block_values = [values]
    233 
--> 234     return create_block_manager_from_blocks(block_values, [columns, index])
    235 
    236 

~/anaconda3/envs/stream/lib/python3.6/site-packages/pandas/core/internals/managers.py in create_block_manager_from_blocks(blocks, axes)
   1679         blocks = [getattr(b, "values", b) for b in blocks]
   1680         tot_items = sum(b.shape[0] for b in blocks)
-> 1681         raise construction_error(tot_items, blocks[0].shape[1:], axes, e)
   1682 
   1683 

ValueError: Shape of passed values is (2000, 1), indices imply (2000, 33538)

It seems it can not find the data matrix. How can I add that into adata?

Unfortunately, in the current version of STREAM, sparse matrix is not supported yet for the marker detection. You will have to densify it first,

adata.X = adata.X.todense()
st.detect_leaf_markers(adata,cutoff_zscore=1.0,cutoff_pvalue=0.01,
                       root='S3',n_jobs=4)

This step can be slow. To speed it up, alternatively, you can only scan the variable genes or any specified list of genes by specifying marker_list:

adata.X = adata.X.todense()
st.detect_leaf_markers(adata,marker_list=adata.var[adata.var['Selected']>0].index, cutoff_zscore=1.0,cutoff_pvalue=0.01,
                       root='S3',n_jobs=4)

This issue will be solved in STREAM v2.

crazyhottommy commented 3 years ago

@huidongchen Thanks! this worked for me.

wajm commented 3 years ago

one more thing, in case of Importing precomputed seurat object, is it possible to make 3d plot? at the moment I can't.

To use the 3d umap result from Seurat, you simply need to set use_vis=False:

import anndata as ad
import stream as st

adata = ad.read_loom('"./your_object.loom")

###set necessary variables in STREAM :
st.set_workdir(adata,'./stream_result')
st.add_cell_labels(adata)
st.add_cell_colors(adata)
adata.obsm['top_pcs'] = adata.obsm['pca_cell_embeddings']
adata.obsm['X_dr'] = adata.obsm['umap_cell_embeddings']
adata.obsm['X_vis_umap'] = adata.obsm['umap_cell_embeddings'][:,:2]

###learn trajectories directly on 3D UMAP from seurat
st.seed_elastic_principal_graph(adata,n_clusters=10,use_vis=False)
st.elastic_principal_graph(adata,epg_alpha=0.01,epg_mu=0.05,epg_lambda=0.05)
st.plot_dimension_reduction(adata,color=['label'],n_components=3,show_graph=True,show_text=False,plotly=False)
st.plot_branches(adata,show_text=False)

ummm, This is not working. something wrong?

huidongchen commented 3 years ago

one more thing, in case of Importing precomputed seurat object, is it possible to make 3d plot? at the moment I can't.

To use the 3d umap result from Seurat, you simply need to set use_vis=False:

import anndata as ad
import stream as st

adata = ad.read_loom('"./your_object.loom")

###set necessary variables in STREAM :
st.set_workdir(adata,'./stream_result')
st.add_cell_labels(adata)
st.add_cell_colors(adata)
adata.obsm['top_pcs'] = adata.obsm['pca_cell_embeddings']
adata.obsm['X_dr'] = adata.obsm['umap_cell_embeddings']
adata.obsm['X_vis_umap'] = adata.obsm['umap_cell_embeddings'][:,:2]

###learn trajectories directly on 3D UMAP from seurat
st.seed_elastic_principal_graph(adata,n_clusters=10,use_vis=False)
st.elastic_principal_graph(adata,epg_alpha=0.01,epg_mu=0.05,epg_lambda=0.05)
st.plot_dimension_reduction(adata,color=['label'],n_components=3,show_graph=True,show_text=False,plotly=False)
st.plot_branches(adata,show_text=False)

ummm, This is not working. something wrong?

What error did you see?

wajm commented 3 years ago

same result with before 2D plot.

huidongchen commented 3 years ago

same result with before 2D plot.

Sorry but I can't really help much with the current info. I don't know what you mean by 'same result'.

wajm commented 3 years ago

same result with before 2D plot.

Sorry but I can't really help much with the current info. I don't know what you mean by 'same result'.

before use "use_vis=False" , I got a nice 2d plot following these. st.seed_elastic_principal_graph(adata,n_clusters=100,use_vis=True) st.elastic_principal_graph(adata,epg_alpha=0.01,epg_mu=0.2,epg_lambda=0.01) st.plot_dimension_reduction(adata,color=['label'],n_components=2,show_graph=True,show_text=True,fig_size=(10,8))

but after use "use_vis=False" , I got same plot, not 3D.

huidongchen commented 3 years ago

same result with before 2D plot.

Sorry but I can't really help much with the current info. I don't know what you mean by 'same result'.

before use "use_vis=False" , I got a nice 2d plot following these. st.seed_elastic_principal_graph(adata,n_clusters=100,use_vis=True) st.elastic_principal_graph(adata,epg_alpha=0.01,epg_mu=0.2,epg_lambda=0.01) st.plot_dimension_reduction(adata,color=['label'],n_components=2,show_graph=True,show_text=True,fig_size=(10,8))

but after use "use_vis=False" , I got same plot, not 3D.

You need to set n_component=3. If you copy the code I shared previously, it should work. I have tested it.

wajm commented 3 years ago

same result with before 2D plot.

Sorry but I can't really help much with the current info. I don't know what you mean by 'same result'.

before use "use_vis=False" , I got a nice 2d plot following these. st.seed_elastic_principal_graph(adata,n_clusters=100,use_vis=True) st.elastic_principal_graph(adata,epg_alpha=0.01,epg_mu=0.2,epg_lambda=0.01) st.plot_dimension_reduction(adata,color=['label'],n_components=2,show_graph=True,show_text=True,fig_size=(10,8)) but after use "use_vis=False" , I got same plot, not 3D.

You need to set n_component=3. If you copy the code I shared previously, it should work. I have tested it.

yes, I already try that, but message like that. n_components is greater than the available dimension. It is corrected to 2

huidongchen commented 3 years ago

same result with before 2D plot.

Sorry but I can't really help much with the current info. I don't know what you mean by 'same result'.

before use "use_vis=False" , I got a nice 2d plot following these. st.seed_elastic_principal_graph(adata,n_clusters=100,use_vis=True) st.elastic_principal_graph(adata,epg_alpha=0.01,epg_mu=0.2,epg_lambda=0.01) st.plot_dimension_reduction(adata,color=['label'],n_components=2,show_graph=True,show_text=True,fig_size=(10,8)) but after use "use_vis=False" , I got same plot, not 3D.

You need to set n_component=3. If you copy the code I shared previously, it should work. I have tested it.

yes, I already try that, but message like that. n_components is greater than the available dimension. It is corrected to 2

Were you using three dimensions for UMAP in Seurat? You can check the umap coordinates you saved from Seurat (stored in adata.obsm['umap_cell_embeddings']). It seems you did not have 3D UMAP in Seurat.

wajm commented 3 years ago

same result with before 2D plot.

Sorry but I can't really help much with the current info. I don't know what you mean by 'same result'.

before use "use_vis=False" , I got a nice 2d plot following these. st.seed_elastic_principal_graph(adata,n_clusters=100,use_vis=True) st.elastic_principal_graph(adata,epg_alpha=0.01,epg_mu=0.2,epg_lambda=0.01) st.plot_dimension_reduction(adata,color=['label'],n_components=2,show_graph=True,show_text=True,fig_size=(10,8)) but after use "use_vis=False" , I got same plot, not 3D.

You need to set n_component=3. If you copy the code I shared previously, it should work. I have tested it.

yes, I already try that, but message like that. n_components is greater than the available dimension. It is corrected to 2

Were you using three dimensions for UMAP in Seurat? You can check the umap coordinates you saved from Seurat (stored in adata.obsm['umap_cell_embeddings']). It seems you did not have 3D UMAP in Seurat.

okay I'll check again. Thanks.