theislab / ncem

Learning cell communication from spatial graphs of cells
https://ncem.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

How to run NCEM for own spatial data? #107

Open ankitbioinfo opened 2 years ago

ankitbioinfo commented 2 years ago

Hi Anna,

I want to use NCEM on other MERFISH data and compare how its results differ from those of other tools. I followed your tutorial https://github.com/theislab/spatial_scog_workshop_2022/blob/main/ncem/tutorial_ncem.ipynb but I don't understand how to set up the customLoader part for my own data. For comparison, the AnnData structure from your tutorial [ad = sq.datasets.mibitof()] is:

AnnData object with n_obs × n_vars = 3309 × 36
    obs: 'row_num', 'point', 'cell_id', 'X1', 'center_rowcoord', 'center_colcoord', 'cell_size', 'category', 'donor', 'Cluster', 'batch', 'library_id'
    var: 'mean-0', 'std-0', 'mean-1', 'std-1', 'mean-2', 'std-2'
    uns: 'Cluster_colors', 'batch_colors', 'neighbors', 'spatial', 'umap'
    obsm: 'X_scanorama', 'X_umap', 'spatial'
    obsp: 'connectivities', 'distances'

My AnnData object has the expression matrix in adata.X, cluster annotations in obs: 'clusters', and spatial coordinates in obsm: 'spatial'. That's all I have in the AnnData object. With this information, would I be able to repeat the tutorial for ncem.sender_similarity_analysis? Thanks.

AnnaChristina commented 2 years ago

Hi @ankitbioinfo,

Yes, you can run the analysis on your own data by just switching the tutorial code to

ncem.data = customLoader(
    adata=ad, cluster='clusters', patient=None, library_id=None, radius=52
)

and by selecting a radius that is suited for the interactions you want to analyze in your data. The ideal radius can also be obtained by running an ablation study over multiple resolutions and finding the best performing radius for your dataset.
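Before running a full ablation, it can help to see how many neighbors each cell gets at a few candidate radii. The sketch below is not the NCEM ablation study itself, only a rough pre-check. It assumes the radius acts as a Euclidean distance cutoff in the units of obsm['spatial']; the function name `mean_neighbors_per_radius` and the toy grid are hypothetical and not part of ncem.

```python
import numpy as np


def mean_neighbors_per_radius(coords, radii):
    """Return {radius: mean number of neighbors per cell} for 2D coordinates.

    Assumption: a neighbor is any other cell within Euclidean distance <= radius.
    Brute force, so only suitable for small samples of cells.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all cells.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A cell is not its own neighbor.
    np.fill_diagonal(dist, np.inf)
    return {r: float((dist <= r).sum(axis=1).mean()) for r in radii}


# Toy example: a 3 x 3 grid of cells with unit spacing.
grid = np.array([(x, y) for x in range(3) for y in range(3)], dtype=float)
result = mean_neighbors_per_radius(grid, radii=[0.5, 1.0, 1.5])
for radius, mean_deg in result.items():
    print(f"radius={radius}: mean neighbors per cell = {mean_deg:.2f}")
```

On real data one would pass a random subsample of `ad.obsm['spatial']` rather than all cells, since the pairwise distance matrix grows quadratically with the number of cells.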

I hope this answers your question.

ankitbioinfo commented 2 years ago

Hi @AnnaChristina,

Thank you for the answer. I ran the following commands, as you suggested.

adata_file='spatial_all_CSR.h5ad'
ad = an.read_h5ad(adata_file)
print(ad)
ncem = InterpreterInteraction()
print('step 1 done')
adata_vis = ad
print('step 2 done')
ncem.data = customLoader(adata=adata_vis, cluster='clusters', patient='None', library_id='None', radius=25)
print('step 3 done')

And got the following output.

AnnData object with n_obs × n_vars = 393286 × 346
    obs: 'umi_sct', 'log_umi_sct', 'gene_sct', 'log_gene_sct', 'umi_per_gene_sct', 'log_umi_per_gene_sct', 'clusters'
    var: 'Intercept_sct', 'log_umi_sct', 'theta_sct', 'Intercept_step1_sct', 'log_umi_step1_sct', 'dispersion_step1_sct', 'genes_step1_sct', 'log10_gmean_sct'
    uns: 'spatial'
    obsm: 'spatial'
step 1 done
step 2 done
Loading data from raw files
registering celldata
Traceback (most recent call last):
  File "/Users/agrawal/miniconda3/envs/ncem_new/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'None'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "ncem_run.py", line 35, in <module>
    ncem.data = customLoader(adata=adata_vis, cluster='clusters', patient='None', library_id='None', radius=25)
  File "/Users/agrawal/miniconda3/envs/ncem_new/lib/python3.7/site-packages/ncem/data.py", line 1831, in __init__
    self.register_celldata(n_top_genes=n_top_genes)
  File "/Users/agrawal/miniconda3/envs/ncem_new/lib/python3.7/site-packages/ncem/data.py", line 1750, in register_celldata
    self._register_celldata(n_top_genes=n_top_genes)
  File "/Users/agrawal/miniconda3/envs/ncem_new/lib/python3.7/site-packages/ncem/data.py", line 1875, in _register_celldata
    for p in np.unique(celldata.obs[self.patient]):
  File "/Users/agrawal/miniconda3/envs/ncem_new/lib/python3.7/site-packages/pandas/core/frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/agrawal/miniconda3/envs/ncem_new/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'None'

I am still not sure what is causing the above error. Thank you.
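One thing visible in the traceback: the failing line is `celldata.obs[self.patient]`, and the missing key is the string 'None'. The call in the script passes patient='None' and library_id='None' as quoted strings, whereas the snippet suggested earlier in this thread passes the Python value None. The minimal sketch below reproduces only the pandas lookup on a toy DataFrame; the helper `has_column` is hypothetical, and how customLoader itself treats None is not verified here beyond that earlier suggestion.

```python
import pandas as pd

# Toy stand-in for adata.obs with a single annotation column.
obs = pd.DataFrame({"clusters": ["a", "b", "a"]})


def has_column(frame, key):
    """Return True if frame[key] succeeds, False if it raises KeyError."""
    try:
        frame[key]
        return True
    except KeyError:
        return False


# The string 'None' is treated as a column name, and no such column exists.
string_none_found = has_column(obs, "None")
# An existing column name is found as expected.
clusters_found = has_column(obs, "clusters")

print("column 'None' found:", string_none_found)
print("column 'clusters' found:", clusters_found)
```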

giovp commented 3 months ago

Hi @ankitbioinfo, I believe the suggested code has an error:

ncem.data = customLoader(...)

should be

from ncem.data import customLoader
loader = customLoader(...)