scverse / squidpy

Spatial Single Cell Analysis in Python
https://squidpy.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License
440 stars 81 forks source link

moran's I scaled data #504

Closed wangjiawen2013 closed 2 years ago

wangjiawen2013 commented 2 years ago

Hi, the scaled data is in adata.X, can I use this with sq.gr.spatial_autocorr(adata, mode="moran") ? And, when I used sq.gr.spatial_autocorr(adata, use_raw=True, mode="moran"), an error occurred: RuleException: ValueError in line 75 of /home/wangjw/data/work/squidpy/stereoseq.smk: Shape of passed values is (26178, 3), indices imply (3000, 3). I stored the normalized data in adata.raw and scaled data in adata.X, 3000 means the top3000 highly variable genes. 26178 is the full gene numbers. How can I compute moran's I of the 3000 highly variable genes with normalized data in adata.raw ?

giovp commented 2 years ago

hi @wangjiawen2013 , you can definitely use Moran's I on normalized and scaled data. if you want to run them on the 3k HVG I'd suggest you to remove the use_raw=False flag.

Best, Giovanni

wangjiawen2013 commented 2 years ago

Thanks for your help! I want to run them on all the genes stored in adata.raw, but when I used sq.gr.spatial_autocorr(adata, use_raw=True, mode="moran"), an error occurred: RuleException: ValueError in line 75 of /home/wangjw/data/work/squidpy/stereoseq.smk: Shape of passed values is (26178, 3), indices imply (3000, 3).

michalk8 commented 2 years ago

@wangjiawen2013 could you please post the full traceback for us to look at?

wangjiawen2013 commented 2 years ago

In [4]: adata Out[4]: AnnData object with n_obs × n_vars = 5143 × 26178 obs: 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'classification' var: 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'mean', 'std' uns: 'draw_graph', 'hvg', 'leiden', 'leiden_co_occurrence', 'leiden_colors', 'leiden_nhood_enrichment', 'leiden_ripley_L', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap' obsm: 'X_draw_graph_fa', 'X_pca', 'X_umap', 'spatial' varm: 'PCs' layers: 'counts' obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'

In [5]: sq.gr.spatial_autocorr(adata, use_raw=True, mode="moran")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/internals/managers.py:1701, in create_block_manager_from_arrays(arrays, names, axes)
   1700 try:
-> 1701     blocks = _form_blocks(arrays, names, axes)
   1702     mgr = BlockManager(blocks, axes)

File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/internals/managers.py:1759, in _form_blocks(arrays, names, axes)
   1758 if len(items_dict["FloatBlock"]):
-> 1759     float_blocks = _multi_blockify(items_dict["FloatBlock"])
   1760     blocks.extend(float_blocks)

File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/internals/managers.py:1852, in _multi_blockify(tuples, dtype)
   1850 for dtype, tup_block in grouper:
-> 1852     values, placement = _stack_arrays(list(tup_block), dtype)
   1854     block = make_block(values, placement=placement, ndim=2)

File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/internals/managers.py:1882, in _stack_arrays(tuples, dtype)
   1881 for i, arr in enumerate(arrays):
-> 1882     stacked[i] = _asarray_compat(arr)
   1884 return stacked, placement

ValueError: could not broadcast input array from shape (3000,) into shape (26178,)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 sq.gr.spatial_autocorr(adata, use_raw=True, mode="moran")

File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/squidpy/gr/_ppatterns.py:183, in spatial_autocorr(adata, connectivity_key, genes, mode, transformation, n_perms, two_tailed, corr_method, layer, seed, use_raw, copy, n_jobs, backend, show_progress_bar)
    180 results = {params["stat"]: score}
    181 results.update(pval_results)
--> 183 df = pd.DataFrame(results, index=genes)
    185 if corr_method is not None:
    186     for pv in filter(lambda x: "pval" in x, df.columns):

File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/frame.py:529, in DataFrame.__init__(self, data, index, columns, dtype, copy)
    524     mgr = self._init_mgr(
    525         data, axes={"index": index, "columns": columns}, dtype=dtype, copy=copy
    526     )
    528 elif isinstance(data, dict):
--> 529     mgr = init_dict(data, index, columns, dtype=dtype)
    530 elif isinstance(data, ma.MaskedArray):
    531     import numpy.ma.mrecords as mrecords

File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/internals/construction.py:287, in init_dict(data, index, columns, dtype)
    281     arrays = [
    282         arr if not isinstance(arr, ABCIndexClass) else arr._data for arr in arrays
    283     ]
    284     arrays = [
    285         arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
    286     ]
--> 287 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)

File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/internals/construction.py:95, in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity)
     92 # from BlockManager perspective
     93 axes = [columns, index]
---> 95 return create_block_manager_from_arrays(arrays, arr_names, axes)

File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/internals/managers.py:1706, in create_block_manager_from_arrays(arrays, names, axes)
   1704     return mgr
   1705 except ValueError as e:
-> 1706     raise construction_error(len(arrays), arrays[0].shape, axes, e)

ValueError: Shape of passed values is (26178, 3), indices imply (3000, 3)
michalk8 commented 2 years ago

The fix is now on master, you can try installing it as pip install git+https://github.com/theislab/squidpy@master.

wangjiawen2013 commented 2 years ago

The bug has been fixed now, but I want to run moranI on all the genes, not just highly variable genes. Now only highly variables genes can be computed, even though I set use_raw=True.

michalk8 commented 2 years ago

The bug has been fixed now, but I want to run moranI on all the genes, not just highly variable genes. Now only highly variables genes can be computed, even though I set use_raw=True.

By default HVGs are selected, if you want to run on all genes in adata.raw, try running

sq.gr.spatial_autocorr(adata, genes=adata.raw.var_names, use_raw=True, mode="moran")