Closed wangjiawen2013 closed 2 years ago
hi @wangjiawen2013 , you can definitely use Moran's I on normalized and scaled data. if you want to run them on the 3k HVG I'd suggest you to remove the use_raw=False
flag.
Best, Giovanni
Thanks for your help! I want to run them on all the genes stored in adata.raw, but when I used sq.gr.spatial_autocorr(adata, use_raw=True, mode="moran"), an error occurred: RuleException: ValueError in line 75 of /home/wangjw/data/work/squidpy/stereoseq.smk: Shape of passed values is (26178, 3), indices imply (3000, 3).
@wangjiawen2013 could you please post the full traceback for us to look at?
In [4]: adata Out[4]: AnnData object with n_obs × n_vars = 5143 × 26178 obs: 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'classification' var: 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'mean', 'std' uns: 'draw_graph', 'hvg', 'leiden', 'leiden_co_occurrence', 'leiden_colors', 'leiden_nhood_enrichment', 'leiden_ripley_L', 'moranI', 'neighbors', 'pca', 'spatial', 'spatial_neighbors', 'umap' obsm: 'X_draw_graph_fa', 'X_pca', 'X_umap', 'spatial' varm: 'PCs' layers: 'counts' obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'
In [5]: sq.gr.spatial_autocorr(adata, use_raw=True, mode="moran")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/internals/managers.py:1701, in create_block_manager_from_arrays(arrays, names, axes)
1700 try:
-> 1701 blocks = _form_blocks(arrays, names, axes)
1702 mgr = BlockManager(blocks, axes)
File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/internals/managers.py:1759, in _form_blocks(arrays, names, axes)
1758 if len(items_dict["FloatBlock"]):
-> 1759 float_blocks = _multi_blockify(items_dict["FloatBlock"])
1760 blocks.extend(float_blocks)
File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/internals/managers.py:1852, in _multi_blockify(tuples, dtype)
1850 for dtype, tup_block in grouper:
-> 1852 values, placement = _stack_arrays(list(tup_block), dtype)
1854 block = make_block(values, placement=placement, ndim=2)
File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/internals/managers.py:1882, in _stack_arrays(tuples, dtype)
1881 for i, arr in enumerate(arrays):
-> 1882 stacked[i] = _asarray_compat(arr)
1884 return stacked, placement
ValueError: could not broadcast input array from shape (3000,) into shape (26178,)
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 sq.gr.spatial_autocorr(adata, use_raw=True, mode="moran")
File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/squidpy/gr/_ppatterns.py:183, in spatial_autocorr(adata, connectivity_key, genes, mode, transformation, n_perms, two_tailed, corr_method, layer, seed, use_raw, copy, n_jobs, backend, show_progress_bar)
180 results = {params["stat"]: score}
181 results.update(pval_results)
--> 183 df = pd.DataFrame(results, index=genes)
185 if corr_method is not None:
186 for pv in filter(lambda x: "pval" in x, df.columns):
File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/frame.py:529, in DataFrame.__init__(self, data, index, columns, dtype, copy)
524 mgr = self._init_mgr(
525 data, axes={"index": index, "columns": columns}, dtype=dtype, copy=copy
526 )
528 elif isinstance(data, dict):
--> 529 mgr = init_dict(data, index, columns, dtype=dtype)
530 elif isinstance(data, ma.MaskedArray):
531 import numpy.ma.mrecords as mrecords
File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/internals/construction.py:287, in init_dict(data, index, columns, dtype)
281 arrays = [
282 arr if not isinstance(arr, ABCIndexClass) else arr._data for arr in arrays
283 ]
284 arrays = [
285 arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
286 ]
--> 287 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/internals/construction.py:95, in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity)
92 # from BlockManager perspective
93 axes = [columns, index]
---> 95 return create_block_manager_from_arrays(arrays, arr_names, axes)
File ~/programs/miniconda3/envs/spatial/lib/python3.8/site-packages/pandas/core/internals/managers.py:1706, in create_block_manager_from_arrays(arrays, names, axes)
1704 return mgr
1705 except ValueError as e:
-> 1706 raise construction_error(len(arrays), arrays[0].shape, axes, e)
ValueError: Shape of passed values is (26178, 3), indices imply (3000, 3)
The fix is now on master
, you can try installing it as pip install git+https://github.com/theislab/squidpy@master
.
The bug has been fixed now, but I want to run moranI on all the genes, not just highly variable genes. Now only highly variables genes can be computed, even though I set use_raw=True.
The bug has been fixed now, but I want to run moranI on all the genes, not just highly variable genes. Now only highly variables genes can be computed, even though I set use_raw=True.
By default HVGs are selected, if you want to run on all genes in adata.raw
, try running
sq.gr.spatial_autocorr(adata, genes=adata.raw.var_names, use_raw=True, mode="moran")
Hi, the scaled data is in adata.X, can I use this with sq.gr.spatial_autocorr(adata, mode="moran") ? And, when I used sq.gr.spatial_autocorr(adata, use_raw=True, mode="moran"), an error occurred: RuleException: ValueError in line 75 of /home/wangjw/data/work/squidpy/stereoseq.smk: Shape of passed values is (26178, 3), indices imply (3000, 3). I stored the normalized data in adata.raw and scaled data in adata.X, 3000 means the top3000 highly variable genes. 26178 is the full gene numbers. How can I compute moran's I of the 3000 highly variable genes with normalized data in adata.raw ?