saezlab / decoupler-py

Python package to perform enrichment analysis from omics data.
https://decoupler-py.readthedocs.io/
GNU General Public License v3.0
145 stars 21 forks source link

copy.deepcopy() in get_contrast() #107

Closed wan-nie closed 3 months ago

wan-nie commented 4 months ago

Describe the bug I found the issue when I played with get_contrast() in sc-best-practices.

>>> pdata
AnnData object with n_obs × n_vars = 110 × 15701
    obs: 'condition', 'cell_type', 'patient', 'sample', 'psbulk_n_cells', 'psbulk_counts', 'lib_size'
    var: 'name', 'n_cells'
    uns: 'log1p'
    layers: 'psbulk_props', 'counts'
#condition: categorical values; {'ctrl', 'stim'}.
#cell_type: categorical values; {'B cells', ..., 'NK cells'}.

>>> logFCs, pvals = dc.get_contrast(
    pdata,
    group_col="cell_type",
    condition_col="condition",
    condition="stim",
    reference="ctrl",
    method="t-test",
)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[213], line 1
----> 1 logFCs, pvals = dc.get_contrast(
      2     pdata,
      3     group_col="cell_type",
      4     condition_col="condition",
      5     condition="stim",
      6     reference="ctrl",
      7     method="t-test",
      8 )
File ~/miniconda3/envs/estp_analysis/lib/python3.10/site-packages/decoupler/utils_anndata.py:512, in get_contrast(adata, group_col, condition_col, condition, reference, method)
    508 for grp in groups:
    509 
    510     # Sub-set by group
    511     sub_adata = copy.deepcopy(adata[adata.obs[group_col] == grp])
--> 512     sub_adata.obs = sub_adata.obs[[condition_col]]
    514     # Transform string columns to categories (removes anndata warnings)
    515     if sub_adata.obs[condition_col].dtype == 'object' or sub_adata.obs[condition_col].dtype == 'category':

...

ValueError: Value passed for key 'psbulk_props' is of incorrect shape. Values of layers must match dimensions (0, 1) of parent. Value had shape (16, 15701) while it should have had (110, 15701).

This is related to the following code: https://github.com/saezlab/decoupler-py/blob/6539a6e044a4cb8ec70662027d900d4d067b3324/decoupler/utils_anndata.py#L511

I checked the copy.deepcopy part and found that it returns a view of the original AnnData object rather than the expected sliced copy.

>>> pdata[pdata.obs[group_col] == grp]
View of AnnData object with n_obs × n_vars = 16 × 15701
    obs: 'condition', 'cell_type', 'patient', 'sample', 'psbulk_n_cells', 'psbulk_counts', 'lib_size'
    var: 'name', 'n_cells'
    uns: 'log1p'
    layers: 'psbulk_props', 'counts'

>>> copy.deepcopy(pdata[pdata.obs[group_col] == grp])
View of AnnData object with n_obs × n_vars = 110 × 15701
    obs: 'condition', 'cell_type', 'patient', 'sample', 'psbulk_n_cells', 'psbulk_counts', 'lib_size'
    var: 'name', 'n_cells'
    uns: 'log1p'
    layers: 'psbulk_props', 'counts'

This could result in get_contrast() not producing a sliced sub_adata as anticipated. I am new to AnnData, and not sure if this is an intended feature.

To Reproduce Run the notebook in sc-best-practices.

System Python 3.10 scanpy 1.9.6 anndata 0.10.3 decoupler 1.6.0

PauBadiaM commented 4 months ago

Hi @wan-nie,

Thanks for reporting this, indeed it was a bug I introduced in the last version. I removed the deepcopy for a standard one and then it works, you can install the latest version with the fix by running this line:

pip install git+https://github.com/saezlab/decoupler-py

However, I would advice on not running the simple t-test we showed in the book and rather use a proper differential testing framework such as DESeq2. You can find a complete vignette showing how to infer cell-cell communication events with DEA in this vignette from our liana+ docs. Hope this is helpful!

wan-nie commented 4 months ago

Hi @PauBadiaM,

Thank you for the prompt reply and helpful suggestions on CCC inference. I believe this issue has been resolved now.