saezlab / decoupler-py

Python package to perform enrichment analysis from omics data.
https://decoupler-py.readthedocs.io/
GNU General Public License v3.0

dc.get_pseudobulk needs to use '.todense()' instead of '.A' #146

Closed: jhaberbe closed this issue 2 months ago

jhaberbe commented 2 months ago

Describe the bug: When computing a pseudobulk with the following code:

import decoupler as dc
# subset: an AnnData object with raw counts in subset.layers['counts']
counts = dc.get_pseudobulk(
    subset,
    sample_col='specimen',
    groups_col='cell_type',
    layer='counts',
    mode='sum',
    min_cells=0,
    min_counts=0
)

I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[34], line 2
      1 import decoupler as dc
----> 2 counts = dc.get_pseudobulk(
      3     subset,
      4     sample_col='specimen',
      5     groups_col='cell_type',
      6     layer='counts',
      7     mode='sum',
      8     min_cells=0,
      9     min_counts=0
     10 )

File /oak/stanford/projects/kibr/Reorganizing/Projects/4Jul2024_PBMC_ONT/.venv/lib/python3.12/site-packages/decoupler/utils_anndata.py:380, in get_pseudobulk(adata, sample_col, groups_col, obs, layer, use_raw, mode, min_cells, min_counts, dtype, skip_checks, min_prop, min_smpls, remove_empty)
    377     layers['psbulk_props'] = props
    378 elif type(mode) is str or callable(mode):
    379     # Compute psbulk
--> 380     psbulk, ncells, counts, props = compute_psbulk(n_rows, n_cols, X, sample_col, groups_col, smples, groups, obs,
    381                                                    new_obs, min_cells, min_counts, mode, dtype)
    382     layers = {'psbulk_props': props}
    384 # Add QC metrics

File /oak/stanford/projects/kibr/Reorganizing/Projects/4Jul2024_PBMC_ONT/.venv/lib/python3.12/site-packages/decoupler/utils_anndata.py:264, in compute_psbulk(n_rows, n_cols, X, sample_col, groups_col, smples, groups, obs, new_obs, min_cells, min_counts, mode, dtype)
    262 profile = X[(obs[sample_col] == smp) & (obs[groups_col] == grp)]
    263 if isinstance(X, csr_matrix):
--> 264     profile = profile.A
    266 # Skip if few cells or not enough counts
    267 ncell = profile.shape[0]

AttributeError: 'SparseCSRView' object has no attribute 'A'

Expected behavior: I expect the pseudobulk profiles to be computed without an error.

System:
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Additional context: The problem seems to be that newer versions of anndata no longer support the .A attribute and require .todense() instead. When I edited the module code to call '.todense()' in place of '.A', the error went away.
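For context: on scipy's sparse matrix classes such as csr_matrix, `.A` was shorthand for `.toarray()`, which returns a numpy.ndarray, while `.todense()` returns a numpy.matrix. Both hold the same values, but np.matrix stays 2-D under reductions such as `.sum(axis=0)`, so `.toarray()` is the closer drop-in replacement. The sketch below uses a hypothetical helper, `to_dense_array`, to densify a boolean row subset the way compute_psbulk does in the traceback; it is an illustration under those assumptions, not the patch that was actually merged into decoupler.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def to_dense_array(x):
    """Hypothetical helper: return a dense np.ndarray for sparse or dense input.

    For scipy sparse input this calls .toarray(), which is what the removed
    `.A` attribute used to return; dense input is passed through np.asarray.
    """
    if issparse(x):
        return x.toarray()
    return np.asarray(x)


# Toy cells x genes count matrix stored as CSR
X = csr_matrix(np.array([[0, 1, 0], [2, 0, 3]]))

# Boolean row mask, standing in for "cells of one sample/group pair"
mask = np.array([True, False])
profile = to_dense_array(X[mask])
print(type(profile).__name__, profile.tolist())  # ndarray [[0, 1, 0]]

# .todense() also densifies, but yields np.matrix rather than np.ndarray
dense_via_todense = X.todense()
print(type(dense_via_todense).__name__)  # matrix
```

Wrapping the `.todense()` result in np.asarray would also work; the helper simply avoids creating an np.matrix in the first place.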

PauBadiaM commented 2 months ago

Hi @jhaberbe,

Indeed, the latest scipy update removed the previously deprecated .A attribute of sparse matrices; see https://github.com/saezlab/decoupler-py/issues/139#issuecomment-2202925991. I pushed a quick patch that fixes it, which you can install by running:

pip install git+https://github.com/saezlab/decoupler-py.git

Hope this is helpful!
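Before reinstalling, it can help to confirm that the installed scipy is the cause. The sketch below, assuming only the standard-library importlib.metadata plus numpy and scipy, prints the installed scipy, anndata and decoupler versions and checks whether a csr_matrix still exposes `.A`. `installed_version` is a hypothetical helper, not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version

import numpy as np
from scipy.sparse import csr_matrix


def installed_version(pkg):
    """Hypothetical helper: installed distribution version, or None if absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None


for pkg in ("scipy", "anndata", "decoupler"):
    print(pkg, installed_version(pkg))

# False on a scipy release where sparse matrices no longer have `.A`;
# on older releases the attribute exists (possibly with a DeprecationWarning).
has_A = hasattr(csr_matrix(np.eye(2)), "A")
print("csr_matrix has .A:", has_A)
```

If `has_A` is False and the installed decoupler still calls `profile.A`, you are hitting the error from this issue.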

jhaberbe commented 2 months ago

Oh shoot my B, I'm three weeks late. Thanks!