saezlab / decoupler-py

Python package to perform enrichment analysis from omics data.
https://decoupler-py.readthedocs.io/
GNU General Public License v3.0

Pseudobulk for each sample #89

Closed: SNOL2 closed this issue 10 months ago

SNOL2 commented 10 months ago

Hi, thanks for developing this wonderful tool! I'd like to run a pseudobulk analysis per sample, i.e. one expression vector per sample. To account for tumor purity, I suppose each cell type (especially malignant cells) needs to be assigned a weight when the counts are summed. Could you please give me some advice? Thanks again!

PauBadiaM commented 10 months ago

Hi @SNOL2

Thanks for using it! To compute pseudobulk profiles at the sample level (rather than per sample and cell type), you can set groups_col to None like this:

pdata = dc.get_pseudobulk(
    adata=adata,
    sample_col='sample_id',  # Here your sample id column name
    groups_col=None,
    ...
)

The function decoupler.get_pseudobulk also accepts a custom function for its mode argument (the default is 'sum'). You could, for example, append your weights to adata as a feature and then write a custom function that uses that feature to compute a weighted sum of your cells. Hope this is helpful!
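To make the weighted-sum idea concrete, here is a minimal NumPy sketch of the arithmetic only. It does not go through decoupler.get_pseudobulk or its custom mode, because the exact signature of the callback is not spelled out in this thread. The helper name weighted_pseudobulk and the toy values are hypothetical, and the per-cell weights are assumed to be already known.

```python
import numpy as np


def weighted_pseudobulk(counts, sample_ids, weights):
    """Weighted sum of cells per sample (hypothetical helper, not part of decoupler).

    counts:     array of shape (n_cells, n_genes)
    sample_ids: array of shape (n_cells,), the sample label of each cell
    weights:    array of shape (n_cells,), the weight of each cell
    Returns (samples, profiles), where profiles has shape (n_samples, n_genes).
    """
    counts = np.asarray(counts, dtype=float)
    sample_ids = np.asarray(sample_ids)
    weights = np.asarray(weights, dtype=float)

    # Map every cell to the row index of its sample.
    samples, inverse = np.unique(sample_ids, return_inverse=True)

    # Build a (n_samples, n_cells) matrix holding each cell's weight
    # in the row of the sample it belongs to, and zero elsewhere.
    n_cells = counts.shape[0]
    design = np.zeros((samples.size, n_cells))
    design[inverse, np.arange(n_cells)] = weights

    # One matrix product gives the weighted sum of cells for each sample.
    return samples, design @ counts


# Toy data: three cells, two genes, two samples.
counts = np.array([[1, 0], [2, 4], [3, 3]])
sample_ids = np.array(["s1", "s1", "s2"])
weights = np.array([1.0, 0.5, 2.0])

samples, profiles = weighted_pseudobulk(counts, sample_ids, weights)
print(samples)
print(profiles)
```

With an AnnData object, counts would correspond to adata.X (converted to a dense array if it is sparse), sample_ids to adata.obs['sample_id'], and weights to a per-cell column that you add yourself, for example a hypothetical adata.obs['weight'].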

SNOL2 commented 10 months ago

I will give it a try. Thanks!