theislab / single-cell-best-practices

https://www.sc-best-practices.org

Switch enrichment analysis to decoupler #120

Open PauBadiaM opened 1 year ago

PauBadiaM commented 1 year ago

Enrichment analysis

Hi, cool initiative! This will definitely be helpful for the community and for teaching at university.

I'm Pau, a PhD student from saezlab. Our group specializes in enrichment analysis across different medical studies. I was reading chapter 18, "Gene set enrichment and pathway analysis", and noticed that it could be improved.

I have developed decoupler, an efficient Python framework that contains multiple statistical methods for enrichment analysis as well as prior-knowledge resources. It is written entirely in Python and is part of the scverse ecosystem. It offers more than 10 methods, including GSEA, AUCell and other more recent ones, so using any of them would be cleaner than going back and forth between R and Python. Moreover, decoupler has extensive documentation covering single-cell, pseudobulk, spatial and bulk functional analysis, which could be adapted for use in the book.
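
To make the AUCell idea concrete, here is a minimal NumPy sketch of an AUCell-style per-cell score: genes are ranked by expression within each cell, and the score is the area under the recovery curve of the gene set within the top-ranked genes, normalized to [0, 1]. This is a simplified illustration, not decoupler's or AUCell's actual implementation; the function name `aucell_like_scores` and the 5% `top_frac` window are assumptions made here.

```python
import numpy as np

def aucell_like_scores(expr, genes, gene_set, top_frac=0.05):
    """Simplified AUCell-style score per cell (hypothetical helper, not decoupler's code).

    expr: cells x genes expression matrix; genes: gene names matching expr columns.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes = expr.shape[1]
    # Size of the top-ranked window in which the recovery curve is evaluated
    k = min(n_genes, max(1, int(np.ceil(top_frac * n_genes))))
    in_set = np.isin(np.asarray(genes), list(gene_set))
    m = min(int(in_set.sum()), k)
    if m == 0:
        return np.zeros(expr.shape[0])
    # A hit at rank position p (1-based) stays on the recovery curve for k - p + 1 steps
    weights = k - np.arange(1, k + 1) + 1
    # Best case: the gene set occupies the first m positions
    max_auc = weights[:m].sum()
    # Rank genes per cell, most expressed first (stable sort breaks ties by column order)
    order = np.argsort(-expr, axis=1, kind="stable")
    hits = in_set[order[:, :k]]  # cells x k booleans: is the gene at position p in the set?
    auc = hits.astype(float) @ weights
    return auc / max_auc
```

With `top_frac=0.5` on a four-gene toy matrix, a cell whose two most expressed genes are exactly the gene set scores 1.0, and a cell expressing only other genes scores 0.0.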

Happy to contribute!

Zethson commented 1 year ago

Hey,

great timing! @ivirshup is currently already on this with this PR :) https://github.com/theislab/single-cell-best-practices/pull/119

Maybe the two of you can coordinate?

ivirshup commented 1 year ago

Ah, I was just thinking about pinging you, @PauBadiaM. I'll write up a message over on that PR.

Zethson commented 1 year ago

> Moreover, decoupler contains extensive documentation: single-cell, pseudobulk, spatial and bulk functional analysis, which could be adapted to be used in the book.

If you think that more of decoupler's functionality could be useful somewhere (pseudobulking maybe for the DE chapter?) I'd be happy to receive a few pointers that we could discuss. If we agree, I'd also happily merge PRs then.

Thank you very much!

PauBadiaM commented 1 year ago

Hi @Zethson!

Pseudobulking is quite straightforward; here's the function's documentation and an example vignette. If you find it useful, I could add it to the DE chapter.

Zethson commented 1 year ago

> Hi @Zethson!
>
> Pseudobulking is quite straightforward; here's the function's documentation and an example vignette. If you find it useful, I could add it to the DE chapter.

Yeah, if you can reproduce the results with your pseudobulk implementation I'd be happy if we could use yours! I'd appreciate a PR.

ivirshup commented 1 year ago

Does decoupler's pseudobulking function subsample per group to generate replicates? I don't think I see an argument for that.

The method I used also takes a fixed number of cells per pseudobulk, which seems to have removed some "library size" effects caused by differences in cell-type abundance.
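
A minimal sketch of the fixed-size subsampling idea, assuming a dense cells x genes count matrix and one group label per cell. Each pseudo-replicate sums exactly `n_cells` cells drawn without replacement, and replicates within a group use disjoint cells. The function name and the `n_cells`, `n_reps` and `seed` parameters are hypothetical; this is not the exact code from the PR.

```python
import numpy as np

def subsampled_pseudobulks(counts, groups, n_cells=30, n_reps=3, seed=0):
    """Hypothetical helper: sum n_cells random cells per group, up to n_reps times.

    Replicates within a group use disjoint cells; groups that cannot fill
    a single replicate are skipped.
    Returns (profiles, labels): profiles is (n_profiles x genes), labels is [(group, rep)].
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    groups = np.asarray(groups)
    profiles, labels = [], []
    for g in np.unique(groups):
        idx = rng.permutation(np.flatnonzero(groups == g))
        # Only as many replicates as there are full, non-overlapping chunks of n_cells
        n_avail = min(n_reps, idx.size // n_cells)
        for r in range(n_avail):
            chosen = idx[r * n_cells:(r + 1) * n_cells]
            profiles.append(counts[chosen].sum(axis=0))
            labels.append((str(g), r))
    if not profiles:
        return np.empty((0, counts.shape[1]), dtype=counts.dtype), labels
    return np.vstack(profiles), labels
```

Because every profile sums the same number of cells, its total counts no longer scale with how abundant the cell population is.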

PauBadiaM commented 1 year ago

The function in decoupler doesn't subsample. Instead, it asks for the sample ID (the true replicates) and a group label, and generates sample- and cell-type-specific pseudobulk profiles. In my opinion, if a dataset doesn't contain true replicates (multiple samples), it shouldn't be pseudobulked: cells subsampled from the same sample will always share confounding factors and are not true replicates. In that case, it's better not to use decoupler's function (or, if needed, to switch to another dataset with multiple samples).
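
The aggregation described here (one profile per sample and cell type, obtained by summing raw counts) can be sketched with pandas. This only illustrates the idea and is not decoupler's code; the function name and the `min_cells=10` filter for sparsely populated combinations are assumptions.

```python
import numpy as np
import pandas as pd

def pseudobulk_by_sample(counts, genes, sample, cell_type, min_cells=10):
    """Hypothetical helper: sum raw counts per (sample, cell type) pair.

    Returns (sums, n_cells), keeping only pairs with at least min_cells cells.
    """
    df = pd.DataFrame(np.asarray(counts), columns=list(genes))
    df["sample"] = np.asarray(sample)
    df["cell_type"] = np.asarray(cell_type)
    grouped = df.groupby(["sample", "cell_type"])
    sums = grouped[list(genes)].sum()  # one row per (sample, cell type)
    n_cells = grouped.size()           # number of cells behind each profile
    keep = n_cells.index[(n_cells >= min_cells).to_numpy()]
    return sums.loc[keep], n_cells.loc[keep]
```

Each row is tied to a real biological sample, so downstream DE treats samples, not cells, as replicates.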

ivirshup commented 1 year ago

Even if you don't generate multiple pseudo-samples per (biological sample, cell population) group, would you still want to sum across all cells in the group, given differences in cell-population abundance? I'd think you might want to take a fixed number of cells per group.
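
A sketch that combines both positions: keep one pseudobulk per true (sample, cell type) pair, but sum a fixed number of randomly chosen cells so that profiles are comparable across populations of different abundance. The function name and parameters are hypothetical, and dropping pairs with fewer than `n_cells` cells is just one possible policy.

```python
import numpy as np

def fixed_size_pseudobulk(counts, sample, cell_type, n_cells, seed=0):
    """Hypothetical helper: one profile per (sample, cell type), each summing exactly n_cells cells."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    sample = np.asarray(sample)
    cell_type = np.asarray(cell_type)
    profiles = {}
    for s in np.unique(sample):
        for ct in np.unique(cell_type):
            idx = np.flatnonzero((sample == s) & (cell_type == ct))
            if idx.size < n_cells:
                continue  # too few cells to reach the fixed size: drop this pair
            chosen = rng.choice(idx, size=n_cells, replace=False)
            profiles[(str(s), str(ct))] = counts[chosen].sum(axis=0)
    return profiles
```

The cost of this choice is that cells beyond `n_cells` in abundant populations are discarded, and rare populations may drop out entirely.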

> In my opinion, if a dataset doesn't contain true replicates (multiple samples) it shouldn't be pseudobulked

Fair opinion. I guess you could treat them as technical replicates; I wonder how much variability they have.