Open PauBadiaM opened 1 year ago
Hey,
great timing! @ivirshup is already working on this in this PR :) https://github.com/theislab/single-cell-best-practices/pull/119
Maybe the two of you can coordinate?
Ah, I was just thinking about pinging you @PauBadiaM. I'll write up a message over on that PR.
Moreover, decoupler contains extensive documentation: single-cell, pseudobulk, spatial and bulk functional analysis, which could be adapted to be used in the book.
If you think that more of decoupler's functionality could be useful somewhere (pseudobulking maybe for the DE chapter?) I'd be happy to receive a few pointers that we could discuss. If we agree, I'd also happily merge PRs then.
Thank you very much!
Hi @Zethson!
The pseudobulking is quite straightforward: here's the function's documentation and an example vignette for it. If you find it useful, I could add it to the DE chapter.
Yeah, if you can reproduce the results with your pseudobulk implementation I'd be happy if we could use yours! I'd appreciate a PR.
Does decoupler's pseudobulking function subsample per group to generate replicates? I don't think I see an argument for that.
The method I used also uses a fixed number of cells per pseudobulk, which seems to have removed some "library size" effects caused by cell type abundance.
The function in decoupler doesn't subsample. Instead, it asks for the sample id (the true replicates) and a group label to generate sample- and cell-type-specific pseudobulk profiles. In my opinion, if a dataset doesn't contain true replicates (multiple samples) it shouldn't be pseudobulked (sampled cells will always have confounding factors and are not true replicates). If this is the case, then it's better not to use decoupler's function (or, if needed, switch to another dataset with multiple samples).
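To make the idea concrete, here is a minimal, dependency-free sketch of summing counts per (sample, group) pair. It is only an illustration of the approach described above, not decoupler's actual implementation or API; the function name `pseudobulk_sum` and the toy data are made up for this example.

```python
def pseudobulk_sum(counts, sample_ids, group_labels):
    """Sum per-cell count vectors within each (sample, group) pair.

    Hypothetical helper for illustration only (not decoupler's API).
    counts: list of per-cell count lists, one value per gene.
    sample_ids, group_labels: one entry per cell.
    """
    if not (len(counts) == len(sample_ids) == len(group_labels)):
        raise ValueError("need exactly one sample id and group label per cell")
    profiles = {}
    for cell, sample, group in zip(counts, sample_ids, group_labels):
        key = (sample, group)
        if key not in profiles:
            profiles[key] = [0] * len(cell)
        # Add this cell's counts to the running total, gene by gene.
        profiles[key] = [total + c for total, c in zip(profiles[key], cell)]
    return profiles


# Toy data: 5 cells x 3 genes, two samples (true replicates), two cell types.
counts = [
    [1, 0, 2],
    [0, 3, 1],
    [4, 1, 0],
    [2, 2, 2],
    [0, 0, 5],
]
sample_ids = ["s1", "s1", "s1", "s2", "s2"]
group_labels = ["T", "T", "B", "T", "B"]

profiles = pseudobulk_sum(counts, sample_ids, group_labels)
for key, profile in sorted(profiles.items()):
    print(key, profile)
```

Each resulting profile corresponds to one real sample and one cell type, so the replicates that go into a downstream DE test are the biological samples themselves.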
Even if you don't generate multiple pseudo-samples per bio-sample cell-population group, would you still want to sum across all cells in the group given differences in cell-population abundance? I'd think you might want to take a fixed number of cells per group.
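A rough sketch of what I mean by a fixed number of cells per group, in plain Python. This is an assumption-laden illustration, not the exact method from the PR and not part of decoupler: the name `fixed_size_pseudobulk`, the `n_cells` and `seed` parameters, and the choice to skip groups that are too small are all hypothetical.

```python
import random


def fixed_size_pseudobulk(counts, sample_ids, group_labels, n_cells, seed=0):
    """Sum counts over exactly n_cells randomly chosen cells per (sample, group).

    Hypothetical helper for illustration only. Groups with fewer than
    n_cells cells are skipped instead of being sampled with replacement.
    """
    rng = random.Random(seed)  # local RNG so the draw is reproducible

    # Collect the cell indices that belong to each (sample, group) pair.
    cells_by_key = {}
    for idx, key in enumerate(zip(sample_ids, group_labels)):
        cells_by_key.setdefault(key, []).append(idx)

    profiles = {}
    for key, indices in cells_by_key.items():
        if len(indices) < n_cells:
            continue  # not enough cells for a fixed-size pseudobulk
        chosen = rng.sample(indices, n_cells)  # sampling without replacement
        n_genes = len(counts[chosen[0]])
        profiles[key] = [sum(counts[i][g] for i in chosen) for g in range(n_genes)]
    return profiles


# Toy data: one sample, an abundant group "A", a smaller "B", and a singleton "C".
counts = [[1, 1]] * 4 + [[2, 0]] * 2 + [[9, 9]]
sample_ids = ["s1"] * 7
group_labels = ["A"] * 4 + ["B"] * 2 + ["C"]

profiles = fixed_size_pseudobulk(counts, sample_ids, group_labels, n_cells=2)
print(profiles)
```

With every profile built from the same number of cells, differences in total counts between groups no longer simply track how abundant each cell population is.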
In my opinion, if a dataset doesn't contain true replicates (multiple samples) it shouldn't be pseudobulked
Fair opinion, I guess you could treat them as technical replicates. I wonder how much variability these have.
Enrichment analysis
Hi, cool initiative! This will definitely be helpful for the community and for university teaching.
I'm Pau, a PhD student from saezlab. In our group we specialize in enrichment analysis in different medical studies. I was reading chapter 18, "Gene set enrichment and pathway analysis", and noticed that it could be improved.
I have developed decoupler, an efficient Python framework containing multiple statistical methods for enrichment analysis and prior knowledge resources. It is written completely in Python and is part of the scverse ecosystem. Using any of its methods (10+), including GSEA and AUCell among other more recent ones, would be cleaner than having to go back and forth between R and Python. Moreover, decoupler contains extensive documentation: single-cell, pseudobulk, spatial and bulk functional analysis, which could be adapted to be used in the book. Happy to contribute!