scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

sc.get.aggregate summary statistics #3063

Open grst opened 4 months ago

grst commented 4 months ago

What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

Please describe your wishes

It would be nice if sc.get.aggregate provided a way to compute summary statistics and put them in .obs of the aggregated AnnData object.

Most important for me would be storing the number of cells per aggregated sample, so that samples below a certain threshold can be filtered out.

I'm not sure whether other metrics are relevant, but in the most general case it could take a callback function.
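Until such a parameter exists, the cell count can be added by hand. The following is a minimal sketch rather than scanpy functionality: `add_cell_counts` and the `n_cells` column name are hypothetical, and it assumes the `.obs` of the aggregated object still carries the grouping columns passed to `by`. It works on plain DataFrames, so in practice `cell_obs` would be `adata.obs` and `aggregated_obs` the `.obs` of the object returned by `sc.get.aggregate`.

```python
import pandas as pd


def add_cell_counts(cell_obs, aggregated_obs, by, key="n_cells"):
    """Return a copy of `aggregated_obs` with a per-group cell count column.

    Hypothetical helper, not part of scanpy. Assumes `aggregated_obs`
    contains the grouping columns named in `by`.
    """
    by = [by] if isinstance(by, str) else list(by)
    # Count cells for every observed combination of the grouping columns.
    sizes = cell_obs.groupby(by, observed=True).size().rename(key).reset_index()
    # A left merge keeps the row order of the aggregated table.
    merged = aggregated_obs[by].merge(sizes, on=by, how="left")
    out = aggregated_obs.copy()
    out[key] = merged[key].fillna(0).astype(int).to_numpy()
    return out


# Toy stand-ins for adata.obs and the aggregated object's .obs.
cell_obs = pd.DataFrame(
    {
        "sample": ["s1", "s1", "s1", "s2", "s2", "s3"],
        "condition": ["ctrl", "ctrl", "ctrl", "stim", "stim", "stim"],
    }
)
aggregated_obs = pd.DataFrame(
    {"sample": ["s1", "s2", "s3"], "condition": ["ctrl", "stim", "stim"]},
    index=["s1_ctrl", "s2_stim", "s3_stim"],
)

result = add_cell_counts(cell_obs, aggregated_obs, by=["sample", "condition"])
# Drop aggregated samples built from fewer than 2 cells.
kept = result[result["n_cells"] >= 2]
```

With a real AnnData object, the same boolean mask could be used to subset the aggregated object itself.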

flying-sheep commented 4 months ago

Sounds like a useful feature, you wanna do a PR?

grst commented 4 months ago

I don't think I would have the time in the near future.

osmanmerdan commented 2 months ago

It would also be nice to add some stats for vars. Say we have aggregated cells from control and stim samples. The percentage of cells expressing gene A in clusterX for each aggregated sample would let us filter out genes that are expressed in very few cells but have relatively high counts. This would reduce the false positives that such genes cause in pseudobulk differential gene expression analysis.

grst commented 2 months ago

Hi @osmanmerdan,

This should already be possible using

sc.get.aggregate(..., func="count_nonzero")
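To get from those nonzero counts to the percentage of expressing cells, each row still has to be divided by the number of cells in its group. The sketch below shows that step on plain NumPy arrays. The function names and the `min_fraction` filter rule are hypothetical. It assumes the count matrix is what the aggregated object would hold in a layer named `count_nonzero`, and that per-group cell counts are available separately.

```python
import numpy as np


def fraction_expressing(count_nonzero, n_cells):
    """Fraction of cells per group (rows) with a nonzero count per gene (columns)."""
    count_nonzero = np.asarray(count_nonzero, dtype=float)
    n_cells = np.asarray(n_cells, dtype=float)
    # Broadcast the per-group cell count across all genes.
    return count_nonzero / n_cells[:, None]


def genes_to_keep(fractions, min_fraction=0.1):
    """Hypothetical rule: keep a gene if any group expresses it in enough cells."""
    return (fractions >= min_fraction).any(axis=0)


# Toy data: 2 aggregated groups x 3 genes.
counts = np.array([[5, 0, 1], [2, 1, 0]])
n_cells = np.array([10, 4])

fractions = fraction_expressing(counts, n_cells)
keep = genes_to_keep(fractions, min_fraction=0.2)
```

Whether a gene should pass when any group or when every group exceeds the threshold is a design choice. `any` is used here only for illustration.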