tanaylab / metacells

Metacells - Single-cell RNA Sequencing Analysis
MIT License

Define number of Metacells #60

Open antonio-miranda opened 11 months ago

antonio-miranda commented 11 months ago

Hello, thank you for the package. Is there a way in the pipeline to define the number of metacells we want to obtain?

orenbenkiki commented 11 months ago

Not directly, as the algorithm tries to create "coherent" metacells using all sorts of criteria.

Indirectly, there are two parameters: target_metacell_size (default 48) and, as of 0.9.3, target_metacell_umis (default 160K). The algorithm tries to hit these targets, allowing for a 0.5x - 2x range (and at least 12 cells per metacell no matter what).
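To make the tolerance concrete, here is a small arithmetic sketch of the 0.5x - 2x window around each target. It does not call the metacells package; the helper name `allowed_range` is hypothetical, and applying the 12-cell minimum as a simple floor on the lower bound is an assumption about how the two rules combine.

```python
def allowed_range(target, low_factor=0.5, high_factor=2.0, floor=0):
    """Return the (low, high) window around a target value.

    Hypothetical helper for illustration only, not part of metacells.
    `floor` models the "at least 12 cells" rule as a lower bound on `low`
    (an assumption about how the two rules interact).
    """
    low = max(target * low_factor, floor)
    high = target * high_factor
    return low, high


# Defaults quoted in the comment above.
cells_window = allowed_range(48, floor=12)   # target_metacell_size
umis_window = allowed_range(160_000)         # target_metacell_umis

print("cells per metacell:", cells_window)
print("UMIs per metacell:", umis_window)
```

With the defaults this gives roughly 24 to 96 cells and 80K to 320K UMIs per metacell. If the size target were lowered to 16, the 12-cell minimum would take over as the lower bound under this assumption.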

The idea is that a "good" metacell has "enough" cells and "enough" UMIs to robustly sample the cell state, but not so many cells and UMIs that it samples several similar cell states instead of one. That is, it is a tradeoff between robustness and sensitivity.

The algorithm is adaptive (it tries to hit both targets as much as possible), and if your cells have a few thousand UMIs each, this works out reasonably well. If your cells are very small (a few hundred UMIs) or very large (a few tens of thousands of UMIs), you may want to adjust the targets, which will indirectly change the number of metacells.
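A back-of-the-envelope sketch of why cell size matters: dividing the UMIs target by the typical UMIs per cell gives the number of cells that target alone would imply, which can be compared with the size target of 48. This is plain arithmetic, not the package's actual logic; the function names are hypothetical, and how the algorithm reconciles the two targets is not modeled here.

```python
def implied_cells_per_metacell(umis_per_cell, target_metacell_umis=160_000):
    """Cells needed to reach the UMIs target if every cell had `umis_per_cell` UMIs."""
    return target_metacell_umis / umis_per_cell


def rough_metacell_counts(n_cells, umis_per_cell,
                          target_metacell_size=48,
                          target_metacell_umis=160_000):
    """Two crude estimates of the metacell count, one per target.

    Hypothetical helper: the real algorithm balances both targets and
    other criteria, so treat these only as order-of-magnitude guides.
    """
    by_size = n_cells / target_metacell_size
    by_umis = n_cells / implied_cells_per_metacell(umis_per_cell, target_metacell_umis)
    return by_size, by_umis


# Example cell sizes (made-up values chosen for round numbers).
for umis in (320, 3_200, 32_000):
    print(umis, "UMIs/cell ->", implied_cells_per_metacell(umis), "cells per metacell")

print(rough_metacell_counts(48_000, 3_200))
```

At about 3,200 UMIs per cell the UMIs target implies about 50 cells, close to the size target of 48, so the two targets agree. At 320 UMIs per cell it implies 500 cells, and at 32,000 UMIs per cell only 5, which is where adjusting the targets becomes worth considering.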

Also, for smaller data sets, one may want to decrease the targets to get better sensitivity, accepting the fact that metacells for a small number of cells won't be that robust whatever we do.

Can you say something about the # of UMIs per cell and the number of cells you have, and what # of metacells you got? We are always looking for ways to improve the out-of-the-box algorithm to minimize the need for users to tweak the parameters.