What goes into metacells.X?

It is always the sum of the UMIs of the cells in each metacell, never an average.

That said, we cap each cell's total UMIs at 2 × median(total UMIs of the cells in the metacell), to avoid the case where a single huge cell controls the result. This is a real issue in data sets with large variance in cell depth. For each cell above the cap, we scale its UMIs down so that the cell's total equals the cap. This gives non-integer values, but it is still not an average, and the capping depends only on each cell's total, not on specific genes.
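To make the capping rule concrete, here is a minimal Python/NumPy sketch for a single metacell. It assumes a dense cells × genes UMI matrix holding only that metacell's cells; capped_metacell_sum is a hypothetical helper written for illustration, not the metacells package's actual implementation, which may differ in details.

```python
import numpy as np

def capped_metacell_sum(cell_umis: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: sum one metacell's cells, capping each cell's
    total UMIs at twice the median cell total within that metacell."""
    totals = cell_umis.sum(axis=1)       # total UMIs of each cell
    cap = 2.0 * np.median(totals)        # per-metacell cap: 2 x median
    # Factor is 1.0 for cells at or under the cap. Cells above it are scaled
    # by cap / total, the same factor for every gene, so their total lands
    # exactly on the cap. np.maximum avoids dividing by zero in the branch
    # np.where evaluates but does not select.
    scale = np.where(totals > cap, cap / np.maximum(totals, 1), 1.0)
    # Weighted sum over cells; the result is a float vector (one value per gene).
    return (cell_umis * scale[:, None]).sum(axis=0)
```

For example, with cell totals 2, 4, 6 and 200, the median is 5 and the cap is 10, so only the last cell is scaled (by 10/200) and it contributes 10 UMIs instead of 200.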

X can and should be used for downstream analysis. One common step is to normalize it so the total of each metacell is 1 (giving the fraction of each gene in the metacell). This removes the information about the (adjusted) depth of the metacells, but makes it easy to compare their gene profiles. If you are doing something like correlations or PCA, then such normalization is not necessary (not that we believe in global PCA too much).
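A minimal sketch of the per-metacell fraction normalization, assuming X is available as a dense NumPy metacells × genes array (to_fractions is a hypothetical name; a sparse matrix would need a different call):

```python
import numpy as np

def to_fractions(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: divide each metacell (row) by its total UMIs,
    so each row holds the fraction of each gene and sums to 1."""
    totals = x.sum(axis=1, keepdims=True)  # shape (n_metacells, 1) for broadcasting
    return x / totals
```

A metacell whose total is 0 would produce NaN values, so this assumes every metacell has at least one UMI.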

That said, you have all the information needed to compute anything else you want from the raw cell UMIs: in the cells data, the metacell per-obs attribute identifies which metacell each cell belongs to, and in the metacells data, the grouped per-obs attribute specifies how many cells were grouped into each metacell.
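As one example of computing something else from the raw cells, the sketch below recomputes plain, uncapped per-metacell sums and cell counts, from which a true per-metacell average follows. It assumes the raw UMIs are a dense cells × genes NumPy array and that the metacell per-obs attribute has been read into an integer array in which negative values mark cells not assigned to any metacell; the helper name and that convention are assumptions for illustration.

```python
import numpy as np

def plain_metacell_sums(cell_umis: np.ndarray,
                        metacell_of_cell: np.ndarray,
                        n_metacells: int):
    """Hypothetical helper: uncapped per-metacell UMI sums and cell counts."""
    # Assumption: a negative metacell index means "not in any metacell".
    in_mc = metacell_of_cell >= 0
    sums = np.zeros((n_metacells, cell_umis.shape[1]), dtype=cell_umis.dtype)
    # Unbuffered scatter-add: each cell's row is added to its metacell's row,
    # correctly accumulating when several cells share the same metacell.
    np.add.at(sums, metacell_of_cell[in_mc], cell_umis[in_mc])
    # Number of cells per metacell (comparable to the grouped attribute).
    grouped = np.bincount(metacell_of_cell[in_mc], minlength=n_metacells)
    return sums, grouped
```

Dividing sums by grouped[:, None] gives the per-metacell mean, the average that X deliberately does not store. If the assumptions above hold, grouped should agree with the metacells' grouped attribute.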

tanaylab / metacells, issue #10