Closed hurleyLi closed 3 years ago
This is always the sum of UMIs of the cells and never an average.
That said, we cap each cell's total UMIs at 2 * median(total-UMIs-of-cells-in-MC) to avoid the case where a single huge cell controls the result; this IS AN ISSUE in data sets where there is large variance between cell depths. For the too-large cells, we normalize their UMIs so the cell's total is at the 2 * median cap. This will give non-integer values, but it is not an average, and it has nothing to do with specific genes.
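To make the capping rule concrete, here is a minimal NumPy sketch of it as described above. It is not the package's actual implementation; the function name `capped_metacell_sum` and the toy data are made up for illustration, and it assumes a dense cells-by-genes array holding only the cells of one metacell.

```python
import numpy as np

def capped_metacell_sum(umis):
    """Sum per-gene UMIs over the cells of ONE metacell, with the depth cap.

    umis: dense (cells x genes) array of raw UMI counts (hypothetical input).
    """
    umis = np.asarray(umis, dtype=float)
    totals = umis.sum(axis=1)              # total UMIs of each cell
    cap = 2 * np.median(totals)            # 2 * median of the cell totals
    scale = np.ones_like(totals)           # most cells are left untouched
    too_large = totals > cap
    # Scale down only the too-large cells so their total equals the cap.
    scale[too_large] = cap / totals[too_large]
    return (umis * scale[:, None]).sum(axis=0)

# Toy metacell: two shallow cells and one very deep cell.
toy = np.array([[1, 1], [2, 2], [30, 70]])
result = capped_metacell_sum(toy)
print(result)
```

In this toy case the cell totals are 2, 4 and 100, so the cap is 8 and the deep cell is scaled by 0.08 before summing. The result is about `[5.4, 8.6]`: non-integer, but still a sum rather than a mean.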
`X` can and should be used for downstream analysis. One thing which is often done is to normalize it so the total in each cell is 1 (to get fraction-of-gene-in-cell), which removes the information about the (adjusted) depth of the metacells but makes it easy to compare the gene profiles. If you are doing something like correlations or PCA, then such normalization is not necessary (not that we believe in global PCA too much).
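The fraction normalization mentioned above can be sketched in a few lines of NumPy. This is only an illustration: `to_fractions` is a hypothetical helper, and it assumes a dense array in which every row (metacell) has a non-zero total. If `metacells.X` is stored as a sparse matrix it would need to be converted first.

```python
import numpy as np

def to_fractions(x):
    """Scale each row of a dense (metacells x genes) array to sum to 1."""
    x = np.asarray(x, dtype=float)
    totals = x.sum(axis=1, keepdims=True)   # (adjusted) depth of each row
    return x / totals                        # assumes no all-zero rows

# Toy data: two metacells, two genes.
toy_x = np.array([[5.4, 8.6], [10.0, 30.0]])
fractions = to_fractions(toy_x)
print(fractions.sum(axis=1))
```

After this step each row sums to 1, so the depth information is gone and only the relative gene profile of each metacell remains.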
That said, you have all the information (in the cells data, the `metacell` per-obs attribute identifying which metacell each cell belongs to; in the metacells data, the `grouped` per-obs attribute specifying how many cells are grouped into each metacell), so you can compute anything else you want from the raw cell UMIs.
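As a rough sketch of "compute anything else you want", the following recomputes uncapped per-metacell sums and means from raw cell UMIs and a per-cell metacell index. The helper name `raw_metacell_sums` and the toy arrays are hypothetical. It also assumes that the index is a non-negative integer for assigned cells and negative for cells that belong to no metacell, which you should check against your own data.

```python
import numpy as np

def raw_metacell_sums(cell_umis, metacell_of_cell):
    """Return (sums, counts): per-metacell gene sums and cells per metacell.

    cell_umis: dense (cells x genes) raw UMI counts.
    metacell_of_cell: integer metacell index per cell; negative = unassigned
    (an assumption of this sketch).
    """
    cell_umis = np.asarray(cell_umis)
    metacell_of_cell = np.asarray(metacell_of_cell)
    keep = metacell_of_cell >= 0                      # drop unassigned cells
    n_metacells = int(metacell_of_cell[keep].max()) + 1
    sums = np.zeros((n_metacells, cell_umis.shape[1]), dtype=float)
    # Unbuffered add, so cells sharing a metacell index all accumulate.
    np.add.at(sums, metacell_of_cell[keep], cell_umis[keep])
    counts = np.bincount(metacell_of_cell[keep], minlength=n_metacells)
    return sums, counts

# Toy data: three cells in metacell 0, one in metacell 1, one unassigned.
toy_umis = np.array([[1, 1], [2, 2], [30, 70], [4, 0], [9, 9]])
toy_index = np.array([0, 0, 0, 1, -1])
sums, counts = raw_metacell_sums(toy_umis, toy_index)
means = sums / counts[:, None]                        # per-cell average
print(sums, counts, means, sep="\n")
```

With real data, the inputs would presumably be the densified cells matrix and the `metacell` per-obs attribute, and `counts` should then agree with the `grouped` attribute of the metacells data.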
Hi again! Could you please explain a little bit more about what goes into the final `metacells.X` after running `mc.pl.collect_metacells(clean)`? It seems that within the same Metacells run, sometimes it is the sum of all the UMIs from the cells within each metacell, but sometimes it seems to be the mean value of that sum, even for the same gene (in different metacells). So the resulting `metacells.X` can't be directly used for downstream analyses. Could you please clarify? Thanks! Hurley