tanaylab / metacell

Metacell - Single-cell mRNA Analysis
https://tanaylab.github.io/metacell

inconsistency of geometric mean calculation in paper and in code #44

Closed mingwhy closed 3 years ago

mingwhy commented 3 years ago

Hi, thank you for this amazing tool! I was reading the paper published in Genome Biology and noticed that, in the Methods section under 'Defining the metacell gene expression profile', the equation for the regularized geometric mean of gene expression intensity within each metacell is:

[Image: screenshot of the equation from the paper]

When I looked at the R package code for this part (I hope I located it correctly: the mc_compute_fp function in the mc.r script), it reads: clust_geomean = t(tgs_matrix_tapply(us[f_g_cov,], mc@mc, function(y) {exp(mean(log(1+y)))-1}))

The placement of the minus 1 differs between the source code and the equation in the main text: in the source code, '-1' is applied after exp(...), while in the paper's Methods, '-1' is inside the exp, i.e., exp(... -1).

I wonder whether this is a typo in the published paper or a bug in the code. Best, Ming
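For reference, the two variants being compared can be written out side by side. This is a sketch based only on the description in this comment, since the paper's equation is available here only as a screenshot. The symbols are assumptions introduced for illustration: u_i is the count of one gene in cell i, and n is the number of cells in the metacell.

```latex
% As implemented in mc_compute_fp: subtract 1 after exponentiating
g_{\text{code}} = \exp\!\left( \frac{1}{n} \sum_{i=1}^{n} \log(1 + u_i) \right) - 1

% As the issue describes the paper's equation: subtract 1 inside the exponent
g_{\text{paper}} = \exp\!\left( \frac{1}{n} \sum_{i=1}^{n} \log(1 + u_i) - 1 \right)

% When every u_i = 0:
g_{\text{code}} = \exp(0) - 1 = 0, \qquad g_{\text{paper}} = \exp(-1) \approx 0.368
```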

amostanay commented 3 years ago

Good catch. The code is correct: you want to compensate for adding 1 inside the log, so that taking this regularized geometric mean of all 0's gives you 0 back.
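The all-zeros argument can be checked with a few lines of base R. This is a minimal sketch, not the package's code: the function names reg_geomean_code and reg_geomean_paper and the input vectors are hypothetical, and only the body of the first function is taken from the mc_compute_fp line quoted above.

```r
# Form used in mc_compute_fp (quoted in this issue): the -1 is applied after exp().
reg_geomean_code <- function(y) exp(mean(log(1 + y))) - 1

# Form as the issue describes the paper's equation: the -1 sits inside exp().
# (Hypothetical helper, written only for comparison.)
reg_geomean_paper <- function(y) exp(mean(log(1 + y)) - 1)

zeros    <- c(0, 0, 0, 0)  # hypothetical counts: gene absent in every cell
constant <- c(3, 3, 3)     # hypothetical counts: same value in every cell

reg_geomean_code(zeros)      # 0: the added 1 is fully compensated
reg_geomean_paper(zeros)     # exp(-1), about 0.368: nonzero for an absent gene
reg_geomean_code(constant)   # 3, up to floating-point error
```

With the -1 outside the exp, a gene with no counts maps back to 0 and a constant vector maps back to its own value, which is the behavior described in the reply above.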