hms-dbmi / scde

R package for analyzing single-cell RNA-seq data
http://pklab.med.harvard.edu/scde
Other
170 stars 64 forks source link

Possible confusion between log10 and log for scde.expression.magnitude #22

Closed jenzopr closed 8 years ago

jenzopr commented 8 years ago

Hi Jean,

recently, I plotted the scde.expression.magnitude for some marker genes in our single-cell experiments. Naturally, I labelled the x-axis as log10(FPM), but observed fpm values higher than 8 made me think. The scde.expression.magnitude function returns a numeric from the log() function, which calculates the natural logarithm of a number. In some of your examples (http://hms-dbmi.github.io/scde/diffexp.html, see "More detailed functions"), you also have log10 in the axis label - I hope those values are not confused with the logs from o.fpm object created.

Best, Jens :)

JEFworks commented 8 years ago

Hi Jens,

Yes, you are correct in that scde.expression.magnitudes returns a natural log.

I can see where this gets a bit confusing. o.prior is actually in log10, so we use log((10^o.prior$x)-1) to transform it back into natural log, since natural log is used to compute the failure probabilities. When plotting though, we're actually plotting lines(x = o.prior$x, ...) so the x axis is indeed in log10.

Hope that clears thing up!

Best, Jean

jenzopr commented 8 years ago

Hi Jean,

thanks for the clarification. So to get log10(FPM)-values, we now calculate fpm = log10(exp(o.fpm)).

Did I get your message right, that for the failure probability when using with magnitudes, its also important to have them in a log() range? E.g. to obtain the correct failure probabilities fails = scde.failure.probability(knn, magnitudes = log(10^fpm))?

Thanks a lot!

JEFworks commented 8 years ago

Hi Jens,

Yes, exactly; I would recommend adding a pseudocount of 1 just to prevent the -Infs: fpm = log10(exp(o.fpm)+1)

Right, just remember to transform keeping the zeros in mind: fails = scde.failure.probability(knn, magnitudes = log(10^fpm - 1)). You can also get the failures associated with counts instead of fpm magnitudes via scde.failure.probability(models = knn, counts = cd)

Best, Jean

jenzopr commented 8 years ago

Perfect, thanks :)