willtownes / glmpca

GNU Lesser General Public License v3.0

GLMPCA on lognormal distributed data #9

Closed JauntyJJS closed 4 years ago

JauntyJJS commented 5 years ago

Hi,

I would like to ask whether glmpca has been tried on data in which most of the features are lognormally distributed. If so, what settings need to be used?

willtownes commented 5 years ago

Hi, thanks for your question. If your data are truly lognormally distributed, there is no need to use GLM-PCA. Instead, just log-transform the data and apply standard PCA (which may be considered a faster way of solving GLM-PCA with a Gaussian likelihood). However, many so-called lognormal data are better described by count distributions like Poisson or negative binomial (both of which GLM-PCA provides). For example, lognormal data cannot contain any exact zeros. With that in mind, could you tell us more about the characteristics of your data? What are the minimum and maximum values? Are the values integers or decimal-valued? How were the data collected?
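To make those questions concrete, here is a minimal sketch in plain Python of how one might summarize a vector of measurements. The function name `describe_values` and the example numbers are made up for illustration and are not part of glmpca. The one firm rule it encodes is the point above: any exact zero (or negative value) rules out a strict lognormal model, because the log transform is undefined there.

```python
def describe_values(values):
    """Summarize the properties asked about above for a flat list of measurements.

    Hypothetical helper for illustration only; it is not part of glmpca.
    """
    values = list(values)
    return {
        "min": min(values),
        "max": max(values),
        # Exact zeros cannot occur under a lognormal distribution.
        "n_zeros": sum(1 for v in values if v == 0),
        # Integer-valued data hint at a count likelihood.
        "all_integers": all(float(v).is_integer() for v in values),
        # log(x) is only defined for strictly positive x.
        "log_transformable": all(v > 0 for v in values),
    }


# Made-up example values, not real measurements.
example_values = [0, 1200, 98000, 3500000, 0, 1000000000]
summary = describe_values(example_values)
print(summary)
```

If `log_transformable` comes back false, the log-then-PCA route would need some workaround such as a pseudocount, which is one reason a count likelihood may be worth considering instead.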

JauntyJJS commented 5 years ago

Hi,

Thank you for the clarification.

The data I am dealing with come from liquid chromatography-mass spectrometry (LC-MS), in which each value is the area under a peak.

In general, the minimum is zero and the maximum could be 10^9. The values are integers because, for now, the machine cannot integrate the area to decimal places. However, the range differs by feature: some transitions or features have peak areas from 1000 to 10000, while others range from 10^5 to 10^7.

I am just unsure whether a count distribution can be used simply because the values are integers.

willtownes commented 4 years ago

OK, thanks for describing your data. While GLM-PCA was really developed for scRNA-seq with UMIs, and hasn't been thoroughly evaluated on other data types, I see no reason why you shouldn't give it a try on your LC/MS data. I am not very familiar with metabolomics, but I believe @rmflight has expertise and is familiar with GLM-PCA's pros and cons; perhaps he can give you some advice. In general, we welcome application of GLM-PCA to any interesting data where the noise matches one of our likelihoods (in your case, with non-negative integer-valued data, either "poi" or "nb" should be reasonable). Please let us know how it works out and/or if you have any difficulties so we can make the software better.
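As a rough aid for choosing between "poi" and "nb", one option is to compare each feature's variance to its mean: under a Poisson model the two are equal, while a negative binomial allows the variance to exceed the mean. The sketch below uses only the Python standard library. The helper names and the `threshold` of 1.5 are arbitrary assumptions for illustration, not a glmpca recommendation, and the check ignores sample-to-sample scaling effects.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return sample variance divided by mean for one feature (needs >= 2 values)."""
    m = mean(counts)
    if m == 0:
        return float("nan")  # ratio is undefined for an all-zero feature
    return variance(counts) / m


def suggest_family(counts, threshold=1.5):
    """Suggest "nb" when the variance clearly exceeds the mean, else "poi".

    `threshold` is an arbitrary cutoff chosen for this illustration.
    An undefined (NaN) ratio falls through to "poi".
    """
    return "nb" if dispersion_ratio(counts) > threshold else "poi"


# Made-up values for two hypothetical features.
wide_feature = [1000, 4000, 10000, 2500, 7000]
narrow_feature = [4, 5, 6, 5, 4, 6]
print(suggest_family(wide_feature), suggest_family(narrow_feature))
```

With peak areas spanning several orders of magnitude, a ratio far above 1 would be expected, which would point toward "nb" rather than "poi".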

willtownes commented 4 years ago

No activity on this issue so closing it out.