Do you apply any transformation for the continuous trait in case it does not follow a normal distribution?

mgalardini / pyseer

SEER, reimplemented in python 🐍🔮

http://pyseer.readthedocs.io

Apache License 2.0


Do you apply any transformation for the continuous trait in case it does not follow a normal distribution? #136

Open neginmb opened 3 years ago

johnlees commented 3 years ago

We don't, no. Have a look at warpedlmm. If you want, you can take the warped phenotype output by that package and use it as the phenotype in pyseer.

neginmb commented 3 years ago

Thanks for your answer, but what about the generalized linear model (GLM)? Do you fit one for the continuous trait? If so, what are the link function and the variance function?

johnlees commented 3 years ago

No, we don't fit a GLM. We just use OLS: https://github.com/mgalardini/pyseer/blob/master/pyseer/model.py#L284-L292

Statsmodels supports GLMs, so you could easily change this line if you wanted to use a different link or family: https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html?highlight=glm#statsmodels.genmod.generalized_linear_model.GLM

The LMM is from limix/fastlmm, which I believe uses an identity link and Gaussian errors. This is more difficult to modify, but the limix package has a few possible alternatives.

snowformatics commented 3 years ago

I have a similar situation and I am wondering whether I could use the mean calculated warped phenotypes of my repeated measurements with different environments instead of BLUEs? Any idea?

Thanks Stefanie

sydelstan commented 2 years ago

does the continuous trait need to be normally distributed? is there a test to confirm the assumptions of the OLS are met?

johnlees commented 2 years ago

> does the continuous trait need to be normally distributed? is there a test to confirm the assumptions of the OLS are met?

@sydelstan OLS doesn't assume the responses/traits are normally distributed; it assumes their errors (estimated by the residuals) are. You can plot lots of useful diagnostics in R with something like:

model <- glm(y ~ x, family=gaussian())
plot(model)

But doing this for every predictor (each variant you test) is much more difficult, so I would generally just use warpedlmm.
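The distinction between the trait and its residuals can be checked on simulated data. The sketch below is a minimal Python illustration using only numpy, with made-up data and effect sizes: a binary predictor with a large effect makes the trait clearly bimodal, while the OLS residuals remain close to normal. Excess kurtosis is used here only as a crude numerical stand-in for the diagnostic plots.

```python
import numpy as np

def excess_kurtosis(values):
    """Sample excess kurtosis: near 0 for normal data, negative for bimodal data."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return np.mean(centered**4) / np.mean(centered**2) ** 2 - 3.0

# Hypothetical data: binary predictor with a large effect on the trait.
rng = np.random.default_rng(0)
n = 2000
x = rng.integers(0, 2, n).astype(float)
y = 1.0 + 5.0 * x + rng.normal(0.0, 1.0, n)  # errors are normal by construction

# Fit y ~ x by ordinary least squares (intercept plus slope).
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

trait_kurtosis = excess_kurtosis(y)
residual_kurtosis = excess_kurtosis(residuals)

print(f"estimated intercept and slope: {beta}")
print(f"excess kurtosis of trait:      {trait_kurtosis:.2f}")
print(f"excess kurtosis of residuals:  {residual_kurtosis:.2f}")
```

In this setup the trait fails any reasonable normality check, yet the model assumption holds, because that assumption concerns the residuals.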

johnlees commented 2 years ago

> I have a similar situation and I am wondering whether I could use the mean calculated warped phenotypes of my repeated measurements with different environments instead of BLUEs? Any idea?

@snowformatics sorry, I missed your message, which I now see is from a while back. This is certainly beyond the scope of pyseer. My intuition would be to set up a small simulation study to check whether this works. For more complex situations like this, a more flexible LMM is likely to be more helpful: limix, lme4, or a Bayesian/MCMC version (stan is probably not a bad idea).

anhvu989 commented 9 months ago

Hello, Does the continuous phenotype need to be normally distributed when using LMM model? Because I read somewhere that the requirements for LMM model are phenotypes and residuals need to follow a normal distribution.

I am new to Pyseer and came across a publication (https://www.mdpi.com/2076-2607/10/7/1366) running Pyseer and it has residuals plotted and normality checked for satisfying assumptions for using LMM. Is it possible to extract residuals from Pyseer results for this plotting purpose?

Thank you very much.

johnlees commented 9 months ago

> Hello, Does the continuous phenotype need to be normally distributed when using LMM model? Because I read somewhere that the requirements for LMM model are phenotypes and residuals need to follow a normal distribution.
>
> I am new to Pyseer and came across a publication (https://www.mdpi.com/2076-2607/10/7/1366) running Pyseer and it has residuals plotted and normality checked for satisfying assumptions for using LMM. Is it possible to extract residuals from Pyseer results for this plotting purpose?
>
> Thank you very much.

I think this is mostly covered in the comments above. I would recommend warpedlmm if you are worried about this, but typically it's not likely to be a big problem. Getting residuals would be a bit manual, but you could run phenotype prediction (see the docs) and compare the predictions with your observed phenotypes to get an idea about this.
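Once observed and predicted phenotypes are available as arrays, the manual step might look like the sketch below. It is only an illustration: the `observed` and `predicted` arrays are simulated stand-ins, not pyseer output, and reading the prediction results into arrays is left out. Residuals from a prediction model are also not the same thing as the LMM's own residuals, so treat the result as a rough guide. The function returns the points of a normal Q-Q plot, which can be passed to any plotting library.

```python
from statistics import NormalDist

import numpy as np

def normal_qq(residuals):
    """Return (theoretical, sample) quantiles for a normal Q-Q plot."""
    sample = np.sort(np.asarray(residuals, dtype=float))
    n = sample.size
    # Plotting positions (i - 0.5) / n avoid probabilities of exactly 0 or 1.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    # Standardize so the points fall near the line y = x if residuals are normal.
    standardized = (sample - sample.mean()) / sample.std(ddof=1)
    return theoretical, standardized

# Hypothetical stand-ins for measured and predicted phenotypes.
rng = np.random.default_rng(42)
observed = rng.normal(10.0, 2.0, 300)
noise = rng.normal(0.0, 1.0, 300)
predicted = observed - noise

residuals = observed - predicted
theoretical, sample = normal_qq(residuals)

# Correlation of the Q-Q points: values close to 1 suggest approximate normality.
qq_corr = np.corrcoef(theoretical, sample)[0, 1]
print(f"Q-Q correlation: {qq_corr:.4f}")
```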