Model suggestions for 550 group project

tom-hc-park / STAT550-450-for-Seniorworkers-from-Korea

Model suggestions for 550 group project #21

Open tom-hc-park opened 6 years ago

tom-hc-park commented 6 years ago

Zero-inflated gamma and log-normal models

If our response can be treated as approximately ratio-scale data, I think we can use zero-inflated gamma and zero-inflated log-normal models.

Another good candidate is the model from "Inflated Discrete Beta Regression Models for Likert and Discrete Rating Scale Outcomes", which looks promising so far.
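To make the zero-inflated gamma idea concrete, here is a minimal sketch of its log-likelihood contribution for a single observation, assuming the usual two-part form: a point mass at zero plus a gamma density on the positive values. The function name and the parameters `p_zero`, `shape`, and `scale` are hypothetical and chosen for illustration only; in practice each would be linked to covariates.

```python
import math


def zero_inflated_gamma_loglik(y, p_zero, shape, scale):
    """Log-likelihood of one observation under a two-part model.

    Assumed form (not taken from the project data):
      P(Y = 0) = p_zero
      Y | Y > 0 ~ Gamma(shape, scale)
    """
    if y < 0:
        raise ValueError("y must be non-negative")
    if not 0.0 < p_zero < 1.0:
        raise ValueError("p_zero must lie strictly between 0 and 1")
    if y == 0:
        # Zero part: probability of the point mass at zero.
        return math.log(p_zero)
    # Positive part: log(1 - p_zero) plus the gamma log-density at y.
    log_gamma_density = (
        (shape - 1.0) * math.log(y)
        - y / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )
    return math.log1p(-p_zero) + log_gamma_density
```

The zero-inflated log-normal version would follow the same pattern, with the gamma log-density swapped for a log-normal one.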

@NSKrstic @xinyaofan

https://github.com/aiod01/STAT550-450-for-Seniorworkers-from-Korea/tree/master/Resources/550%20group%20project

NSKrstic commented 6 years ago

I've also added what I previously researched regarding the treatment of Likert scales as an interval response. Generally, parametric statistics can still be applied to Likert scales (even though the assumptions are violated) because of their robustness. This holds especially in our case, since our response variables are not single Likert items but sums of many Likert items.

https://github.com/aiod01/STAT550-450-for-Seniorworkers-from-Korea/blob/master/Resources/550%20group%20project/Likert_Scales_Statistics.pdf

https://github.com/aiod01/STAT550-450-for-Seniorworkers-from-Korea/blob/master/Resources/550%20group%20project/i1949-8357-5-4-541.pdf

xinyaofan commented 6 years ago

Thank you, guys! Since we all think beta regression is a good model to use, I am putting the link to beta regression in R, which Nikolas provided, here, along with a reference that I think is useful for learning more about beta regression.

https://cran.r-project.org/web/packages/betareg/vignettes/betareg.pdf

https://arxiv.org/abs/1405.4637
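One practical point: beta regression needs a response strictly inside (0, 1), so a bounded sum score has to be rescaled first. A minimal sketch, assuming the score has known lower and upper bounds and using the commonly cited compression (y * (n - 1) + 0.5) / n to keep boundary values off 0 and 1. The function name and the example bounds are hypothetical, not taken from our data.

```python
def rescale_to_open_unit_interval(score, lower, upper, n_obs):
    """Map a bounded score to the open interval (0, 1).

    Step 1: linear rescaling to [0, 1] using the known bounds.
    Step 2: compression (y * (n - 1) + 0.5) / n so that scores at
            the bounds do not map to exactly 0 or 1.
    """
    if not lower <= score <= upper:
        raise ValueError("score must lie within [lower, upper]")
    if upper <= lower or n_obs < 1:
        raise ValueError("need upper > lower and n_obs >= 1")
    unit = (score - lower) / (upper - lower)
    return (unit * (n_obs - 1) + 0.5) / n_obs


# Hypothetical example: a 10-item scale scored 1 to 5 per item, 1200 respondents.
rescaled = [rescale_to_open_unit_interval(s, 10, 50, 1200) for s in (10, 30, 50)]
```

The rescaled values could then be passed as the response to a beta regression fit.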

xinyaofan commented 6 years ago

beta.pdf

This is an article about beta regression (more on the theory and on model testing). I hope it is helpful.

xinyaofan commented 6 years ago

@NSKrstic Since I can't find any general criteria for the goodness of fit of our model, I think we can evaluate it through prediction instead. You could fit the model on 1000 observations, and I would then use the remaining 200 observations to check the predictions. Or we could even do cross-validation. Does that make sense? Xinyao
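The split described above can be sketched as follows, assuming 1200 observations in total (1000 for fitting plus 200 held out). Only the row indices are generated here; the helper names and the seed are hypothetical, and the model fitting itself is left out.

```python
import random


def holdout_split(n_obs, n_train, seed=0):
    """Shuffle row indices and split them into training and test sets."""
    rng = random.Random(seed)
    indices = list(range(n_obs))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


def kfold_splits(n_obs, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    indices = list(range(n_obs))
    rng.shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Single holdout: fit on 1000 rows, predict on the remaining 200.
train_idx, test_idx = holdout_split(1200, 1000)

# 6-fold cross-validation: every row is held out exactly once.
cv_splits = list(kfold_splits(1200, 6))
```

With cross-validation, each observation is used for prediction once, so the error estimate depends less on which 200 rows happen to be held out.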