paroussisc / stats

Repo to track the material I cover on Wednesdays

Core Statistics - Chapter 2 #7

paroussisc opened 6 years ago

paroussisc commented 6 years ago


Random effects can be used to share information between observations and across levels of a grouping factor. One example would be to treat weather effects on total goals as random effects: weather types with few observations then borrow strength from the common mean, so their estimates are not pulled around so much by small-sample outliers.

One reason for using random effects is their interpretability. For example, they allow us to include patient effects in medical trials, modelling the random variability between patients while still allowing inferences to be made about their shared covariates (treatment type, fat mass, height, etc.). Random effects are to a frequentist what hierarchical modelling is to a Bayesian.
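
As a minimal sketch of the goals-and-weather idea, a random intercept for each weather type could be fitted with statsmodels' MixedLM (the data file and column names here are hypothetical):

```python
# A minimal sketch of weather as a random effect on goals, using
# statsmodels' MixedLM. The data file and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

matches = pd.read_csv("matches.csv")  # hypothetical: one row per match

# Random intercept per weather type: sparsely observed weather
# categories are shrunk towards the common mean.
model = smf.mixedlm("goals ~ 1", data=matches, groups=matches["weather"])
fit = model.fit()
print(fit.summary())
```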


paroussisc commented 6 years ago

p-values and the likelihood ratio test

A p-value is the probability of observing a test statistic at least as extreme as the one actually observed, computed under the assumption that the null hypothesis is true.

A common test statistic is the likelihood ratio:

$$\Lambda = \frac{L(\theta_0;\, \mathbf{y})}{L(\theta_1;\, \mathbf{y})}$$

which, by the Neyman–Pearson lemma, gives the most powerful test possible for simple hypotheses.
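
A minimal sketch of the generalized version of this test, for a normal mean with the variance assumed known (simulated data, not from the book):

```python
# A minimal sketch of a (generalized) likelihood ratio test for a normal
# mean, H0: mu = 0 vs H1: mu free, with the variance assumed known (= 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.normal(loc=0.3, scale=1.0, size=50)  # simulated data

def loglik(mu):
    return stats.norm.logpdf(y, loc=mu, scale=1.0).sum()

# 2{l(theta_hat) - l(theta_0)} is asymptotically chi-squared with
# df = number of parameters fixed under H0 (here 1).
lrt = 2 * (loglik(y.mean()) - loglik(0.0))
p_value = stats.chi2.sf(lrt, df=1)
print(f"LRT statistic = {lrt:.3f}, p-value = {p_value:.4f}")
```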

There is some controversy surrounding the use of p-values, and this article gives a nice explanation: https://www.vox.com/science-and-health/2017/7/31/16021654/p-values-statistical-significance-redefine-0005. An excerpt:

> In a 2013 PNAS paper, Johnson used more advanced statistical techniques to test the assumption researchers commonly make: that a p of .05 means there’s a 5 percent chance the null hypothesis is true. His analysis revealed that it didn’t. “In fact there’s a 25 percent to 30 percent chance the null hypothesis is true when the p-value is .05,” Johnson said.

Practical ways to avoid such issues are:

paroussisc commented 6 years ago

Model Checking

Model Comparison

paroussisc commented 6 years ago

The Bayesian Approach

$$\pi(\theta \mid \mathbf{y}) = \frac{f(\mathbf{y} \mid \theta)\,\pi(\theta)}{f(\mathbf{y})}$$

It is worth noting that, as the sample size tends to infinity, the likelihood dominates the prior, and therefore the posterior mode tends towards the maximum likelihood estimate.
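
A quick numerical check of this, assuming a Beta(2, 2) prior on a binomial probability (a made-up example): the posterior mode approaches the sample proportion, i.e. the MLE, as n grows.

```python
# Beta(2, 2) prior + binomial likelihood gives a Beta posterior, so the
# posterior mode is available in closed form and can be compared to the MLE.
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.7

for n in (10, 100, 10_000):
    heads = rng.binomial(n, theta_true)
    a, b = 2 + heads, 2 + n - heads       # Beta(a, b) posterior
    post_mode = (a - 1) / (a + b - 2)     # mode of a Beta(a, b) density
    print(f"n={n:6d}  posterior mode={post_mode:.4f}  MLE={heads / n:.4f}")
```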

Model Comparison

This link gives a nice, succinct comparison of model comparison techniques (BIC, DIC and cross-validation), with some further resources linked: https://www.stites.io/posts/2017-10-09-bic-dic-cv.html. What I take from it is that DIC is, for the most part, more useful than BIC; cross-validation (which can also be used in a Bayesian setting) is the strongest, but it is more computationally intensive and doesn't come with any theoretical claims about performance on future data.
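
As a toy illustration of two of these criteria, the sketch below compares BIC and 5-fold cross-validation for choosing a polynomial degree; the data are simulated and the Gaussian BIC is written out by hand:

```python
# Compare BIC and 5-fold cross-validation for polynomial regression.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
x = rng.uniform(-2, 2, size=(200, 1))
y = 1.0 + 0.5 * x.ravel() ** 2 + rng.normal(scale=0.5, size=200)

for degree in (1, 2, 5):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    resid = y - model.fit(x, y).predict(x)
    n, k = len(y), degree + 2  # coefficients + intercept + noise variance
    # Gaussian log-likelihood at the MLE of the residual variance
    loglik = -0.5 * n * (np.log(2 * np.pi * resid.var()) + 1)
    bic = k * np.log(n) - 2 * loglik
    cv_mse = -cross_val_score(model, x, y, cv=5,
                              scoring="neg_mean_squared_error").mean()
    print(f"degree={degree}  BIC={bic:8.1f}  CV MSE={cv_mse:.3f}")
```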

paroussisc commented 6 years ago

In the MLE setting, under some regularity conditions, and in the large sample limit:

$$\hat{\theta} \sim N\left(\theta_0,\, \mathcal{I}^{-1}\right)$$

$$2\left\{ l(\hat{\theta}) - l(\theta_0) \right\} \sim \chi^2_r$$

where $\mathcal{I}$ is the Fisher information matrix and $r$ is the number of parameters fixed under the null hypothesis.

And in fact the following often holds, in the Bayesian setting, in the large sample limit:

$$\theta \mid \mathbf{y} \sim N\left(\hat{\theta},\, \mathcal{I}^{-1}\right)$$
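
A small simulation sketch of the first of these results, using the rate of an exponential distribution, where the Fisher information n/λ² is available in closed form (my example, not the book's):

```python
# For Exp(rate lambda), the Fisher information is n / lambda^2, so the
# MLE 1/ybar should be roughly N(lambda, lambda^2 / n) for large n.
import numpy as np

rng = np.random.default_rng(7)
lam, n, reps = 2.0, 500, 5000

# MLE of the rate is 1 / sample mean; simulate its sampling distribution
mles = np.array([1 / rng.exponential(scale=1 / lam, size=n).mean()
                 for _ in range(reps)])

print(f"empirical sd of MLE = {mles.std():.4f}")
print(f"asymptotic sd       = {lam / np.sqrt(n):.4f}")
```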