paroussisc / stats

Repo to track the material I cover on Wednesdays

Core Statistics - Chapter 4 #9

Open paroussisc opened 5 years ago

paroussisc commented 5 years ago

Properties of the expected log-likelihood:

[screenshot from 2018-10-10 13-45-10]

Where does this proof use the fact that we're evaluating the expectation at the true parameter values?

EDIT: the derivative we're talking about is evaluated at the true parameter value, so the expectation also needs to be taken at the true value for the two terms to cancel.

This is as you'd expect really - the expected log-likelihood should have a turning point at the true parameters, i.e. a derivative of zero there. It would be weird for maximising/minimising if this weren't the case!
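
A quick simulation makes this concrete (my own sketch, not from the book; the normal model with known sigma is just a convenient choice): the average score over many draws is roughly zero when evaluated at the true mean, and clearly non-zero at any other value.

```python
# Check that the expected score is zero at the true parameter: for
# y ~ N(mu0, sigma^2) with sigma known, the score for mu is (y - mu) / sigma^2.
import numpy as np

rng = np.random.default_rng(0)
mu0, sigma = 2.0, 1.5
y = rng.normal(loc=mu0, scale=sigma, size=100_000)

score_at_true = (y - mu0) / sigma**2            # score evaluated at the true mu
score_at_wrong = (y - (mu0 + 1.0)) / sigma**2   # score evaluated at a wrong mu

print(np.mean(score_at_true))    # close to 0
print(np.mean(score_at_wrong))   # close to -1 / sigma^2, i.e. clearly non-zero
```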

paroussisc commented 5 years ago

[screenshot from 2018-10-10 14-18-36]

[screenshot from 2018-10-10 14-37-58]

The estimated standard errors of the MLE are the square roots of the diagonal elements of the inverse of the observed Fisher information matrix. The observed information is the negative of the Hessian of the log-likelihood, so an optimiser that minimises the negative log-likelihood returns it directly as its Hessian (https://stats.stackexchange.com/questions/68080/basic-question-about-fisher-information-matrix-and-relationship-to-hessian-and-s).
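
As a rough illustration of that recipe (my own sketch, not from the book or the linked answer): a normal sample fitted by minimising the negative log-likelihood with `scipy.optimize.minimize`. The parameterisation (mu, log_sigma) is just to keep the optimisation unconstrained, and the BFGS inverse-Hessian approximation `res.hess_inv` stands in for the inverse observed information.

```python
# Minimal sketch of MLE standard errors from the inverse Hessian of the
# negative log-likelihood. Model and parameterisation are illustrative choices.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.5, size=500)

def nll(theta):
    """Negative log-likelihood of N(mu, sigma^2), with sigma = exp(log_sigma) > 0."""
    mu, log_sigma = theta
    return -np.sum(norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)))

res = minimize(nll, x0=np.array([0.0, 0.0]), method="BFGS")

# The objective is the negative log-likelihood, so its Hessian is the observed
# information; BFGS's hess_inv approximates the inverse of that Hessian, and
# the square roots of its diagonal are approximate standard errors.
se = np.sqrt(np.diag(res.hess_inv))
print("MLE (mu, log_sigma):", res.x)
print("approx standard errors:", se)
```

The standard error of mu should come out near 1.5 / sqrt(500) ≈ 0.067; the BFGS approximation is crude, so a proper numerical Hessian of the negative log-likelihood is the safer route for anything serious.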

paroussisc commented 5 years ago

[screenshot from 2018-10-10 15-00-35]

Good to know that, in expectation, the log-likelihood is maximised at the true parameters...

[screenshot from 2018-10-10 15-02-17]

In its simplest form, the Cramér–Rao bound states that the variance of any unbiased estimator is at least as high as the inverse of the Fisher information.
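
A minimal check of a case where the bound is attained (my own sketch): for y₁,…,yₙ ~ N(μ, σ²) with σ known, the Fisher information for μ is n/σ², so the bound is σ²/n, and the sample mean, which is unbiased, has exactly that variance.

```python
# Empirical check that the sample mean attains the Cramer-Rao bound for the
# mean of a normal with known variance. Numbers are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 2.0, 1.5, 50, 20_000

# reps independent samples of size n, reduced to their sample means
means = rng.normal(loc=mu, scale=sigma, size=(reps, n)).mean(axis=1)

crlb = sigma**2 / n   # inverse of the Fisher information n / sigma^2
print("Cramer-Rao bound:", crlb)
print("empirical variance of the sample mean:", means.var())  # roughly equal
```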

paroussisc commented 5 years ago

The MLE is usually consistent, meaning that the parameter estimates tend to the true values as the sample size tends to infinity. Consistency can fail when the information gained per parameter does not increase with the sample size, for example when the number of parameters grows along with the number of observations.

Here they discuss inconsistent MLEs that can still be part of a likelihood ratio test: https://stats.stackexchange.com/questions/116725/example-of-an-inconsistent-maximum-likelihood-estimator. A simulation of one classic example is sketched below.
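
The sketch (my own, using the Neyman–Scott setup: pairs (y_i1, y_i2) ~ N(μ_i, σ²) with a separate nuisance mean per pair, so the information about σ² per parameter never grows) shows the MLE of σ² settling at σ²/2 rather than σ², however many pairs are added.

```python
# Neyman-Scott style inconsistency: one nuisance mean per pair of observations.
# The MLE of sigma^2 converges to sigma^2 / 2 instead of sigma^2.
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 4.0

for n_pairs in (100, 10_000, 1_000_000):
    mu = rng.normal(size=n_pairs)                      # one unknown mean per pair
    y = mu[:, None] + rng.normal(scale=np.sqrt(sigma2), size=(n_pairs, 2))
    # MLE: each mu_i is estimated by its pair mean, and sigma^2 by the mean
    # squared deviation of the observations from their own pair mean.
    sigma2_hat = np.mean((y - y.mean(axis=1, keepdims=True)) ** 2)
    print(n_pairs, round(sigma2_hat, 3))   # settles near sigma2 / 2 = 2, not 4
```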

paroussisc commented 5 years ago

MLE

paroussisc commented 5 years ago

[screenshot from 2018-10-23 12-18-58]

paroussisc commented 5 years ago

[screenshot from 2018-10-23 13-21-20]

paroussisc commented 5 years ago

[screenshot from 2018-10-23 14-48-57]

Here the KL divergence is the expectation, under the original distribution, of the log ratio between the original density and the approximating density (fun example: https://www.countbayesie.com/blog/2017/5/9/kullback-leibler-divergence-explained). A quick numeric check is sketched below.

Side note: the link mentions variational autoencoders for approximating distributions - worth looking at for football?
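
The check (my own sketch; `p` and `q` are arbitrary discrete distributions): the hand-computed Σ p log(p/q) matches `scipy.stats.entropy(p, q)`, which computes exactly this quantity.

```python
# KL divergence between a "true" discrete distribution p and an approximation q:
# the expectation under p of log(p / q). Note it is not symmetric in p and q.
import numpy as np
from scipy.stats import entropy

p = np.array([0.1, 0.2, 0.3, 0.4])        # original distribution
q = np.array([0.25, 0.25, 0.25, 0.25])    # uniform approximation

kl_manual = np.sum(p * np.log(p / q))
kl_scipy = entropy(p, q)                  # same quantity via scipy
print(kl_manual, kl_scipy)                # agree; entropy(q, p) would differ
```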

paroussisc commented 5 years ago

[screenshot from 2018-10-23 15-02-25]

AIC is not consistent - the probability of selecting an overly complex model is non-zero even as the sample size tends to infinity.
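
A rough simulation of what that means in practice (my own sketch): data come from N(0, 1), the small model fixes the mean at 0 while the larger model also estimates it, and the proportion of runs in which AIC picks the needlessly large model hovers around P(χ²(1) > 2) ≈ 0.16 instead of shrinking as n grows.

```python
# AIC's lack of consistency: the overfitting rate does not go to zero with n.
import numpy as np

rng = np.random.default_rng(4)

def gaussian_loglik(y, mu, sigma2):
    """Log-likelihood of an i.i.d. N(mu, sigma2) sample."""
    return -0.5 * len(y) * np.log(2 * np.pi * sigma2) - 0.5 * np.sum((y - mu) ** 2) / sigma2

reps = 2_000
for n in (100, 1_000, 10_000):
    overfit = 0
    for _ in range(reps):
        y = rng.normal(size=n)                        # true model has mean 0
        # small model: mu fixed at 0, sigma^2 estimated (1 parameter)
        s2_small = np.mean(y ** 2)
        aic_small = -2 * gaussian_loglik(y, 0.0, s2_small) + 2 * 1
        # large model: mu and sigma^2 both estimated (2 parameters)
        mu_hat = y.mean()
        s2_large = np.mean((y - mu_hat) ** 2)
        aic_large = -2 * gaussian_loglik(y, mu_hat, s2_large) + 2 * 2
        overfit += aic_large < aic_small
    print(n, overfit / reps)   # stays around 0.16 rather than tending to 0
```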