The estimated standard errors of the MLE are the square roots of the diagonal elements of the inverse of the observed Fisher information matrix, which is the negative of the Hessian of the log-likelihood (equivalently, the Hessian of the negative log-likelihood that many optimisers return: https://stats.stackexchange.com/questions/68080/basic-question-about-fisher-information-matrix-and-relationship-to-hessian-and-s).
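As a quick sanity check, here is a minimal sketch of that recipe (my own, not from the thread): fit a normal distribution by minimising the negative log-likelihood, then read the standard errors off the inverse Hessian the optimiser reports.

```python
# Sketch: standard errors from the inverse observed information.
# Because we minimise the *negative* log-likelihood, the Hessian of the
# objective is the observed information itself (no sign flip needed).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=500)

def negloglik(params):
    mu, log_sigma = params  # log-parameterise sigma to keep it positive
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

res = minimize(negloglik, x0=[0.0, 0.0], method="BFGS")

# BFGS only keeps an *approximation* to the inverse Hessian; for serious use,
# compute the Hessian exactly (e.g. numerically) at the optimum instead.
se = np.sqrt(np.diag(res.hess_inv))
print(res.x)  # MLE of (mu, log sigma)
print(se)     # approximate standard errors on the same scale
```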
Good to know that the true parameters are expected to be the most likely, i.e. the expected log-likelihood is maximised at the true parameter values...
In its simplest form, the Cramér-Rao bound states that the variance of any unbiased estimator is at least as high as the inverse of the Fisher information.
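In symbols, for a scalar parameter $\theta$ with Fisher information $I(\theta)$ and any unbiased estimator $\hat{\theta}$:

$$
\operatorname{Var}(\hat{\theta}) \;\geq\; \frac{1}{I(\theta)}
$$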
The MLE is usually consistent, meaning that the parameter estimates tend to the true values as the sample size tends to infinity. Consistency can fail when the information gained per parameter does not increase with the sample size, for example when the number of parameters grows with the number of observations.
Here they discuss inconsistent MLEs that can still be part of a likelihood ratio test: https://stats.stackexchange.com/questions/116725/example-of-an-inconsistent-maximum-likelihood-estimator.
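A small simulation of the classic Neyman-Scott setup (my own illustration of the "information per parameter" point above, not taken from the linked thread): every pair of observations has its own nuisance mean, so the number of parameters grows with the sample size, and the MLE of the shared variance converges to $\sigma^2/2$ rather than $\sigma^2$.

```python
# Sketch: an inconsistent MLE (Neyman-Scott). One nuisance mean per pair
# of observations, so information per parameter never accumulates.
import numpy as np

rng = np.random.default_rng(2)
sigma = 2.0  # true sigma^2 = 4.0

for n_pairs in [100, 1_000, 100_000]:
    mus = 5 * rng.normal(size=n_pairs)  # a different mean for every pair
    pairs = rng.normal(loc=mus[:, None], scale=sigma, size=(n_pairs, 2))
    # Profiling out each pair's mean with the pair average, the MLE of the
    # variance is the mean squared deviation within pairs.
    sigma2_hat = np.mean((pairs - pairs.mean(axis=1, keepdims=True)) ** 2)
    print(n_pairs, sigma2_hat)  # tends to sigma^2 / 2 = 2.0, not 4.0
```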
MLE
The MLE can be viewed as minimising the KL divergence between the true distribution and the model, where the KL divergence is the expectation of the log difference between the probability of the data under the original distribution and under the approximating distribution (fun example: https://www.countbayesie.com/blog/2017/5/9/kullback-leibler-divergence-explained).
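In symbols, with $p$ the original distribution and $q$ the approximation:

$$
D_{\mathrm{KL}}(p \,\|\, q) \;=\; \mathbb{E}_{x \sim p}\!\left[\log p(x) - \log q(x)\right] \;=\; \sum_x p(x) \log \frac{p(x)}{q(x)}
$$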
Side note: the link mentions variational autoencoders to approximate distributions - worth looking at for football?
AIC is not consistent - the probability of selecting an overly complex model remains non-zero even as the sample size tends to infinity.
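A rough simulation of that (my own sketch, using the usual nested Gaussian linear models): the true model is intercept-only, the rival adds one useless covariate, and the fraction of runs in which AIC prefers the bigger model stays around 0.16 rather than shrinking as $n$ grows.

```python
# Sketch: AIC's over-selection probability does not vanish with n.
import numpy as np

rng = np.random.default_rng(3)

def gaussian_aic(y, X):
    """AIC of a Gaussian linear model fitted by least squares."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n  # MLE of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = X.shape[1] + 1          # regression coefficients + variance
    return 2 * k - 2 * loglik

for n in [100, 1_000, 10_000]:
    n_sims, overfit = 2_000, 0
    for _ in range(n_sims):
        y = rng.normal(size=n)  # true model: intercept-only noise
        x = rng.normal(size=n)  # irrelevant covariate
        X_small = np.ones((n, 1))
        X_big = np.column_stack([np.ones(n), x])
        overfit += gaussian_aic(y, X_big) < gaussian_aic(y, X_small)
    print(n, overfit / n_sims)  # hovers around 0.16 for every n
```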
Properties of the expected log-likelihood:
Where does this proof use the fact that we're evaluating the expectation at the true parameter values?
EDIT: the derivative we're talking about is evaluated at the true parameter value, so the expectation needs to be taken under the true value too, so that the two terms cancel.
This is as you'd expect really - the true parameters should be a turning point of the expected log-likelihood, so the derivative should be zero there. It would be weird for maximising if this wasn't the case!
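Writing out the standard argument (a sketch, assuming we can swap differentiation and integration):

$$
\mathbb{E}_{\theta_0}\!\left[\frac{\partial}{\partial\theta}\log f(X;\theta)\,\Big|_{\theta=\theta_0}\right]
= \int \frac{\frac{\partial}{\partial\theta} f(x;\theta)\big|_{\theta=\theta_0}}{f(x;\theta_0)}\, f(x;\theta_0)\,\mathrm{d}x
= \frac{\partial}{\partial\theta}\int f(x;\theta)\,\mathrm{d}x\,\Big|_{\theta=\theta_0}
= \frac{\partial}{\partial\theta}\,1 = 0
$$

The $f(x;\theta_0)$ coming from the expectation cancels the denominator of the score only because the derivative is evaluated at that same $\theta_0$ - which is exactly where the proof uses the true parameter value.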