paroussisc / stats

Repo to track the material I cover on Wednesdays
MIT License

Core Statistics - Chapter 1 #2

Open paroussisc opened 5 years ago

paroussisc commented 5 years ago

Just an issue to keep track of important results and definitions, and my thoughts on these.

paroussisc commented 5 years ago

Cumulative distribution functions

For a continuous random variable X with CDF F, F(X) has a uniform distribution on [0,1], which is useful to remember for generating rvs - generate a uniform rv U and apply the inverse of the CDF to it, so that $F^{-1}(U)$ has CDF $F$ (inverse transform sampling).
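A minimal sketch of the idea, using the Exponential distribution because its CDF is easy to invert by hand (the rate and seed below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# Inverse transform sampling for an Exponential(rate) distribution:
# the CDF is F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -log(1 - u) / rate.
def sample_exponential(rate, size):
    u = rng.uniform(0.0, 1.0, size)   # U ~ Uniform(0, 1)
    return -np.log(1.0 - u) / rate    # F^{-1}(U) has CDF F

samples = sample_exponential(rate=2.0, size=100_000)
print(samples.mean())  # should be close to 1 / rate = 0.5
```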

paroussisc commented 5 years ago

Linear transformations of normal random vectors

If $X \sim N(\mu, \Sigma)$ and $B$ is a matrix of finite real constants, then

$$BX \sim N\left(B\mu,\ B\Sigma B^T\right)$$

This seems obvious given the identities in #3, but the point here is that multivariate normality is retained after the transformation. A special case is that if $a$ is a vector of finite real constants, then

$$a^T X \sim N\left(a^T\mu,\ a^T\Sigma a\right)$$

and when $a$ is a vector of all zeros except for one element equal to 1, we are back to the univariate case, so:

If X has a multivariate normal distribution, then the marginal distribution of any X_j is univariate normal (not the case for a multivariate t-distribution, for example). In fact, the marginal density of any subvector of X is multivariate normal.
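A quick numerical check of the moments (the particular $\mu$, $\Sigma$ and $B$ below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# X ~ N(mu, Sigma) in three dimensions
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

# An arbitrary 2x3 transformation matrix B
B = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0,  1.0]])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ B.T  # each row is B x for one draw x

# Empirical moments of BX should match B mu and B Sigma B^T
print(Y.mean(axis=0))            # ~ B @ mu
print(B @ mu)
print(np.cov(Y, rowvar=False))   # ~ B @ Sigma @ B.T
print(B @ Sigma @ B.T)
```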

paroussisc commented 5 years ago

Transformation of random variables

CDF, for $Y = g(X)$ with $g$ monotonic and increasing:

$$F_Y(y) = \Pr\big(g(X) \le y\big) = F_X\big(g^{-1}(y)\big)$$

PDF:

$$f_Y(y) = f_X\big(g^{-1}(y)\big)\left|\frac{d}{dy}g^{-1}(y)\right|$$

(in the multivariate case the derivative is replaced by the absolute value of the Jacobian determinant of the inverse transformation).

As an example, the book uses this formula with the definition of a multivariate normal to obtain its pdf; we've worked through an example in #4.
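A sketch with a one-dimensional case, $Y = \exp(X)$ for $X \sim N(0,1)$, checking the formula against scipy's log-normal pdf:

```python
import numpy as np
from scipy import stats

# Y = exp(X) with X ~ N(0, 1): g(x) = exp(x), g^{-1}(y) = log(y), so
# f_Y(y) = f_X(log y) * |d/dy log y| = phi(log y) / y  -- the log-normal pdf.
def f_Y(y):
    return stats.norm.pdf(np.log(y)) / y

y = np.linspace(0.5, 5.0, 4)
print(f_Y(y))
print(stats.lognorm.pdf(y, s=1.0))  # scipy's log-normal, should agree
```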

paroussisc commented 5 years ago

The useful elements of the "Moment generating functions" section are the three properties that are listed (roughly, from memory):

- the derivatives of $M_X$ at zero generate the moments, $M_X^{(k)}(0) = \mathbb{E}(X^k)$;
- if $X$ and $Y$ are independent, then $M_{X+Y}(t) = M_X(t)\,M_Y(t)$;
- two random variables with the same MGF (where it exists) have the same distribution,

and these identities are useful when proving the Central Limit Theorem.
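A sketch of the first property with sympy, plugging in the $N(\mu, \sigma^2)$ MGF:

```python
import sympy as sp

t = sp.symbols('t')
mu, sigma = sp.symbols('mu sigma', positive=True)

# MGF of N(mu, sigma^2): M(t) = exp(mu*t + sigma^2 * t^2 / 2)
M = sp.exp(mu * t + sigma**2 * t**2 / 2)

# The k-th derivative at t = 0 is E(X^k)
for k in range(1, 4):
    print(k, sp.expand(sp.diff(M, t, k).subs(t, 0)))
# prints: mu, mu**2 + sigma**2, mu**3 + 3*mu*sigma**2
```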

paroussisc commented 5 years ago

Central Limit Theorem: if $X_1, X_2, \ldots$ are i.i.d. with mean $\mu$ and finite variance $\sigma^2$, then

$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1) \ \text{as}\ n \to \infty$$

See #5 for an example of the CLT in action.
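A minimal simulation of the statement (separate from whatever is in #5), averaging draws from a deliberately non-normal distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

# Standardised means of n Exponential(1) draws (mu = sigma = 1) should be
# approximately N(0, 1) for moderately large n.
n, reps = 50, 100_000
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = (means - 1.0) * np.sqrt(n)   # (xbar - mu) / (sigma / sqrt(n)) with sigma = 1

print(z.mean(), z.std())             # roughly 0 and 1
print(np.mean(np.abs(z) < 1.96))     # roughly 0.95, as for N(0, 1)
```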

paroussisc commented 5 years ago

Chebyshev's inequality: for a random variable $X$ with mean $\mu$ and variance $\sigma^2$, and any $k > 0$,

$$\Pr\big(|X - \mu| \ge k\sigma\big) \le \frac{1}{k^2}$$

While this is used in the proof of the (weak) law of large numbers, it does have other uses, mainly when we cannot make any distributional assumptions about the data. It states that at least 75% of values must lie within two standard deviations of the mean, and at least about 89% within three standard deviations.

Generally speaking, you can sub in values for $X - \mu$ and $k$ to get bounds on tail probabilities from the variance, or bounds on the variance from given probabilities.
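A quick check of the bound on a skewed distribution with known moments (Exponential(1), so $\mu = \sigma = 1$):

```python
import numpy as np

rng = np.random.default_rng(2)

# Exponential(1) has mu = 1 and sigma = 1, so Chebyshev says
# P(|X - 1| >= k) <= 1 / k^2.
x = rng.exponential(scale=1.0, size=1_000_000)

for k in (2, 3):
    print(k, np.mean(np.abs(x - 1.0) >= k), "<=", 1 / k**2)
# the bound holds, usually quite loosely
```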

paroussisc commented 5 years ago

Jensen's inequality: if $f$ is a concave function, then

$$\mathbb{E}\big[f(X)\big] \le f\big(\mathbb{E}[X]\big)$$

The main use I've seen for this inequality is deriving certain identities later in the book for the log-likelihood (log is a concave function). It is used in many fields of mathematics, but another statistical application is in proving that the KL divergence is always non-negative (https://math.stackexchange.com/questions/2031062/proof-of-nonnegativity-of-kl-divergence-using-jensens-inequality).

For convex functions, the inequality flips.
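A quick numerical check with $f = \log$, where both sides are known exactly for a log-normal:

```python
import numpy as np

rng = np.random.default_rng(3)

# log is concave, so E[log X] <= log E[X]. For LogNormal(0, 1) the two
# sides are exactly 0 and 0.5.
x = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)
print(np.log(x).mean(), "<=", np.log(x.mean()))
```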

paroussisc commented 5 years ago

A sufficient statistic for a parameter contains all the information in the sample that is relevant to estimating that parameter, e.g. for i.i.d. normal data with known variance, the sample mean is sufficient for the true mean - no need to keep track of all the individual elements of the sample.
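A sketch of what this means in the normal case (assuming i.i.d. $N(\mu, 1)$ data, with arbitrary made-up samples):

```python
import numpy as np
from scipy import stats

# Two samples with the same mean give log-likelihoods for mu that differ
# only by a constant, so they carry identical information about mu.
a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 2.0, 4.0])   # same mean, different values
mu_grid = np.linspace(0.0, 4.0, 5)

ll_a = np.array([stats.norm.logpdf(a, loc=m, scale=1.0).sum() for m in mu_grid])
ll_b = np.array([stats.norm.logpdf(b, loc=m, scale=1.0).sum() for m in mu_grid])

print(ll_a - ll_b)  # constant across the grid: all information is in the mean
```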