telmo-correa / all-of-statistics

Self-study on Larry Wasserman's "All of Statistics"

Issues with Exercise 9.6.5 #25

Open Pipe-Vash opened 1 year ago

Pipe-Vash commented 1 year ago

In sections a) and b) (and by extension in the rest of the exercise) there are some issues with the meaning of $\mathbb{E}\left(\overline{X_n^\star}\vert X_1, ... ,X_n\right )$ and $\mathbb{V}\left(\overline{X_n^\star} \vert X_1, ... ,X_n\right)$. Since these expressions are conditioned on random variables, their outcomes should also be random variables, not fixed numbers (as they are treated in the current solution). My solution is the following:

$$\mathbb{E}\left(\overline{X_n^\star}\vert X_1, ... ,X_n\right ) = \frac{1}{n} \sum_i \mathbb{E}\left(X_i^\star \vert X_1, ... ,X_n\right ) = \frac{1}{n} \sum_i \frac{1}{n}\sum_j X_j = \frac{1}{n} \sum_i \overline{X_n} = \overline{X_n} $$

$$\begin{eqnarray} \mathbb{V}\left(\overline{X_n^\star} \vert X_1, ... ,X_n\right) &=& \frac{1}{n^2} \sum_i \mathbb{V}\left(X_i^\star \vert X_1, ... ,X_n\right ) = \frac{1}{n} \mathbb{V}\left(X_1^\star \vert X_1, ... ,X_n\right ) = \frac{1}{n} \left[ \mathbb{E}\left({X_1^\star}^2 \vert X_1, ... ,X_n\right ) - \mathbb{E}^2\left(X_1^\star \vert X_1, ... ,X_n\right) \right] \\ &=& \frac{1}{n} \left[ \frac{1}{n} \sum_i X_i^2 - \overline{X_n}^2 \right] = \frac{1}{n^2} \sum_i \left(X_i - \overline{X_n} \right)^2 = \frac{1}{n} \hat{\sigma}^2 = \frac{n-1}{n^2} S_n^2 \end{eqnarray}$$

As noted above, these changes propagate through the rest of the solution. However, the modifications I propose lead to the same final results in c) and d).
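As a numerical sanity check (not part of the exercise), here is a short Python sketch that draws bootstrap resamples from one fixed sample and compares the Monte Carlo conditional moments against the closed forms derived above. The sample size, seed, and number of replicates are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)           # the fixed observed sample X_1, ..., X_n

# Draw B bootstrap samples (resample indices with replacement) and
# compute the bootstrap sample means, conditional on the data x.
B = 200_000
idx = rng.integers(0, n, size=(B, n))
boot_means = x[idx].mean(axis=1)

xbar = x.mean()
sigma2_hat = x.var()             # plug-in variance: (1/n) * sum (X_i - xbar)^2
Sn2 = x.var(ddof=1)              # sample variance with 1/(n-1)

# E(Xbar* | data) should be close to xbar; V(Xbar* | data) should be
# close to sigma2_hat / n, which equals (n-1)/n^2 * S_n^2 exactly.
print(boot_means.mean(), xbar)
print(boot_means.var(), sigma2_hat / n)
print(sigma2_hat / n, (n - 1) / n**2 * Sn2)
```

The last line shows the algebraic identity $\frac{1}{n}\hat{\sigma}^2 = \frac{n-1}{n^2} S_n^2$ holds exactly, while the first two lines agree only up to Monte Carlo error.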

netomenoci commented 1 month ago

First of all, congratulations on the nice work putting all of these solutions together. It's a lot of good work!

Agree with @Pipe-Vash .

Conceptually, it's quite important to distinguish between data and random variables. For part b), it is incorrect to say that the conditional variance of the sample bootstrap mean is $\frac{1}{n}\mathbb{V}(X)$, since $\mathbb{V}(X)$ is a number, and the outcome should instead be a random variable.

Furthermore, as pointed out by @Pipe-Vash , the estimator is biased, which makes the statement even 'more incorrect'.

Here's my solution


code below:

Let $X_1, \ldots, X_n$ be distinct observations (no ties). Let $X^*_1, \ldots, X^*_n$ denote a bootstrap sample and let $\bar{X}^*_n = n^{-1} \sum_{i=1}^{n} X^*_i$. Compute $\text{Var}(\bar{X}^*_n \mid X_1, \ldots, X_n)$.

\begin{align} V\left[\bar{X}^*_n \mid X_1, \ldots, X_n\right] &= V\left[n^{-1} \sum_{i=1}^n X_i^* \mid X_1, \ldots, X_n\right] = n^{-2} \sum_{i=1}^n V\left[X_i^* \mid X_1, \ldots, X_n\right] \tag{1} \\ &= n^{-2} \cdot n \cdot V\left(X_1^* \mid X_1, \ldots, X_n\right) = \frac{1}{n} V\left(X_1^* \mid X_1, \ldots, X_n\right) \tag{2} \end{align}

Where

\begin{align} V\left[X_1^* \mid X_1, \ldots, X_n\right] &= \underbrace{E\left[(X_1^*)^2 \mid X_1, \ldots, X_n\right]}_{E_1} - \underbrace{\left(E\left[X_1^* \mid X_1, \ldots, X_n\right]\right)^2}_{E_2^2} = E_1 - E_2^2. \tag{3} \end{align}

Where

\begin{align} E_2 &= E\left[X_1^* \mid X_1, \ldots, X_n\right] = \sum_{i=1}^n P\left(X_1^* = X_i \mid X_1, \ldots, X_n\right) \cdot X_i = \sum_{i=1}^n \frac{1}{n} X_i = \bar{X} \tag{4} \end{align}

And

\begin{align} E_1 &= \sum_{i=1}^n P\left(X_1^* = X_i \mid X_1, \ldots, X_n\right) \cdot X_i^2 = \sum_{i=1}^n \frac{1}{n} X_i^2 \tag{5} \end{align}

Finally,

\begin{align} V\left[\bar{X}^*_n \mid X_1, \ldots, X_n\right] &= \frac{1}{n}\left(E_1 - E_2^2\right) = \frac{1}{n}\left(\frac{1}{n} \sum_{i=1}^n X_i^2 - \bar{X}^2\right) \tag{6} \end{align}
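Equation (6) can be verified exactly on any concrete sample, with no resampling at all, since $E_1$ and $E_2$ are simple functions of the data. A minimal sketch (the sample values below are arbitrary, chosen to be distinct as assumed):

```python
import numpy as np

x = np.array([2.0, 3.5, 5.0, 7.5, 11.0])   # arbitrary distinct observations
n = len(x)

E1 = np.mean(x**2)        # E[(X_1*)^2 | data], equation (5)
E2 = x.mean()             # E[X_1* | data] = Xbar, equation (4)
cond_var = (E1 - E2**2) / n   # equation (6)

# (1/n)(E1 - E2^2) is the plug-in variance divided by n,
# which also equals (n-1)/n^2 times the sample variance S_n^2.
print(cond_var)
print(x.var() / n)
print((n - 1) / n**2 * x.var(ddof=1))
```

All three printed values coincide, confirming that the conditional bootstrap variance is $\frac{1}{n}\hat{\sigma}^2$ rather than $\frac{1}{n}\mathbb{V}(X)$.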