uu-sml / sml-book-page

Page for the SML book

Clarification on Bootstrapping Explanation #88

Open owenhdiba opened 7 months ago

owenhdiba commented 7 months ago

This question is regarding your explanation for why bootstrapping decreases variance (page 168 of version July 8 2022). At the bottom of page 168, underneath eqns. (7.2a)-(7.2b) you consider $\tilde{y}^{(b)}(\mathbf{x_\star})$ as a random variable. Could you clarify whether this is considering the training set $\mathcal{T}$ as fixed, and the randomness coming from the random drawing of samples to construct each bootstrapped dataset $\mathcal{T}^{(b)}$? Or is the randomness from drawing new datasets $\mathcal{T}'$ and keeping the indices of bootstrap samples fixed (i.e. ${\mathcal{T}'}^{(b)}$ is always constructed from taking the datapoints with indices $i \in \lbrace i^{(b)}_1, i^{(b)}_2, \ldots, i^{(b)}_n \rbrace$ but ${\mathcal{T}'}^{(b)}$ is random)? Or is it varying $\mathcal{T}$ and varying the indices used to construct the bootstrap datasets?

nikwa commented 7 months ago

Great question! The mean and variance of $\tilde y^{(b)}(\mathbf{x}_\star)$ are the same as those described in Section 4.4 "Bias-variance decomposition", page 80. Thus, the expectation is over the different training datasets drawn from $p(\mathbf x, y)$ (and, as you say, keeping the bootstrapped indices fixed). So your latter explanation is the correct one.
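To make this interpretation concrete, here is a small hypothetical simulation (not from the book) of the "fresh datasets, fixed bootstrap indices" reading: the index sets $\lbrace i^{(b)}_1, \ldots, i^{(b)}_n \rbrace$ are drawn once and held fixed, while the expectation and variance are estimated by repeatedly redrawing the dataset $\mathcal{T}'$ from $p(\mathbf x, y)$. The data-generating process, the polynomial base learner, and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50          # dataset size
B = 20          # number of bootstrap members
n_datasets = 2000  # number of fresh datasets T' used to estimate the variance
x_star = 0.5    # the test point x_star

# Draw the bootstrap index sets once; they stay FIXED across all datasets.
idx = [rng.integers(0, n, size=n) for _ in range(B)]

def fit_predict(x, y, x0):
    # Toy high-variance base learner: degree-5 polynomial least squares.
    w = np.polyfit(x, y, deg=5)
    return np.polyval(w, x0)

preds = np.empty((n_datasets, B))
for t in range(n_datasets):
    # Fresh dataset T' ~ p(x, y); here y = sin(2*pi*x) + Gaussian noise.
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
    for b in range(B):
        # T'^{(b)}: the b-th bootstrap dataset, built from the fixed indices.
        preds[t, b] = fit_predict(x[idx[b]], y[idx[b]], x_star)

# Variance (over datasets) of a single bootstrap member vs. the bagged average.
var_single = preds[:, 0].var()
var_bagged = preds.mean(axis=1).var()
print(f"single member: {var_single:.4f}, bagged average: {var_bagged:.4f}")
```

Because the ensemble members are correlated (they are built from the same underlying dataset), the bagged variance does not shrink by a full factor of $1/B$, but it is strictly smaller than the variance of a single member, which is the point of the argument on page 168.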

owenhdiba commented 7 months ago

@nikwa thank you for clearing that up!