Nam - underfit and overfit

So k* is the true number of clusters, and we say the model is underfit if we use k* - 1 blocks and overfit if we use k* + 1 blocks. Underfitting introduces more bias because merging blocks loses valuable information, while overfitting introduces less bias (and more variance?) because it splits up the nodes more than necessary. Is this correct? Also, we talked about oracle risk and empirical risk, both of which require k*; do we actually know this value? If so, is it the minimizer from the graph at the beginning of the talk (the one that plotted AICc against the number of clusters)?
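To make the question concrete, here is a minimal sketch of the comparison being asked about, not a reproduction of the graph from the talk. It simulates a two-block stochastic block model, so k* = 2 is known only because we built the data ourselves, and then scores a merged (k = 1), true (k = 2), and split (k = 3) partition by log-likelihood and AICc. Everything in it is an assumption made for illustration: the names `sbm_loglik` and `aicc`, the edge probabilities 0.5 and 0.1, treating block labels as known rather than estimated, counting k(k+1)/2 block probabilities as the parameters, and using the number of node pairs as the sample size.

```python
import math
import random
from itertools import combinations


def sbm_loglik(adj, labels):
    """Bernoulli log-likelihood of an undirected graph, evaluated at the
    maximum-likelihood edge probability for each pair of blocks."""
    edges, pairs = {}, {}
    for i, j in combinations(range(len(labels)), 2):
        key = tuple(sorted((labels[i], labels[j])))
        pairs[key] = pairs.get(key, 0) + 1
        edges[key] = edges.get(key, 0) + adj[i][j]
    ll = 0.0
    for key, m in pairs.items():
        e = edges[key]
        if 0 < e < m:  # if e is 0 or m the MLE is 0 or 1 and the term is 0
            p = e / m
            ll += e * math.log(p) + (m - e) * math.log(1 - p)
    return ll


def aicc(loglik, n_params, n_obs):
    """AIC plus the usual small-sample correction term."""
    aic = 2 * n_params - 2 * loglik
    return aic + 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)


# Simulate a graph whose true number of blocks is 2 (illustrative values).
random.seed(0)
n = 60
truth = [0] * 30 + [1] * 30
adj = [[0] * n for _ in range(n)]
for i, j in combinations(range(n), 2):
    p = 0.5 if truth[i] == truth[j] else 0.1
    adj[i][j] = adj[j][i] = int(random.random() < p)

# Candidate partitions: merged (underfit), true, and one block split (overfit).
candidates = {
    1: [0] * n,
    2: truth,
    3: [0] * 30 + [1] * 15 + [2] * 15,
}

n_obs = n * (n - 1) // 2  # assumption: each node pair is one observation
results = {}
for k, labels in candidates.items():
    ll = sbm_loglik(adj, labels)
    results[k] = (ll, aicc(ll, k * (k + 1) // 2, n_obs))
    print(f"k={k}  loglik={results[k][0]:.1f}  AICc={results[k][1]:.1f}")
```

Because each partition refines the previous one, the log-likelihood can only stay the same or increase as k grows, which is why a penalized criterion such as AICc is needed to pick k at all. On real data k* is not available this way; the sketch knows it only because the graph was simulated.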
