expanded ch 3 (statistics) to include a more detailed intro to frequentist and Bayesian statistics (mostly copied from book1) to make the chapter easier to read
improved ch 10 (VI) so that the intro now talks about variational EM, SVI and amortization; then I talk about gradient-based VI (now with pseudocode); CAVI has been moved later in the chapter.
improved 34.1 (decision theory) to explain the connection between Bayesian and frequentist methods (added Fubini's theorem)
moved the Bayesian learning rule section out of ch 6 (optimization) into online supplement to save space
moved the section on Bayesian learning rule out of ch 3 (statistics) into online supplement to save space
changed the symbol for hyperparameters to xi, to avoid confusion with the inference network parameters phi
added a short sec 3.11 on missing data; moved the old section on VAEs for missing data to the online supplement