mml-book / mml-book.github.io

Companion webpage to the book "Mathematics For Machine Learning"

Chapter 8 statement about iid seems inappropriate #729

Open Chan-Hee opened 2 years ago

Chan-Hee commented 2 years ago

Describe the mistake The statement "We assume that the set of examples (x_1,y_1), ..., (x_N,y_N) are independent and identically distributed" is incorrect.

Location Please provide the

  1. version (bottom of page) : Draft (2022-01-11)
  2. Chapter : When Models Meet Data
  3. page : 266
  4. line number/equation number : 4th line from the bottom (We assume that set of examples (x_1,y_1), ..., (x_N,y_N) are independent and ~~~)

Proposed solution It should say "independent" only. Otherwise, p(y_n|x_n,\theta) doesn't make sense. The notation p(y_n|x_n,\theta) means that the conditional distribution of y depends on x (as x changes, the conditional distribution of y changes). Unless x_1, ..., x_N all have the same value, the examples can only be independent, not identically distributed.


Chan-Hee commented 2 years ago

If you want to keep it i.i.d., then (8.16) should be written as a joint likelihood that factorizes into the conditional likelihood of y|x and the marginal likelihood of x.
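
The factorization described here can be sketched as follows. This is only an illustration, not the book's text: it assumes that (8.16) is the conditional likelihood p(Y|X,\theta) = \prod_n p(y_n|x_n,\theta), that the pairs (x_n, y_n) are i.i.d., and that the marginal p(x_n) does not depend on \theta.

```latex
% Joint likelihood of i.i.d. pairs (x_n, y_n); assumes p(x_n) is free of \theta
\begin{align}
p(\mathcal{X}, \mathcal{Y} \mid \theta)
  &= \prod_{n=1}^{N} p(x_n, y_n \mid \theta) \\
  &= \prod_{n=1}^{N} p(y_n \mid x_n, \theta)\, p(x_n) \\
  &= \Bigl[\prod_{n=1}^{N} p(y_n \mid x_n, \theta)\Bigr]
     \Bigl[\prod_{n=1}^{N} p(x_n)\Bigr]
\end{align}
```

Under these assumptions the second bracket is constant in \theta, so maximizing the joint likelihood over \theta gives the same estimate as maximizing the conditional likelihood alone.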

mpd37 commented 1 year ago

We consider a supervised learning setting. Here, x is not a random variable but a deterministic input. I think i.i.d. is fine.