mml-book / mml-book.github.io

Companion webpage to the book "Mathematics For Machine Learning"

Probably incorrect model specification in the Linear Regression chapter #715

Open michaelpoghosyan opened 2 years ago

michaelpoghosyan commented 2 years ago

Description: In the introduction to the Linear Regression chapter, the model is given as $y_n = f(x_n) + \epsilon$, with the note that $\epsilon$ is an iid random variable. The term iid describes a collection of random variables, not a single one. The intended meaning is presumably that each observation $n$ has its own noise term, an independent copy drawn from the same distribution. As written, however, the same symbol $\epsilon$ appears for every $n$, which can be misleading and unclear for a reader who is not yet mathematically mature.
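The difference between the two readings can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions: the linear `f`, the Gaussian noise, and the values of `N` and `sigma` are hypothetical choices for illustration and do not come from the book.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same draws

N = 1000      # number of observations (illustrative)
sigma = 0.5   # assumed noise standard deviation (illustrative)
xs = [-1.0 + 2.0 * i / (N - 1) for i in range(N)]


def f(x):
    # Hypothetical linear function standing in for the unknown f.
    return 2.0 * x + 1.0


# Literal reading of y_n = f(x_n) + eps: one draw shared by every observation.
eps_shared = random.gauss(0.0, sigma)
y_shared = [f(x) + eps_shared for x in xs]

# Intended reading, y_n = f(x_n) + eps_n: an independent draw per observation.
eps_n = [random.gauss(0.0, sigma) for _ in range(N)]
y_indexed = [f(x) + e for x, e in zip(xs, eps_n)]

# Residuals y_n - f(x_n) recover the noise that was actually added.
resid_shared = [y - f(x) for x, y in zip(xs, y_shared)]
resid_indexed = [y - f(x) for x, y in zip(xs, y_indexed)]

print("shared eps, residual std: ", statistics.pstdev(resid_shared))
print("indexed eps, residual std:", statistics.pstdev(resid_indexed))
```

Under the literal reading every residual is the same number, so their spread is essentially zero and the data are just a vertically shifted copy of `f`. Under the indexed reading the spread of the residuals is close to `sigma`, which is what a regression noise model is meant to describe.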

Location:

  1. version 2022-01-11
  2. Chapter 9
  3. page 289 (lines 5,6), page 291 (line 10)
  4. line 5

Proposed solution: One possible solution is to add an index to the noise term, i.e. to write $\epsilon_n$, so that it is clear there is one noise variable per observation. Another is to write the model in the form $y = f(x)+\epsilon$, where $\epsilon$ is a r.v. that describes the measurement/observation noise ... (without mentioning the iid part).
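Written out in full, the first option might read as follows. The Gaussian form of the noise, the variance symbol $\sigma^2$, and the number of observations $N$ are assumptions made here for illustration; this issue does not specify the noise distribution.

$$
y_n = f(x_n) + \epsilon_n, \qquad \epsilon_n \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2), \qquad n = 1, \dots, N
$$

With the index in place, "iid" refers to the collection $\epsilon_1, \dots, \epsilon_N$ rather than to a single random variable.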