Closed: SwePalm closed this issue 3 years ago.
Fair point, even though we (and machine learning practice in general) do not really distinguish between inter- and extrapolation; we just consider everything to be predictions. How do you define inter- and extrapolation for, e.g., image classification in a technical yet sensible way? Anyway, we will take your comment into consideration for future updates.
Just an FYI regarding: "How do you define inter- and extrapolation for, e.g., image classification in a technical yet sensible way?" Balestriero, Pesenti & LeCun actually deal with this question by thinking in terms of convex hulls:
"Learning in High Dimension Always Amounts to Extrapolation"
https://arxiv.org/abs/2110.09485
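Their criterion is simple to state: a new sample counts as interpolation only if it lies inside the convex hull of the training samples, and as extrapolation otherwise. For what it's worth, here is a minimal sketch (my own, not code from the paper) of how that membership test can be phrased as a small linear-programming feasibility problem:

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, X):
    """Test whether x lies in the convex hull of the rows of X.
    Feasibility LP: find lambda >= 0 with sum(lambda) = 1 and X.T @ lambda = x."""
    n = X.shape[0]
    A_eq = np.vstack([X.T, np.ones((1, n))])   # stack both equality constraints
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.success                          # feasible <=> x is inside the hull

# Toy usage: in low dimensions, most new points land inside the hull
rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 2))
print(in_convex_hull(rng.standard_normal(2), X_train))
```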
This would add some nice insights to a future edition/update of the book.
@andreas-lindholm The figure appears in Section 3.1, Linear regression, so my comment is made in that context. I am happy to know you will take my comment into consideration. I do think @MohammedAlJaff found an excellent reference for adding insights to the discussion, in other chapters ;-) Maybe out of scope for this book is the topic of a data validation pipeline as a guard against extrapolation in a production environment: https://venturebeat.com/2021/04/04/bounding-your-ml-models-dont-let-your-algorithms-run-wild/
@MohammedAlJaff Thanks! I wasn't aware of that article, but it looks interesting for sure
@SwePalm Sure, I understand. From a pedagogical point of view I'm a bit hesitant to define something for "toy problems" (like Figure 3.1) that lacks relevance for more interesting problems; the point of having Figure 3.1 in the book is not that it's interesting in itself (imho, it isn't), but to give a concrete mental picture of what data, predictions, etc. are that can be generalized to more advanced problems that are harder to illustrate. So if we were to include a discussion on inter- vs extrapolation for Figure 3.1, I would like to also follow that up in later chapters.
(And, that venturebeat article talks about preventing extrapolation whereas the arXiv article concludes that "on any high-dimensional (>100) dataset, interpolation almost surely never happens", so I think it's fair to say that the topic hasn't really matured yet, which makes it kind of hard to figure out what to write in a textbook... although it for sure is very relevant!!)
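(A quick way to convince oneself of that quoted claim is to run the same convex-hull membership test as sketched above on, say, standard Gaussian data; the fraction of new points that land inside the training hull collapses already at moderate dimension. A rough sketch, assuming numpy/scipy, with sample sizes chosen just for illustration:)

```python
import numpy as np
from scipy.optimize import linprog

def in_hull(x, X):
    # Same convex-hull membership test as sketched earlier in the thread.
    n = X.shape[0]
    res = linprog(np.zeros(n),
                  A_eq=np.vstack([X.T, np.ones((1, n))]),
                  b_eq=np.concatenate([x, [1.0]]),
                  bounds=[(0, None)] * n, method="highs")
    return res.success

rng = np.random.default_rng(0)
n_train, n_test = 500, 50
for d in (2, 5, 10, 20, 50):
    X = rng.standard_normal((n_train, d))
    inside = sum(in_hull(rng.standard_normal(d), X) for _ in range(n_test))
    print(f"d = {d:2d}: {inside}/{n_test} new points inside the training hull")
```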
Anyway, right now the manuscript is being processed by the publisher and we can only make minor changes, so the fact is that this won't go into the manuscript right now. But as I said, it's definitely a relevant topic and we will keep it in mind for future revisions!
@andreas-lindholm I fully understand where you are in the process, but if it is a "toy problem", moving the prediction point to be within the range of the training data might not be so strange? I think it could be good to establish early on the mental model that "the training data used shall represent all data the model will meet in production". I have personally experienced the behaviour of a model that has been exposed to data it was not expected to see. Nowadays I promote good MLOps practice and have a data quality pipeline before prediction that would filter out your example as an outlier. :-) Even if the paper above discusses the theory, we can also analyse the results. Explainable AI is another topic I find interesting, and this article links to a new paper: https://news.mit.edu/2021/shortcut-artificial-intelligence-1102 (And maybe there is a reason Meta/Facebook announced they shut off their face recognition service... even if there are many other reasons as well.) But again, it might be good to separate the theory of ML from practical use.
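To make that concrete, the kind of data quality check I mean can be as simple as comparing each incoming feature to the range seen during training before letting the model predict. A hypothetical minimal sketch (the names here are just illustrative, not from any particular MLOps tool):

```python
import numpy as np

class RangeGuard:
    """Flag inputs outside the per-feature range seen during training,
    i.e. a crude extrapolation guard placed before prediction."""

    def fit(self, X_train):
        self.lo_ = np.asarray(X_train).min(axis=0)
        self.hi_ = np.asarray(X_train).max(axis=0)
        return self

    def in_range(self, x):
        x = np.asarray(x)
        return bool(np.all(x >= self.lo_) and np.all(x <= self.hi_))

# Usage sketch: only predict on points inside the training range,
# otherwise route them to logging / fallback / human review.
# guard = RangeGuard().fit(X_train)
# if guard.in_range(x_new):
#     y_hat = model.predict(x_new.reshape(1, -1))
# else:
#     handle_out_of_range(x_new)   # hypothetical fallback handler
```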
I noticed that Figure 3.1 shows a classical example of extrapolation, which is not discussed in the text (at this point; I do not have the full book yet). I suggest that the figure is updated so that the prediction point is moved within the range of the data used in training.