Open bradleyboehmke opened 8 months ago
@bgreenwell, I thought a lot about our recent discussions and it made me go back and reconsider the layout. Above is a proposed new TOC layout. The middle section doesn't change a whole lot but the first section adds some new content that I think would help set the book apart.
For example, ch 2 would talk about framing and scoping ML problems along with thinking about production concerns. This is where we can mention the lifecycle of an ML project (e.g., drift) while making clear that our book does not focus on this topic (we can point to other resources).
Also, notice that I remove the unsupervised section but add in a DL section. This modernizes the book; plus, I already have a lot of DL notebooks built out that I can migrate, so this isn't starting from scratch.
What are your thoughts?
Lots to discuss at our next catch-up, but here are some (very) high-level thoughts:
The proposed chapter 2 makes me think about the Microsoft ML checklist, which I really like. Can we try to incorporate and/or align with that? Are there others?
In the interest of any discussion on leakage, I think preprocessing should be introduced before data splitting in chapter 3, with a pointer to the later chapter on preprocessing methods (this ties in STRONGLY with leakage). Maybe this is where we introduce the leakage framework?
Unsupervised is missing?
I think we need a special chapter on additional topics up front? E.g., missing values, collinearity in general, interpretability, variable selection and ranking, "Responsible AI", ...
I don't like the idea of deep learning being separated from the rest, but perhaps it's worthwhile because of its broader applications, like embeddings, etc.? But the same goes for random forests (e.g., isolation forests for anomaly detection) and many other methods. I can be persuaded here, but that's my initial thought.
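To make the leakage point above concrete, here's a rough sketch of the kind of example chapter 3 could open with. Everything in it is an assumption for illustration: the toy `values`, the split, and the helper names `fit_scaler` and `transform` are hypothetical, and it uses plain Python rather than whatever tooling the book ends up using. It just shows that estimating preprocessing parameters on all rows lets test information shift the training features.

```python
from statistics import mean, pstdev

# Hypothetical toy feature; the outlier (100.0) ends up in the test set.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
train, test = values[:4], values[4:]

def fit_scaler(data):
    """Estimate the (mean, std) used to standardize a feature."""
    return mean(data), pstdev(data)

def transform(data, params):
    """Standardize data with previously estimated parameters."""
    mu, sigma = params
    return [(x - mu) / sigma for x in data]

# Leaky: parameters are estimated on all rows, test rows included.
leaky_params = fit_scaler(values)

# Safe: parameters are estimated on the training rows only,
# then reused unchanged on the test rows.
safe_params = fit_scaler(train)

leaky_train = transform(train, leaky_params)
safe_train = transform(train, safe_params)
safe_test = transform(test, safe_params)
```

With the safe version the training feature is centered at zero; with the leaky version the test outlier drags the mean up, so the training rows are all shifted by something the model should never have seen.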
Fundamentals
Supervised Modeling
Deep Learning