mrdbourke / zero-to-mastery-ml

All course materials for the Zero to Mastery Machine Learning and Data Science course.
https://dbourke.link/ZTMmlcourse

Metric comparison - Colab code - Validation set missing #111

Status: Open. simoneroviaro opened this issue 2 weeks ago.

simoneroviaro commented 2 weeks ago

This issue refers to the follow-up Google Colab code for the "Note: Metric Comparison Improvement" at the end of the chapter "Scikit-learn: Creating Machine Learning Models".

In the Colab code, both RandomizedSearchCV and GridSearchCV were applied directly to the training set without an explicit validation set.

The notebook says: "The most important part is they all use the same data splits created using train_test_split() and np.random.seed(42)".
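To illustrate what "the same data splits" means in practice, here is a minimal sketch. It assumes a synthetic stand-in dataset from `make_classification` (not the course data) and a hypothetical helper `make_split`. When `train_test_split()` is called without `random_state`, scikit-learn draws from NumPy's global random state, so calling `np.random.seed(42)` right before each split reproduces the identical shuffle.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Hypothetical stand-in dataset (the course notebook uses its own data)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

def make_split():
    # Seed NumPy's global RNG right before splitting: train_test_split
    # (called without random_state) then shuffles identically every time.
    np.random.seed(42)
    return train_test_split(X, y, test_size=0.2)

X_train_a, X_test_a, y_train_a, y_test_a = make_split()
X_train_b, X_test_b, y_train_b, y_test_b = make_split()

print(np.array_equal(X_test_a, X_test_b))  # True
```

Passing `random_state=42` directly to `train_test_split()` gives the same reproducibility without relying on global state.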

I initially assumed this referred to the previous lessons, where a validation set was created for RandomizedSearchCV. That setup was not consistent with GridSearchCV, where an 80/20 train_test_split was used instead.

However, that is not what the Colab code does: both RandomizedSearchCV and GridSearchCV are fit directly on the training set, with no explicit validation set.
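For context on how a search can run without an explicit validation set: `GridSearchCV` and `RandomizedSearchCV` cross-validate internally on whatever data is passed to `fit()`. The held-out fold in each round plays the role of the validation set. The sketch below assumes synthetic data, a `LogisticRegression` stand-in model, `cv=5` and a small `C` grid. These are illustrative choices, not the notebook's actual model or settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

# Hypothetical stand-in data
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# One train/test split shared by every model being compared
np.random.seed(42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# cv=5: each candidate is fit on 4/5 of X_train and scored on the remaining
# 1/5, rotating through all five folds. Those held-out folds act as the
# validation data, so no separate validation split is passed in.
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid.fit(X_train, y_train)

# Same idea, but only 3 of the 4 candidates are sampled
rs = RandomizedSearchCV(LogisticRegression(max_iter=1000), param_grid,
                        n_iter=3, cv=5, random_state=42)
rs.fit(X_train, y_train)

# The test set is touched only once, for the final comparison
for name, search in [("GridSearchCV", grid), ("RandomizedSearchCV", rs)]:
    print(name, search.best_params_,
          "mean CV score:", round(search.best_score_, 3),
          "test score:", round(search.score(X_test, y_test), 3))
```

Because both searches see exactly the same `X_train` and are scored on the same `X_test`, their test scores are directly comparable even though neither uses a hand-made validation split.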

This is inconsistent with the previous lessons, where the validation set was explained.
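For comparison, a workflow with an explicit validation set might look like the sketch below. It assumes a 70/15/15 train/validation/test split built from two `train_test_split()` calls, synthetic data and a `LogisticRegression` stand-in. The proportions and model are illustrative assumptions, not necessarily what the earlier lessons used. The validation set drives tuning and the test set is held back until the end.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 200 samples, 5 features
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Split 1: hold out 30 samples (15%) as the test set
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=30, random_state=42)
# Split 2: hold out another 30 samples (15%) as the validation set,
# leaving 140 samples (70%) for training
X_train, X_valid, y_train, y_valid = train_test_split(
    X_temp, y_temp, test_size=30, random_state=42
)

# Tune one hyperparameter using the validation set only
best_C, best_valid_score = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    valid_score = model.score(X_valid, y_valid)
    if valid_score > best_valid_score:
        best_C, best_valid_score = C, valid_score

# The test set is used once, after tuning is finished
final_model = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
test_score = final_model.score(X_test, y_test)
print(len(X_train), len(X_valid), len(X_test))  # 140 30 30
print(best_C, round(best_valid_score, 3), round(test_score, 3))
```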

Could you please clarify? Thanks, Simone.