lesteve / scikit-learn-tutorial

This repo has moved to https://github.com/INRIA/scikit-learn-mooc/
Creative Commons Zero v1.0 Universal

My notes about possible improvements from Euroscipy tutorial #3

Closed lesteve closed 4 years ago

lesteve commented 5 years ago

This is not very structured, so feel free to edit, comment, open other issues for bigger chunks of work:

Content

Miscellaneous

lucyleeow commented 4 years ago

Good points. My 2 cents:

handle_unknown='ignore': explain the reason in more detail: if a category appears at test time that was not seen in the training data, all of its one-hot columns are set to 0.

I almost included this in my suggestions. I agree, and would add that you should mention that OrdinalEncoder doesn't have a handle_unknown argument at the moment.
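
To make the handle_unknown='ignore' point concrete, here is a minimal sketch of what could be shown in the tutorial. The colour data is made up for illustration and is not from the tutorial notebooks.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: "purple" only appears at test time.
X_train = np.array([["red"], ["green"], ["blue"]])
X_test = np.array([["green"], ["purple"]])

# With handle_unknown="ignore", unseen categories do not raise an error.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(X_train)

# transform returns a sparse matrix by default; convert it for display.
encoded = encoder.transform(X_test).toarray()

print(encoder.categories_)  # categories learned from the training data only
print(encoded)              # the "purple" row contains only zeros
```

With the default handle_unknown='error', the same transform call would raise an error on "purple" instead.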

Question about the pipeline with the scaler: does it compute the mean on the training data only? To answer that, you have to explain how the Pipeline works and how it calls .fit and .transform. Maybe you don't have to explain it in full; you can just say that the parameters are modified only in .fit (so not in .predict).

I was confused about fit, transform and fit_transform in the preprocessing classes and thought it was useful to understand the difference. It was good to learn that fit doesn't literally mean 'fit' in a preprocessing step: it just calculates the required parameters and saves them as attributes on the object. The name 'fit' is used for sklearn API purposes. I understand it as follows: fit is performed only on the training data and can use both X and y, whereas transform is performed on both training and test data (similar to predict).
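
A small sketch of how this could be demonstrated, assuming a StandardScaler followed by a LogisticRegression (the toy numbers and the choice of classifier are my own, not taken from the tutorial):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data, chosen so the statistics are easy to check by hand.
X_train = np.array([[0.0], [2.0], [4.0], [6.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[10.0], [20.0]])

# fit only computes and stores the statistics of the training data.
scaler = StandardScaler()
scaler.fit(X_train)
mean_after_fit = scaler.mean_.copy()

# transform reuses the stored statistics; it does not recompute them.
X_test_scaled = scaler.transform(X_test)
mean_after_transform = scaler.mean_.copy()

# In a pipeline, fit calls fit_transform on the scaler and then fit on the
# classifier; predict calls transform on the scaler and then predict.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
pipe_scaler = pipe.named_steps["standardscaler"]
pipe_mean_after_fit = pipe_scaler.mean_.copy()
predictions = pipe.predict(X_test)
pipe_mean_after_predict = pipe_scaler.mean_.copy()

print(mean_after_fit, mean_after_transform)          # same training mean
print(pipe_mean_after_fit, pipe_mean_after_predict)  # unchanged by predict
```

The scaler's mean_ stays at the training mean throughout, even though the test values are much larger, which is the behaviour the question above is asking about.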

lesteve commented 4 years ago

Moved to https://github.com/INRIA/scikit-learn-mooc/issues/4.