Closed: lesteve closed this issue 4 years ago
Good points. My 2 cents:
handle_unknown='ignore': explain the reason in more detail: it puts 0 in all the category columns if, at test time, a category was not seen in the training data.
I almost included this in my suggestions. I agree, and would add that you should mention that OrdinalEncoder doesn't have a handle_unknown argument atm.
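To make the handle_unknown='ignore' point concrete, here is a minimal sketch of what I understand the behaviour to be. The toy data and variable names are made up for illustration; only OneHotEncoder and its handle_unknown parameter come from the discussion.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy data (made up): "bird" appears only at test time.
X_train = np.array([["cat"], ["dog"], ["cat"]], dtype=object)
X_test = np.array([["dog"], ["bird"]], dtype=object)

# With handle_unknown="ignore", an unseen category does not raise an error.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(X_train)  # learns the categories "cat" and "dog"

# The default output is a sparse matrix; convert it to a dense array to inspect.
encoded_test = encoder.transform(X_test).toarray()

# "dog" sets its own column to 1; "bird" gets 0 in every category column.
print(encoder.categories_)
print(encoded_test)
```

With the default handle_unknown='error', the same transform call on X_test would raise an error instead.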
Question about the pipeline with the scaler: does it compute the mean on the training set only? You should explain how the Pipeline works and when it calls .fit and .transform. Maybe you don't have to explain it in full; you can just say that the parameters are modified only in .fit (so not in .predict).
I was confused about fit, transform and fit_transform in preprocessing functions and thought it was useful to understand this. It was good to learn that fit doesn't literally mean 'fit' in a preprocessing function; it just calculates the required parameters and saves them as attributes on self. The name 'fit' is used for consistency with the sklearn API. I understand it as: fit is performed only on the training data and can use both x and y, whereas transform is performed on both training and test data (similar to predict).
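A small sketch of how I understand fit / transform / fit_transform, using StandardScaler as the scaler. The toy arrays and the LogisticRegression step are my own assumptions, chosen only so that the pipeline has a final estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration).
X_train = np.array([[0.0], [2.0], [4.0], [6.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[10.0]])

# fit only computes the statistics and stores them on the object.
scaler = StandardScaler()
scaler.fit(X_train)
train_mean = scaler.mean_.copy()  # mean of the training column

# transform reuses the stored statistics; it does not recompute them.
X_test_scaled = scaler.transform(X_test)
mean_unchanged = np.allclose(scaler.mean_, train_mean)

# fit_transform is fit followed by transform on the same data.
same_result = np.allclose(
    StandardScaler().fit_transform(X_train), scaler.transform(X_train)
)

# In a Pipeline, fit calls fit_transform on the scaler and then fit on the
# classifier; predict only calls transform on the scaler and then predict.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
predictions = pipe.predict(X_test)
pipe_mean = pipe.named_steps["standardscaler"].mean_
```

So the mean used to scale X_test is the one computed on X_train, both with the standalone scaler and inside the pipeline.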
This is not very structured, so feel free to edit, comment, open other issues for bigger chunks of work:
Content
Miscellaneous