josephlee94 / intuitive-deep-learning

A beginner-friendly tutorial to introduce Deep Learning concepts in an intuitive way!

Question on the MinMaxScaler #2

Open ChihchengHsieh opened 3 years ago

ChihchengHsieh commented 3 years ago

Hello,

I just have a quick question about the MinMaxScaler and dataset splitting process.

Is the scaler allowed to see the test set?

Since you fit the scaler before splitting, the fitting process includes the test set.

However, I found this (see Question 1 (b)):

https://cs230.stanford.edu/files/cs230exam_fall18_soln.pdf

And this:

https://jamesmccaffrey.wordpress.com/2019/01/04/how-to-normalize-training-and-test-data-for-machine-learning/?fbclid=IwAR01dAH5OIWkiS8SeJ-XDQ5vyBKoGRe1CwJ9J0HiWexb9zBV1xuOWmw_YjU

Both indicate that the scaler should be fit on the training set only.

Or are both approaches fine?

Thanks in advance

ChihchengHsieh commented 3 years ago

It can be a data leakage problem:

https://machinelearningmastery.com/data-leakage-machine-learning/
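
For reference, here is a minimal sketch of the leakage-free ordering the links above describe: split first, fit the scaler on the training rows only, then reuse it on the test rows. The toy data, `test_size`, and `random_state` are made up for illustration and are not taken from the tutorial's notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: 10 samples, 1 feature, values 0..9.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.arange(10) % 2

# Split first, so the test rows are never seen while fitting the scaler.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scaler = MinMaxScaler()
# Learn the per-feature min and max from the training rows only.
X_train_scaled = scaler.fit_transform(X_train)
# Apply the training min and max to the test rows without refitting.
X_test_scaled = scaler.transform(X_test)
```

With this ordering the scaled test values can fall slightly outside [0, 1] when a test row is smaller or larger than anything in the training set. That is expected and is not a bug.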