Where do the StandardScaler class's 𝜇 and 𝜎 come from: the training dataset or the whole dataset? #135

Hi there,

Do we use the training dataset's mean and standard deviation to standardize the test dataset?

Yes, this is correct.

What if the training and test sets diverge a lot?

That's a good point. First, we assume that the training and test sets are sampled from the same population. This assumption underlies almost all machine learning concepts. In practice, however, the assumption can still be violated. In that case, it becomes even more important to use the training set's mean and standard deviation to scale the test set, because the model's parameters were learned on features scaled with those statistics.

Please have a look at this entry here, where I tried to make this a bit more clear: https://sebastianraschka.com/faq/docs/scale-training-test.html
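To make the point about diverging sets concrete, here is a minimal sketch with made-up numbers (the one-feature arrays below are hypothetical and are not taken from the book or the FAQ entry). It compares scaling the test set with the training statistics against refitting a second scaler on the test set:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical one-feature data: the test values lie far below the training values.
X_train = np.array([[10.0], [20.0], [30.0]])
X_test = np.array([[5.0], [6.0], [7.0]])

# Fit on the training set only, then scale the training set.
sc = StandardScaler().fit(X_train)
X_train_scaled = sc.transform(X_train)

# Recommended: reuse the training set's mean and standard deviation.
X_test_scaled = sc.transform(X_test)

# Not recommended: fit a separate scaler on the test set.
X_test_refit = StandardScaler().fit_transform(X_test)

print(X_train_scaled.ravel())
print(X_test_scaled.ravel())
print(X_test_refit.ravel())
```

With the training statistics, all three test values end up below the smallest scaled training value, so the shift stays visible to the model. With a refitted scaler, the test values become numerically indistinguishable from the scaled training values, which hides the shift entirely.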

So my question is: does the StandardScaler class use the training set's 𝜇 (sample mean) and 𝜎 (standard deviation), or the whole dataset's 𝜇 and 𝜎?

It uses the mean and standard deviation of the dataset that was provided via the fit() method. E.g., below it would be the training set's 𝜇 and 𝜎 because I use sc.fit(X_train):

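from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
sc.fit(X_train)  # estimates 𝜇 and 𝜎 from the training set only
X_train_scaled = sc.transform(X_train)
X_test_scaled = sc.transform(X_test)  # reuses the training set's 𝜇 and 𝜎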
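If you want to check this yourself, the fitted scaler exposes the learned statistics as the `mean_` and `scale_` attributes. The sketch below uses randomly generated stand-in data (the arrays, seed, and distribution parameters are assumptions for illustration only) and compares those attributes against statistics computed by hand. Note that StandardScaler uses the population standard deviation (`ddof=0`), which is also NumPy's default:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in data; the test set is drawn from a deliberately different distribution.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=10.0, scale=2.0, size=(100, 3))
X_test = rng.normal(loc=12.0, scale=3.0, size=(40, 3))

sc = StandardScaler()
sc.fit(X_train)

# Per-feature statistics of the training set (np.std defaults to ddof=0).
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)

# The fitted attributes match the training statistics ...
matches_train = bool(
    np.allclose(sc.mean_, train_mean) and np.allclose(sc.scale_, train_std)
)

# ... and not the statistics of the combined training + test data.
X_all = np.vstack([X_train, X_test])
matches_whole = bool(np.allclose(sc.mean_, X_all.mean(axis=0)))

print(matches_train, matches_whole)
```

Calling `sc.transform(X_test)` afterwards is then equivalent to computing `(X_test - sc.mean_) / sc.scale_`.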

rasbt / python-machine-learning-book-3rd-edition
