point6013 closed 4 years ago
Hi there,
We use the training set's mean and standard deviation to standardize the test set?
Yes, this is correct.
What if the training and test sets differ a lot?
That's a good point. First, we assume that the training and test sets are sampled from the same population. This assumption underlies almost all machine learning concepts. In practice, however, it can still be violated. In that case, it becomes even more important to use the training set's mean and standard deviation to scale the test set: the model was fitted to features on the training scale, so rescaling the test set with its own statistics would hide the shift rather than correct it.
Please have a look at this entry here, where I tried to make this a bit more clear: https://sebastianraschka.com/faq/docs/scale-training-test.html
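To make the point concrete, here is a minimal sketch with made-up numbers. The arrays and the `standardize` helper are illustrative assumptions, not code from the linked entry. It shows that standardizing a shifted test set with its own statistics makes it look identical to the training set, while the training statistics keep the shift visible.

```python
import numpy as np

# Hypothetical 1-D feature values, chosen only for illustration.
X_train = np.array([10.0, 20.0, 30.0])
X_test = np.array([5.0, 6.0, 7.0])  # lies entirely below the training range


def standardize(x, mean, std):
    """Return (x - mean) / std; illustrative helper, not a library function."""
    return (x - mean) / std


train_mean, train_std = X_train.mean(), X_train.std()

train_scaled = standardize(X_train, train_mean, train_std)

# Test set scaled with the training statistics: the shift is preserved.
test_with_train_stats = standardize(X_test, train_mean, train_std)

# Test set scaled with its own statistics: the shift disappears.
test_with_own_stats = standardize(X_test, X_test.mean(), X_test.std())

print("train:               ", train_scaled)
print("test (train stats):  ", test_with_train_stats)
print("test (own stats):    ", test_with_own_stats)
```

With the test set's own statistics, the three test values map to exactly the same standardized values as the three training values, so a model would treat them as ordinary training-like points even though every raw test value is smaller than the smallest training value.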
So my question is: does the StandardScaler class use the training set's 𝜇 (sample mean) and 𝜎 (standard deviation), or the whole dataset's 𝜇 and 𝜎?
It uses the mean and standard deviation of the dataset that was provided via the fit() method. E.g., below it would be the training set's 𝜇 and 𝜎 because I use sc.fit(X_train):
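from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
sc.fit(X_train)  # learn 𝜇 and 𝜎 from the training set only
X_train_scaled = sc.transform(X_train)
X_test_scaled = sc.transform(X_test)  # reuse the training-set 𝜇 and 𝜎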
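If you want to check this yourself, the fitted scaler exposes the learned statistics as `mean_` and `scale_`. The sketch below uses synthetic data (the random arrays and their shapes are assumptions for illustration only) and compares those attributes with the training set's column-wise mean and standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration; the test set is deliberately shifted.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
X_test = rng.normal(loc=3.0, scale=2.0, size=(40, 2))

sc = StandardScaler()
sc.fit(X_train)

# The learned statistics come from X_train only.
# StandardScaler uses the population standard deviation (ddof=0),
# which is also NumPy's default for std().
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)
print(np.allclose(sc.mean_, train_mean))
print(np.allclose(sc.scale_, train_std))

# transform() applies those same training statistics to the test set.
X_test_scaled = sc.transform(X_test)
manual = (X_test - train_mean) / train_std
print(np.allclose(X_test_scaled, manual))
```

Because the test set here is shifted, its scaled columns will not have zero mean and unit variance, which is expected: the scaler never saw the test data during fit().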
Dear sir, this confused me a lot. We use the training set's mean and standard deviation to standardize the test set? What if the training and test sets differ a lot? Why not use the whole dataset's 𝜇 and 𝜎 to standardize both the training and the test set? So my question is: does the StandardScaler class use the training set's 𝜇 (sample mean) and 𝜎 (standard deviation), or the whole dataset's 𝜇 and 𝜎?