macss-modeling / General-Questions

A repo to post questions about code, data, etc.

0204 in-class coding #11

Closed chrismaurice0 closed 3 years ago

chrismaurice0 commented 3 years ago

In the code you provided for questions #3 and #4 of the 2/04 coding challenge, can you explain why we use titanic_train and titanic_test where we do in each of these columns? I understand knn_train, but I am confused about knn_test and err_train.

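library(tibble)
library(purrr)

mse_knn <- tibble(k = 1:100,
                  # fit on the training set, predict the training set
                  knn_train = map(k, ~ class::knn(dplyr::select(titanic_train, -Survived),
                                                  test = dplyr::select(titanic_train, -Survived),
                                                  cl = titanic_train$Survived, k = .)),
                  # fit on the training set, predict the testing set
                  knn_test = map(k, ~ class::knn(dplyr::select(titanic_train, -Survived),
                                                 test = dplyr::select(titanic_test, -Survived),
                                                 cl = titanic_train$Survived, k = .)),
                  # training error: training-set predictions vs. training labels
                  err_train = map_dbl(knn_train, ~ mean(titanic_train$Survived != .)),
                  # testing error: testing-set predictions vs. testing labels
                  err_test = map_dbl(knn_test, ~ mean(titanic_test$Survived != .)))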

Thank you!

pdwaggoner commented 3 years ago

This is my (sloppy?) way of calculating training and testing error. In both columns the model is fit on titanic_train (the first argument and cl =); only the test = argument changes. For knn_train, we pass titanic_train to test =, which means "make predictions on the training set." For knn_test, we switch test = to titanic_test, so the predictions are for the testing set. The error for each set is then the share of predictions that differ from the observed Survived values in that same set: err_train for the training set and err_test for the testing set.
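To make the pattern concrete without the course data, here is a minimal sketch on simulated data. Everything named toy, toy_train, toy_test, x_train, x_test, and knn_errors is a hypothetical stand-in for the Titanic objects, and the simulated outcome is only an assumption for illustration. The point it shows is that each set of predictions is compared with the labels of the set it was predicted on.

```r
library(tibble)
library(purrr)

set.seed(1234)

# Hypothetical stand-in for the Titanic data: two numeric predictors
# and a binary outcome that depends on them plus noise.
n <- 200
toy <- tibble(
  x1 = rnorm(n),
  x2 = rnorm(n),
  Survived = factor(as.integer(x1 + x2 + rnorm(n, sd = 0.5) > 0))
)

# 70/30 train/test split
train_idx <- sample(n, size = round(0.7 * n))
toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

x_train <- toy_train[, c("x1", "x2")]
x_test  <- toy_test[, c("x1", "x2")]

knn_errors <- tibble(
  k = 1:25,
  # fit on the training set, predict the training set
  knn_train = map(k, ~ class::knn(train = x_train, test = x_train,
                                  cl = toy_train$Survived, k = .x)),
  # fit on the training set, predict the testing set
  knn_test = map(k, ~ class::knn(train = x_train, test = x_test,
                                 cl = toy_train$Survived, k = .x)),
  # each error compares predictions with the labels of the same set
  err_train = map_dbl(knn_train, ~ mean(toy_train$Survived != .x)),
  err_test = map_dbl(knn_test, ~ mean(toy_test$Survived != .x))
)

knn_errors[, c("k", "err_train", "err_test")]
```

With k = 1 the training error is 0 here, because every training point is its own nearest neighbor. That is one reason training error alone says little about how well a given k generalizes, and why the testing error is tracked alongside it.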