boseongyun closed this 3 years ago
Hi!
Q2-e: I think this refers to the test data set you created in Q2-c.
Q2-f: I think this is meant as a contrast to the 10-fold cross-validation in Q2-d. In cross-validation, a different held-out fold serves as the test set in each round. So you are expected to pick the best model and check its performance on the test data set you created in Q2-c.
Thank you so much for your reply!
So... in Q2-d) we fit the classifiers using 10-fold CV, but
in Q2-e) we are going to use the test data created in Q2-c to evaluate each model's performance?
I am sorry if I am not understanding the questions correctly, but does this mean that we are not going to collect metrics from our CV to find the best model, and instead go straight to Q2-e), where we evaluate each model's performance using the test data (out-of-sample data)?
I thought we had to run K-fold CV -> compare the metrics across the CV runs -> select the best model -> use that model to predict the test data (out-of-sample data).
I apologize if my questions are poorly asked!
Hi - I can clarify. Sorry. Fit all classifiers using CV. Compare these models in a number of ways (error, accuracy, etc.). Then, use the best to predict as you noted.
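The workflow described above (cross-validate every classifier on the training data, compare, then score only the winner on the held-out test set) can be sketched roughly as follows. This is a minimal illustration, not the course solution: the synthetic data, the two toy classifiers (`fit_midpoint`, `fit_majority`), and the 240/60 split are all assumptions made up for the example, and only the Python standard library is used.

```python
import random
from statistics import mean

def make_data(n, seed):
    """Hypothetical synthetic data: one feature, class 1 shifted up by 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        rows.append((rng.gauss(2.0 * y, 1.0), y))
    return rows

def fit_midpoint(train):
    """Toy classifier A: threshold halfway between the two class means."""
    m0 = mean(x for x, y in train if y == 0)
    m1 = mean(x for x, y in train if y == 1)
    cut = (m0 + m1) / 2
    return lambda x: int(x > cut)

def fit_majority(train):
    """Toy classifier B: always predict the most common training label."""
    label = int(2 * sum(y for _, y in train) >= len(train))
    return lambda x: label

def error_rate(model, rows):
    """Fraction of rows the model misclassifies."""
    return sum(model(x) != y for x, y in rows) / len(rows)

def cv_error(fit, rows, k=10):
    """Mean validation error over k folds of the training data."""
    errs = []
    for i in range(k):
        val = rows[i::k]  # fold i is held out for validation
        train = [r for j, r in enumerate(rows) if j % k != i]
        errs.append(error_rate(fit(train), val))
    return sum(errs) / k

data = make_data(300, seed=0)
train, test = data[:240], data[240:]  # one split made up front, as in Q2-c

# Fit and compare all candidates using CV on the training data only.
candidates = {"midpoint": fit_midpoint, "majority": fit_majority}
cv_scores = {name: cv_error(fit, train) for name, fit in candidates.items()}
best = min(cv_scores, key=cv_scores.get)

# Refit the winner on the full training set; the test set is used only here.
final_model = candidates[best](train)
test_error = error_rate(final_model, test)
print(cv_scores, best, test_error)
```

The point of the design is that model selection never sees `test`, so `test_error` stays an honest out-of-sample estimate for the chosen model.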
Hi Professor, sorry, I am still hoping for some clarification. In step e, "comparing these models", should the errors we compare come from the test set, or are they just the summed errors from the training set?
Hi Professor, for e) we are comparing the CV results... right?
You may compare whatever you'd like. In the solutions I created, I show comparisons across three approaches: error, AUC, and then ROC curves. You may compare however you'd like.
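For readers unsure what the three comparisons mentioned above involve, here is a small sketch of how an error rate, ROC curve points, and AUC could be computed from predicted scores. The labels and scores are made-up illustration values, the 0.5 threshold is an assumption, and the helper names are hypothetical; the actual solutions may compute these differently (for example with a library).

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (s, y) in enumerate(ranked):
        tp += y
        fp += 1 - y
        # Emit a point only once all examples tied at this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != s:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def misclassification_rate(scores, labels, threshold=0.5):
    """Error rate when predicting class 1 for scores at or above the threshold."""
    wrong = sum((s >= threshold) != bool(y) for s, y in zip(scores, labels))
    return wrong / len(labels)

# Made-up illustration values: true labels and a model's predicted scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

curve = roc_points(scores, labels)
print(curve)                                   # points to plot as the ROC curve
print(auc(curve))                              # 0.75
print(misclassification_rate(scores, labels))  # 0.25
```

The error rate depends on the chosen threshold, while the ROC curve and AUC summarize performance across all thresholds, which is one reason to report more than one of them.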
Hello,
I am writing to get some clarification on the following questions! I will try to make them concise!
Q2-e: It says that we have to evaluate each model using the test set. In this case, does "test set" refer to the validation set (from the cross-validation) or to the test set we split off at the beginning of the question?
Q2-f: This question says that we have to calculate our final estimate of the test error rate using the test set, and also calculate performance metrics using the original test set. I am quite confused about which test sets this question is referring to. Also, could you tell me which specific performance metrics you want us to find?
Thank you for your valuable time!