macss-modeling / General-Questions

A repo to post questions about code, data, etc.

Clarifying questions regarding pset2 #10

Closed: boseongyun closed this issue 3 years ago

boseongyun commented 3 years ago

Hello,

I am writing to get some clarification on the following questions! I will try to make them concise!

Thank you for your valuable time!

Yilun0221 commented 3 years ago

Hi!

Q2-e: I think this refers to the test data set you created in Q2-c

Q2-f: I think this is different from the 10-fold cross-validation in Q2-d. In cross-validation, each round holds out a different fold as its validation set. Here, you are expected to pick out the best model and check its performance on the test data set you created in Q2-c.
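
To make the distinction concrete, here is a minimal sketch of the two kinds of held-out data (assuming Python with scikit-learn; the synthetic data and logistic regression are placeholders, not the pset's actual setup):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = make_classification(n_samples=500, random_state=42)

# Q2-c: one held-out test set, created once and never touched during CV
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Q2-d: 10-fold CV rotates the validation fold *within* the training data;
# each round holds out a different tenth of X_train
clf = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(clf, X_train, y_train,
                            cv=KFold(n_splits=10, shuffle=True, random_state=42))

# Q2-e: the chosen model is evaluated once on the untouched test set
final_score = clf.fit(X_train, y_train).score(X_test, y_test)
```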

boseongyun commented 3 years ago

Thank you so much for your reply!

So, in Q2-d we fit the classifiers using 10-fold CV, but

in Q2-e we use the test data created in Q2-c to evaluate each model's performance?

I am sorry if I am not understanding the questions correctly, but does this mean that we are not going to collect metrics during CV to find the best model, and instead go straight to Q2-e, where we evaluate the model's performance on the test data (out-of-sample data)?

I thought we had to run k-fold CV -> compare the metrics across folds -> select the best model -> use that model to predict the test data (out-of-sample data).

I apologize if my questions are poorly asked!

pdwaggoner commented 3 years ago

Hi - I can clarify; sorry for the confusion. Fit all classifiers using CV. Compare these models in a number of ways (error, accuracy, etc.). Then use the best one to predict, as you noted.
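
In case it helps, a rough sketch of that full workflow (again assuming scikit-learn; the three classifiers below are hypothetical stand-ins for whichever models the pset actually asks for):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit each candidate classifier with 10-fold CV on the training data
models = {
    "logit": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=42),
    "nb": GaussianNB(),
}
cv_means = {name: cross_val_score(m, X_train, y_train, cv=10).mean()
            for name, m in models.items()}

# Pick the model with the best mean CV accuracy, refit it on the full
# training data, and evaluate it once on the held-out test set
best_name = max(cv_means, key=cv_means.get)
best = models[best_name].fit(X_train, y_train)
print(best_name, "test accuracy:", best.score(X_test, y_test))
```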

ginxzheng commented 3 years ago

Hi Professor, sorry to still be asking for clarification. In step (e), when comparing these models, should the errors we compare come from the test set, or from the summed errors across the training folds?

boseongyun commented 3 years ago

Hi Professor, for (e) we are comparing the CV results, right?

pdwaggoner commented 3 years ago

You may compare whatever you'd like. In the solutions I created, I show comparisons across three approaches: error, AUC, and ROC curves.
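
For reference, a minimal sketch of computing those three comparisons on the held-out test set (scikit-learn and matplotlib assumed; the model and data here are placeholders, not the pset's actual ones):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]   # predicted probability of the positive class
preds = model.predict(X_test)

error = 1 - accuracy_score(y_test, preds)   # misclassification error rate
auc = roc_auc_score(y_test, probs)          # area under the ROC curve
fpr, tpr, _ = roc_curve(y_test, probs)      # points along the ROC curve

plt.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
plt.plot([0, 1], [0, 1], linestyle="--")    # chance-level baseline
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```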