macss-modeling / General-Questions

A repo to post questions about code, data, etc.

Clarifying questions regarding pset2 #10

Closed: boseongyun closed this issue 3 years ago

boseongyun commented 3 years ago

Hello,

I am writing to get some clarification on the following questions! I will try to make them concise!

Thank you for your valuable time!

Yilun0221 commented 3 years ago

Hi!

Q2-e: I think this refers to the test data set you created in Q2-c

Q2-f: I think this is different from the 10-fold cross-validation in Q2-d. In cross-validation, each round holds out a different fold as its validation set. Here, you are expected to pick out the best model and check its performance on the test data set you created in Q2-c.
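
To make the distinction concrete, here is a minimal sketch of the two kinds of held-out data (assuming Python with scikit-learn; the synthetic data and logistic regression are placeholders, not the pset's actual setup):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = make_classification(n_samples=500, random_state=42)

# Q2-c: one held-out test set, created once and never touched during CV
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Q2-d: 10-fold CV rotates the validation fold *within* the training data;
# each round holds out a different tenth of X_train
clf = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(clf, X_train, y_train,
                            cv=KFold(n_splits=10, shuffle=True, random_state=42))

# Q2-e: the chosen model is evaluated once on the untouched test set
final_score = clf.fit(X_train, y_train).score(X_test, y_test)
```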

boseongyun commented 3 years ago

Thank you so much for your reply!

So, in Q2-d we fit the classifiers using 10-fold CV, but

in Q2-e we use the test data created in Q2-c to evaluate each model's performance?

I am sorry if I am not understanding the questions correctly, but does this mean that we are not going to collect metrics during CV to find the best model, and instead go straight to Q2-e, where we evaluate the model's performance on the test data (out-of-sample data)?

I thought we had to run k-fold CV -> compare the metrics across folds -> select the best model -> use that model to predict the test data (out-of-sample data).

I apologize if my questions are poorly asked!

pdwaggoner commented 3 years ago

Hi - I can clarify; sorry for the confusion. Fit all classifiers using CV. Compare these models in a number of ways (error, accuracy, etc.). Then use the best one to predict, as you noted.
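
In case it helps, a rough sketch of that full workflow (again assuming scikit-learn; the three classifiers below are hypothetical stand-ins for whichever models the pset actually asks for):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit each candidate classifier with 10-fold CV on the training data
models = {
    "logit": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=42),
    "nb": GaussianNB(),
}
cv_means = {name: cross_val_score(m, X_train, y_train, cv=10).mean()
            for name, m in models.items()}

# Pick the model with the best mean CV accuracy, refit it on the full
# training data, and evaluate it once on the held-out test set
best_name = max(cv_means, key=cv_means.get)
best = models[best_name].fit(X_train, y_train)
print(best_name, "test accuracy:", best.score(X_test, y_test))
```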

ginxzheng commented 3 years ago

Hi Professor, sorry to still be asking for clarification. In step (e), when comparing these models, should the errors we compare come from the test set, or from the summed errors across the training folds?

boseongyun commented 3 years ago

Hi Professor, for (e) we are comparing the CV results, right?

pdwaggoner commented 3 years ago

You may compare whatever you'd like. In the solutions I created, I show comparisons across three approaches: error, AUC, and ROC curves.
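
For reference, a minimal sketch of computing those three comparisons on the held-out test set (scikit-learn and matplotlib assumed; the model and data here are placeholders, not the pset's actual ones):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]   # predicted probability of the positive class
preds = model.predict(X_test)

error = 1 - accuracy_score(y_test, preds)   # misclassification error rate
auc = roc_auc_score(y_test, probs)          # area under the ROC curve
fpr, tpr, _ = roc_curve(y_test, probs)      # points along the ROC curve

plt.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
plt.plot([0, 1], [0, 1], linestyle="--")    # chance-level baseline
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```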