Closed mikepackard415 closed 3 years ago
Yes, we discussed this just after class on Thursday. Basically, the confusion here comes from how most (all?) packages have programmed cost in their SVM implementations. For intuition: in software, a higher cost means a heavier penalty and fewer cases in the margin, while a lower cost means a lighter penalty and more cases in the margin. This is precisely what we observed when running the code and tuning cost in class. But the book takes a more formal (and correct) approach, defining cost in terms of the slack variables \epsilon_i, one per observation, which record where each point falls relative to the margin of its true class. Cost in this sense is the total budget that controls how many mistakes the classifier is allowed to make. So the ISL text is very much correct, but it presents the problem in a different way than most software implementations of SVM. I hope this clarifies.
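For anyone who wants to see the two conventions side by side, here is a sketch in LaTeX. The first problem is the budget form as ISL writes it. The second is the penalty form that most software documents; the symbols w, b, \xi_i and the name C_{\text{penalty}} are my own labels for this comparison, not notation from the slides or the book.

```latex
% Budget form (ISL): C caps the total slack, so a larger C permits more violations.
\begin{aligned}
\max_{\beta_0,\dots,\beta_p,\;\epsilon_1,\dots,\epsilon_n,\;M} \quad & M \\
\text{subject to} \quad & \sum_{j=1}^{p} \beta_j^2 = 1, \\
& y_i\bigl(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}\bigr) \ge M(1 - \epsilon_i), \\
& \epsilon_i \ge 0, \qquad \sum_{i=1}^{n} \epsilon_i \le C .
\end{aligned}

% Penalty form (typical software): C_penalty prices each unit of slack,
% so a larger C_penalty permits fewer violations.
\begin{aligned}
\min_{w,\;b,\;\xi_1,\dots,\xi_n} \quad & \tfrac{1}{2}\lVert w \rVert^2 + C_{\text{penalty}} \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i\bigl(w^\top x_i + b\bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0 .
\end{aligned}
```

In the first form C sits in a constraint as an upper bound on total slack; in the second it multiplies the slack inside the objective. That is why the two knobs move the margin in opposite directions.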
Yes thank you! Much appreciated.
Hi Professor Waggoner, so I believe we're supposed to refer to the ISL text when answering relevant questions (if any) in the final exam, right? Thanks!
If asked, it will be clear (e.g., referring to tuning a hyperparameter when fitting a model vs. a theoretical definition). But in general, yes, defer to the text in this type of situation.
Hi there,
In Thursday's lecture notes, on slide 30, the cost hyperparameter C is described: "a cost or penalty to having cases inside the margin, which is, in effect the budget of errors allowed." This description is confusing, because intuitively a cost and a budget should be inversely related: if violating the margin imposes a low cost, you can budget for many violations, but if it imposes a high cost, you can only budget for a few violations. The mathematical definition aligns more closely with C being considered the "budget."
I am further confused by the description of how C controls the bias-variance tradeoff.
In the notes (slide 30):
But in the ISL reading, page 347:
I think what this amounts to is that on the slides, when you say "low cost," this amounts to "large C" and "high cost" amounts to "small C". This seems kind of backwards.
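To make the inversion concrete, here is a small pure-Python sketch. It does not train an SVM: it only scores two hand-picked candidate classifiers on made-up 1-D data, once under a penalty-style cost and once under an ISL-style budget. The data, the candidate names, and the helper functions are all hypothetical, chosen only for illustration.

```python
# Toy 1-D data (made up): three points per class, separable at x = 0.
X = [-3.0, -2.0, -0.5, 0.5, 2.0, 3.0]
Y = [-1, -1, -1, 1, 1, 1]

# Two hand-picked candidates f(x) = w*x + b. The margin half-width is 1/|w|,
# so "wide" has margin 2.0 and "narrow" has margin 0.5.
CANDIDATES = {"wide": (0.5, 0.0), "narrow": (2.0, 0.0)}


def slacks(w, b):
    """Slack per point: how far it falls short of the margin of its own class."""
    return [max(0.0, 1.0 - y * (w * x + b)) for x, y in zip(X, Y)]


def penalty_objective(w, b, cost):
    """Software-style objective: margin term plus cost times total slack."""
    return 0.5 * w * w + cost * sum(slacks(w, b))


def best_by_penalty(cost):
    """Candidate with the smallest penalty objective for a given cost."""
    return min(CANDIDATES, key=lambda name: penalty_objective(*CANDIDATES[name], cost))


def best_by_budget(budget):
    """ISL-style: widest margin among candidates whose total slack fits the budget."""
    feasible = [n for n, (w, b) in CANDIDATES.items() if sum(slacks(w, b)) <= budget]
    return max(feasible, key=lambda n: 1.0 / abs(CANDIDATES[n][0]))


if __name__ == "__main__":
    for cost in (0.1, 10.0):
        print("penalty cost", cost, "->", best_by_penalty(cost))
    for budget in (0.0, 2.0):
        print("slack budget", budget, "->", best_by_budget(budget))
```

With these numbers, a small penalty cost and a large slack budget both select the wide-margin candidate (two points inside the margin), while a large penalty cost and a zero budget both select the narrow one (no violations). So "low cost" on the software side behaves like "large C" in the book's sense.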
I hope I have explained this confusion well enough! Please let me know if I'm missing something here.