rasbt / python-machine-learning-book-3rd-edition

The "Python Machine Learning (3rd edition)" book code repository
https://www.amazon.com/Python-Machine-Learning-scikit-learn-TensorFlow/dp/1789955750/
MIT License

ch. 3. Logistic Regression Classifier using gradient descent #147

Closed: nishamuktewar closed this issue 3 years ago

nishamuktewar commented 3 years ago

Thank you for writing this wonderful book. It has allowed me to revisit everything I learned from "An Introduction to Statistical Learning" and pick up some practical insights along the way.

That said, one of the examples has caused some confusion about how the cost (or error, or loss) function is computed and then used to update the weights. My understanding is that cost, loss, and error functions are essentially the same thing, but in this particular case they seem to be used differently. Looking at the implementation of the logistic regression algorithm raises a few questions:

  1. In the "fit()" method, the "errors" and the "cost" are calculated differently. The "cost" calculation seems correct.
  2. Further, the code relies on "errors" to update the weights "w_". Why is that?
  3. Now, if we were to add an L2 regularization term (as suggested on page 74 of the second edition), it would affect "cost_" but would not shrink the weights under the given implementation.
[screenshot: the fit() implementation in question]

Can you share your thoughts?

rasbt commented 3 years ago

cost or error (or loss) function

Nowadays, they are indeed mostly used synonymously.

In the "fit()" method, the "errors" and the "cost" are calculated differently. The "cost" calculation seems correct.

I was using the traditional terminology, where the "error" is the difference between the label and the prediction.
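Concretely, in the chapter's LogisticRegressionGD.fit() that quantity is (roughly) computed as:

    output = self.activation(self.net_input(X))   # predicted class-membership probabilities
    errors = y - output                           # "error" = label minus prediction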

Further, the code relies on "errors" to update the weights "w_". Why is that?

Good question. It is not obvious from looking at the code, but that is because X.T.dot(errors) is exactly the (negative) derivative of the loss (or cost) function with respect to the weights. I have summarized it on the slide below:

[slide: derivation of the gradient of the logistic loss]
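In the book's notation, with sigmoid activation $\phi$ and net input $z = \mathbf{w}^T\mathbf{x}$, the key steps of that derivation are roughly:

$$J(\mathbf{w}) = -\sum_i \left[ y^{(i)} \log\phi\big(z^{(i)}\big) + \big(1 - y^{(i)}\big) \log\big(1 - \phi(z^{(i)})\big) \right]$$

$$\frac{\partial J}{\partial w_j} = -\sum_i \big( y^{(i)} - \phi(z^{(i)}) \big)\, x_j^{(i)}$$

$$\Delta w_j = -\eta\, \frac{\partial J}{\partial w_j} = \eta \sum_i \big( y^{(i)} - \phi(z^{(i)}) \big)\, x_j^{(i)}$$

In vectorized form, y - φ(z) is the errors array, so the gradient descent step becomes self.w_[1:] += self.eta * X.T.dot(errors).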

Now, if we were to add an L2 regularization term (as suggested on page 74 of the second edition), it would affect "cost_" but would not shrink the weights under the given implementation.

Yeah, good point. You will have to add the penalty to the weight update as well. So, instead of

self.w_[1:] += self.eta * (X.T.dot(errors))

it can be changed to

self.w_[1:] += self.eta * (X.T.dot(errors) - self.l2_lambda * self.w_[1:])
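
To make that concrete, here is a minimal sketch of how the whole fit() could look with the penalty appearing in both the cost and the weight update. It follows the structure of the chapter's LogisticRegressionGD class, with l2_lambda as an added constructor parameter (an assumption for this sketch, not part of the published code):

    import numpy as np

    class LogisticRegressionGD:
        """Logistic regression via gradient descent with an optional L2 penalty (sketch)."""

        def __init__(self, eta=0.05, n_iter=100, l2_lambda=0.01, random_state=1):
            self.eta = eta                  # learning rate
            self.n_iter = n_iter            # number of passes over the training set
            self.l2_lambda = l2_lambda      # L2 regularization strength (added parameter)
            self.random_state = random_state

        def net_input(self, X):
            return np.dot(X, self.w_[1:]) + self.w_[0]

        def activation(self, z):
            # logistic sigmoid, clipped for numerical stability
            return 1. / (1. + np.exp(-np.clip(z, -250, 250)))

        def fit(self, X, y):
            rgen = np.random.RandomState(self.random_state)
            self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
            self.cost_ = []
            for _ in range(self.n_iter):
                output = self.activation(self.net_input(X))
                errors = y - output
                # gradient step; the L2 term shrinks the non-bias weights
                self.w_[1:] += self.eta * (X.T.dot(errors) - self.l2_lambda * self.w_[1:])
                self.w_[0] += self.eta * errors.sum()
                # log-loss plus the L2 penalty (bias excluded, as usual)
                cost = (-y.dot(np.log(output)) - (1 - y).dot(np.log(1 - output))
                        + (self.l2_lambda / 2.) * np.dot(self.w_[1:], self.w_[1:]))
                self.cost_.append(cost)
            return self

With l2_lambda=0 this reduces to the unregularized version from the chapter.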
nishamuktewar commented 3 years ago

Fantastic, that makes sense! Thank you for clarifying.