WHQ1111 closed this issue 4 years ago.
This is a great question. I had the same confusion when I worked on this project. If you try the variant with -1.0, you will observe empirically that it performs worse than the variant with -2.0.
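A minimal sketch of why the -1.0/-2.0 constant can matter at all. It rests on a loudly stated assumption: that the ridge head reduces to an unconstrained QP, minimize 0.5 * z^T (K + lam*I) z + e^T z with e = coef * y (one-hot support labels), whose minimizer is z = -(K + lam*I)^{-1} e. This is a hypothetical simplification, not the code in classification_heads.py, and `ridge_logits`, `coef`, `lam` and the toy data are made up for illustration. Under that assumption, going from -1.0 to -2.0 exactly doubles the dual coefficients and therefore the logits. The predicted class is unchanged, but the softmax becomes sharper, which changes the cross-entropy loss and its gradients during meta-training.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists (a linear kernel)."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b for a small square system by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ridge_logits(support, labels, query, n_way, lam, coef):
    """Hypothetical one-vs-rest ridge head: for each class k, minimize
    0.5 * z^T (K + lam*I) z + (coef * y_k)^T z in closed form, then score the query."""
    n = len(support)
    G = [[dot(support[i], support[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_query = [dot(s, query) for s in support]  # kernel between support points and query
    logits = []
    for k in range(n_way):
        y_k = [1.0 if lab == k else 0.0 for lab in labels]  # one-hot targets for class k
        z_k = solve(G, [-coef * yi for yi in y_k])  # unconstrained minimizer z = -G^{-1} e
        logits.append(dot(z_k, k_query))
    return logits


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


support = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 3-way, 1-shot support set (made up)
labels = [0, 1, 2]
query = [0.9, 0.1]

logits_1 = ridge_logits(support, labels, query, n_way=3, lam=1.0, coef=-1.0)
logits_2 = ridge_logits(support, labels, query, n_way=3, lam=1.0, coef=-2.0)
print(logits_1, logits_2)
print(softmax(logits_1), softmax(logits_2))
```

If this simplification holds, the constant acts as a fixed rescaling (inverse temperature) of the logits. That is one plausible reason the two variants train differently; it does not by itself show which one is better.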
The implementation is correct, but there is a typo in Equation (11) of the paper. I apologize for the confusion. The correct equation should look like this: Please refer to the derivation of dual ridge regression, e.g., in "Ridge Regression Learning Algorithm in Dual Variables" by Saunders et al. (1998).
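For reference, here is a sketch of the standard dual ridge regression derivation in the style of Saunders et al. (1998). It is not a reconstruction of the corrected Equation (11), whose exact form is not preserved in this thread; the symbols lambda, K, alpha and beta are the conventional ones and are chosen here for illustration.

```latex
% Primal ridge regression with regularization \lambda > 0, training pairs (x_i, y_i), i = 1..n,
% and Gram matrix K_{ij} = x_i^\top x_j:
\min_{w,\,\xi}\; \lambda \lVert w \rVert^2 + \sum_{i=1}^{n} \xi_i^2
\quad \text{s.t.} \quad \xi_i = y_i - w^\top x_i .

% Lagrangian with multipliers \alpha_i; setting its gradients w.r.t. w and \xi to zero gives
w = \frac{1}{2\lambda} \sum_{i=1}^{n} \alpha_i x_i, \qquad \xi_i = \frac{\alpha_i}{2} .

% Substituting back yields the dual, a concave function of \alpha that is maximized:
\max_{\alpha}\; W(\alpha) = -\frac{1}{4\lambda}\,\alpha^\top K \alpha - \frac{1}{4}\,\alpha^\top \alpha + \alpha^\top y,
\qquad \alpha^\star = 2\lambda\,(K + \lambda I)^{-1} y .

% Equivalently, with \beta = \alpha / (2\lambda) (so that w = \sum_i \beta_i x_i),
% W(\alpha) = \lambda \left[ -\beta^\top (K + \lambda I)\beta + 2\, y^\top \beta \right],
% so maximizing W is the same as the minimization
\min_{\beta}\; \beta^\top (K + \lambda I)\,\beta - 2\, y^\top \beta,
\qquad \beta^\star = (K + \lambda I)^{-1} y .
```

Both forms give the same classifier, w = X^T (K + lambda I)^{-1} y. Whether the problem is written as a maximization or a minimization, and which constant multiplies the linear term, depends only on the parameterization chosen.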
Thank you very much. I get it.
I have read your code, and I think there may be something wrong with the function 'MetaOptNetHead_Ridge' in classification_heads.py. I think it should be:
not
Also, Equation (11) should be a minimization, not a maximization. Am I right?
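On the maximize vs. minimize question: in the Saunders et al. (1998) form, the dual objective W(alpha) is concave and is maximized; minimizing -W(alpha) is exactly the same problem, and QP solvers generally expect the minimization form. A minimal numerical check follows, with made-up toy data (the names `dual_objective`, `X`, `y` and `lam` are illustrative). It confirms that the dual optimum reproduces the primal ridge weights and the primal optimal value, and that perturbing alpha only lowers W.

```python
# Sketch (toy data made up): the Saunders et al. (1998) dual of ridge regression,
#   maximize W(a) = -a^T K a / (4*lam) - a^T a / 4 + a^T y,
# is the same problem as minimizing -W(a); its optimum reproduces the primal ridge solution.


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system A x = b."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def dual_objective(K, y, lam, a):
    """W(a) from the dual; -W(a) is the convex objective a minimizer would use."""
    Ka = [dot(row, a) for row in K]
    return -dot(a, Ka) / (4 * lam) - dot(a, a) / 4 + dot(a, y)


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # n = 3 training points in d = 2 dimensions
y = [1.0, -1.0, 0.5]
lam = 0.5
n, d = len(X), len(X[0])
K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]

# Dual optimum: gradient of W is zero  <=>  (K + lam*I) a = 2*lam*y.
alpha = solve([[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)],
              [2 * lam * yi for yi in y])
w_dual = [sum(alpha[i] * X[i][k] for i in range(n)) / (2 * lam) for k in range(d)]

# Primal ridge solution: (X^T X + lam*I) w = X^T y.
XtX = [[sum(X[i][p] * X[i][q] for i in range(n)) + (lam if p == q else 0.0) for q in range(d)]
       for p in range(d)]
w_primal = solve(XtX, [sum(X[i][p] * y[i] for i in range(n)) for p in range(d)])

primal_value = lam * dot(w_primal, w_primal) + sum((y[i] - dot(w_primal, X[i])) ** 2 for i in range(n))
print(w_dual, w_primal, dual_objective(K, y, lam, alpha), primal_value)
```

Because W is strictly concave, any step away from alpha decreases W (equivalently, increases -W), so the maximizer of W and the minimizer of -W coincide.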