I agree. But I had a discussion with Vincent and Vincent about this, and I was voted down. It seems both definitions are used interchangeably. Note that this applies to the ridge (L2) penalty as well.
There are two possible changes we could make:
Personally I think we should do both. What do you think?
Another possibility is to add an argument, mean=True, that tells us whether to compute the mean squared loss or just the summed squared loss.
Thoughts on that?
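For illustration, here is a minimal sketch of what such a switch could look like. The SquaredLoss class below is hypothetical, not the actual parsimony API; it only shows how a mean constructor argument would change the scaling of the function value and its gradient.

```python
import numpy as np

class SquaredLoss(object):
    """Hypothetical squared loss with a `mean` switch (not the actual
    parsimony API): f(beta) = (1/2) * ||y - X beta||^2, divided by the
    number of samples n when mean=True."""

    def __init__(self, X, y, mean=True):
        self.X = X
        self.y = y
        self.mean = mean

    def f(self, beta):
        # Residual sum of squares, optionally averaged over the n samples.
        r = self.y - self.X.dot(beta)
        loss = 0.5 * np.sum(r ** 2)
        return loss / self.X.shape[0] if self.mean else loss

    def grad(self, beta):
        # Gradient of f with respect to beta, scaled consistently with f.
        g = self.X.T.dot(self.X.dot(beta) - self.y)
        return g / self.X.shape[0] if self.mean else g
```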
I have started to add the last option (where mean=True is a default argument to the constructor) to many functions. I'll add it to the other functions as I need them, so all will have it in time.
Let me know if you don't want us to do it this way, so that I don't spend time on something we won't use.
Parsimony losses are sum (of squares) losses. See pylearn-parsimony / parsimony / functions / losses.py
Such losses depend on the number of samples, which makes it difficult to tune the penalty coefficients. Using "mean" losses will make life easier when it comes to tuning the penalty parameters. Indeed, the contribution of the loss to the global objective function will remain the same whatever the size of the dataset.
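To spell out the scaling argument (in standard notation, not code from the repo): with a sum loss the data term grows linearly with n, so the penalty weight has to be rescaled whenever the sample size changes; with a mean loss the data term stays O(1) and a given lambda keeps roughly the same meaning for any dataset size.

```latex
% Sum formulation: the data term is O(n), so lambda must grow with n
% to keep the same loss/penalty trade-off.
f_{\text{sum}}(\beta) = \frac{1}{2}\sum_{i=1}^{n} (y_i - x_i^T \beta)^2
                        + \lambda\, P(\beta)

% Mean formulation: the data term is O(1), so a given lambda has the
% same meaning whatever the size of the dataset.
f_{\text{mean}}(\beta) = \frac{1}{2n}\sum_{i=1}^{n} (y_i - x_i^T \beta)^2
                         + \lambda\, P(\beta)
```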
The same choice was made in the R "glmnet" package (Friedman, Hastie, Tibshirani), where they minimize:
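For reference, the elastic-net objective described in the glmnet documentation is, up to notation:

```latex
\min_{\beta_0,\, \beta}\;
  \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^T \beta \right)^2
  + \lambda \left[ \frac{1-\alpha}{2}\,\|\beta\|_2^2 + \alpha\,\|\beta\|_1 \right]
```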
Consequence: the RidgeRegression loss should simply be divided by n, and RidgeLogisticRegression should use 1/n as the default weights.
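Concretely, under the mean convention the two objectives would read something like the following (a sketch using the standard ridge and weighted logistic forms, with p_i = 1 / (1 + e^{-x_i^T \beta}); not copied from the parsimony code):

```latex
% Ridge regression with the squared loss averaged over the n samples.
f_{\text{ridge}}(\beta) = \frac{1}{2n}\sum_{i=1}^{n} (y_i - x_i^T \beta)^2
                          + \frac{\lambda}{2}\,\|\beta\|_2^2

% Ridge logistic regression with per-sample weights w_i = 1/n by default.
f_{\text{logistic}}(\beta) = -\sum_{i=1}^{n} w_i
    \left[ y_i \log p_i + (1 - y_i)\log(1 - p_i) \right]
    + \frac{\lambda}{2}\,\|\beta\|_2^2
```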