rasmusbergpalm / DeepLearnToolbox

Matlab/Octave toolbox for deep learning. Includes Deep Belief Nets, Stacked Autoencoders, Convolutional Neural Nets, Convolutional Autoencoders and vanilla Neural Nets. Each method has examples to get you started.
BSD 2-Clause "Simplified" License

Incorrect output for multiclass predictions #89

Open JasperSnoek opened 10 years ago

JasperSnoek commented 10 years ago

Thanks for putting this codebase together, I think it can be very useful for MATLAB users that want to play around with deep nets. I noticed that multiclass predictions are being passed through multiple logistic functions. See e.g. https://github.com/rasmusbergpalm/DeepLearnToolbox/blob/master/CNN/cnnff.m#L37

This is technically incorrect unless you actually want to be able to predict multiple classes at the same time. Instead, you want to output a multinomial (one-of-N) distribution rather than N binomial distributions (a multi-class prediction instead of N binary class predictions). What you want to backpropagate through is the Softmax function: http://en.wikipedia.org/wiki/Softmax_function which generalizes the logistic to multiple classes. It normalizes the outputs so that they form a proper probability distribution. Backpropagating through this will result in a much better model.
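A minimal sketch of the contrast being described, in plain Python with hypothetical scores (the toolbox itself is MATLAB/Octave; this is only illustrative): N independent logistic units each give a separate binary probability and their sum can exceed 1, while the softmax normalizes the same scores into a single one-of-N distribution.

```python
import math

def sigmoid(z):
    # Logistic function: each output unit is an independent binary prediction.
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # hypothetical pre-activation scores for 3 classes

independent = [sigmoid(z) for z in scores]  # N binomial outputs; need not sum to 1
multinomial = softmax(scores)               # one-of-N distribution; sums to 1

print(sum(independent))  # > 1: each unit answers its own yes/no question
print(sum(multinomial))  # 1.0 (up to floating point)
```

With a cross-entropy loss, the softmax output layer also yields the simple output error `p - y` during backpropagation, analogous to the per-unit logistic case.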

Best,

Jasper

eric-haibin-lin commented 9 years ago

Yes, I also agree on this. Currently it's doing N binary class predictions. I think this is important to fix.

tambetm commented 9 years ago

Binary class prediction has its merits as well: if the final dataset that you are going to apply your predictions to has inputs that don't belong to any class (e.g. the classes are the letters A, B, C, and some inputs are none of them), then using binary logistic units allows you to set a threshold and not classify those inputs at all. When using Softmax, all probability mass must be allocated among the classes, even if their scores are very low.
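A small sketch of that rejection option, again in plain Python with made-up scores: when every per-class logistic output falls below a threshold, the input can be left unclassified, whereas a softmax over the same low scores would still allocate all probability mass among the classes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical scores for an input resembling none of the classes A, B, C.
scores = [-2.0, -1.5, -2.5]
classes = ["A", "B", "C"]

probs = [sigmoid(z) for z in scores]  # each well below 0.5 here
threshold = 0.5

if max(probs) < threshold:
    label = None  # reject: refuse to assign any class
else:
    label = classes[probs.index(max(probs))]

print(label)  # None: all per-class probabilities are below the threshold
```

Note that a softmax over these same scores would still sum to 1 and hand roughly half the probability mass to the least-negative class, hiding the fact that no class fits well.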

But I find @JasperSnoek's comment interesting, that backpropagating through Softmax results in a better model. Why would that be? Would it make sense to train using Softmax, but use binary logistic units at prediction time?