pluskid / Mocha.jl

Deep Learning framework for Julia

What is MultinomialLogisticLossLayer for, since it doesn't support Backprop #209

Open davidparks21 opened 8 years ago

davidparks21 commented 8 years ago

I'm confused about MultinomialLogisticLossLayer: it's documented as a normal loss layer and does what I want, but it doesn't have backprop implemented. Is it intended to be used together with another layer, perhaps something not referenced in the documentation?

I'm trying to take an image as input and output a (downsampled) heatmap of that image with higher values in locations where the model matches a desired object. My labels are of shape [100 x N_SAMPLES] where 100 represents a 10x10 heatmap output.
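For concreteness, the label layout described above can be sketched in NumPy (illustrative only, not Mocha code; `N_SAMPLES` and the 10x10 heatmap size are taken from the description above):

```python
import numpy as np

# Flattened heatmap labels as described: one 100-element column per sample.
N_SAMPLES = 4
labels = np.random.rand(100, N_SAMPLES)

# Recover the per-sample 10x10 heatmaps (one heatmap per column).
heatmaps = labels.reshape(10, 10, N_SAMPLES)

print(heatmaps.shape)  # (10, 10, 4)
# Each heatmap, flattened, matches its original column.
print(np.allclose(heatmaps[:, :, 0].ravel(), labels[:, 0]))  # True
```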

Thanks, David

CorticalComputer commented 8 years ago

Hello David,

Were you ever able to resolve your issue? I'm currently trying to apply Mocha to a problem where I too need to output an X by Y map (of the same dimensions as the input, since I'm classifying every pixel in the input). Any suggestions on how to do this?

davidparks21 commented 7 years ago

Unfortunately not. I tried using softmax loss, but it performed really poorly compared to the logistic loss I now use in TensorFlow, and I couldn't get logistic loss working in Mocha because of this issue. It shouldn't be hard to fix, since the derivative of the logistic loss is just (sigmoid(z) - y)x, so it should only take a few lines of code, one if you're a neat freak. I initially assumed I just didn't understand how the layer was meant to be used, since it was so unexpected that a loss function would be committed that doesn't actually work. So if it does in fact work, I still don't understand how it's implemented; and if it's just incomplete code, then that's its own obvious issue.
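As a quick sanity check of the derivative mentioned above (with the loss-minimization sign convention, the gradient of the binary logistic loss with respect to the weights is `(sigmoid(z) - y) * x`; the log-likelihood convention flips the sign), here is a NumPy sketch verified against finite differences. This is illustrative only, not Mocha code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, x, y):
    # Binary cross-entropy for a single example with logit z = w . x
    p = sigmoid(np.dot(w, x))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def logistic_grad(w, x, y):
    # Analytic gradient of the loss: (sigmoid(z) - y) * x
    return (sigmoid(np.dot(w, x)) - y) * x

# Compare against a central finite-difference approximation.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
x = rng.normal(size=5)
y = 1.0

eps = 1e-6
num = np.zeros_like(w)
for i in range(len(w)):
    wp, wm = w.copy(), w.copy()
    wp[i] += eps
    wm[i] -= eps
    num[i] = (logistic_loss(wp, x, y) - logistic_loss(wm, x, y)) / (2 * eps)

print(np.allclose(num, logistic_grad(w, x, y), atol=1e-6))  # True
```

A correct backward pass for the layer would just scale this per-example gradient and write it into the input blob's diff.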

I really liked the structure and the potential ease of extending Mocha, but to be a viable framework with a decent community it's going to need some basics, like a forum where we can all discuss issues like this. Posting GitHub issues isn't very effective.

As for me, I'm working in TensorFlow for the moment. I get the impression that, among Julia deep learning options, MXNet and the TensorFlow wrapper will be the actively maintained projects.

CorticalComputer commented 7 years ago

I'm having problems using Mocha for anything other than standard classification. Unfortunately I've sunk a lot of time into Mocha, and it looks like I'll now have to move everything to TensorFlow...

pluskid commented 7 years ago

The SoftmaxLossLayer in Mocha is essentially the combination of a softmax layer with the MultinomialLogisticLoss layer. So wherever you would use MultinomialLogisticLossLayer, you can use SoftmaxLossLayer directly, which does have a proper backward function implemented. It can also handle multi-dimensional outputs; see the `dim` parameter in the docs.
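A NumPy sketch (illustrative, not Mocha's API) of why this combination has a cheap backward pass: composing softmax with the multinomial logistic loss (negative log-likelihood of the true class) gives a gradient with respect to the logits of simply `p - y`, which is what a combined SoftmaxLoss layer can implement directly. Verified here against finite differences:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def multinomial_logistic_loss(p, y):
    # Negative log-likelihood of the true class; y is one-hot
    return -np.sum(y * np.log(p))

def softmax_loss_grad(z, y):
    # Gradient of the composed loss with respect to the logits
    return softmax(z) - y

rng = np.random.default_rng(1)
z = rng.normal(size=6)
y = np.zeros(6)
y[2] = 1.0

# Check against central finite differences of the composed loss.
eps = 1e-6
num = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    num[i] = (multinomial_logistic_loss(softmax(zp), y)
              - multinomial_logistic_loss(softmax(zm), y)) / (2 * eps)

print(np.allclose(num, softmax_loss_grad(z, y), atol=1e-6))  # True
```

This also explains why MultinomialLogisticLossLayer on its own lacks backprop: the simple `p - y` form only falls out once the softmax is fused in.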