keras-team / keras

Deep Learning for humans
http://keras.io/

Wrong implementation of weighted loss and Convolutional2D #594

Closed · bhokaal2k closed this 9 years ago

bhokaal2k commented 9 years ago

Hi, thanks a lot for making such a wonderful package. I would like to report that the weighted loss function implementation is wrong. Let us walk through an example with categorical_crossentropy:

import theano.tensor as T

epsilon = 1.0e-7  # small fuzz factor; defined at module level in keras/objectives.py

def categorical_crossentropy(y_true, y_pred):
    '''Expects a binary class matrix instead of a vector of scalar classes
    '''
    y_pred = T.clip(y_pred, epsilon, 1.0 - epsilon)
    # scale preds so that the class probas of each sample sum to 1
    y_pred /= y_pred.sum(axis=-1, keepdims=True)
    cce = T.nnet.categorical_crossentropy(y_pred, y_true)
    return cce

The above loss returns the mean (i.e. a scalar) loss over the samples.

def weighted_objective(fn):
    def weighted(y_true, y_pred, weights):
        # it's important that 0 * Inf == 0, not NaN, so I need to mask first
        masked_y_true = y_true[weights.nonzero()[:-1]]
        masked_y_pred = y_pred[weights.nonzero()[:-1]]
        masked_weights = weights[weights.nonzero()]
        obj_output = fn(masked_y_true, masked_y_pred)
        return (masked_weights.flatten() * obj_output.flatten()).mean()
    return weighted
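For readers skimming the thread, here is a minimal NumPy sketch (made-up values, not Keras code) of what weighted() computes when fn returns one loss per sample:

import numpy as np

# hypothetical per-sample losses and weights for a batch of 4
losses = np.array([0.2, 1.5, 0.7, 0.9])
weights = np.array([1.0, 2.0, 0.0, 1.0])  # a zero weight masks the third sample out

mask = weights != 0
weighted_mean = (weights[mask] * losses[mask]).mean()
print(weighted_mean)  # (1*0.2 + 2*1.5 + 1*0.9) / 3 ~= 1.3667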

Now, the weighted loss simply multiplies the weights with the same scalar and then takes the mean, whereas it should have computed per-sample weight * loss and then taken the mean.

So, the correct implementation should be

def correct_weighted_objective(fn):
    def weighted(y_true, y_pred, weights):
        # it's important that 0 * Inf == 0, not NaN, so I need to mask first
        masked_y_true = y_true[weights.nonzero()[:-1]]
        masked_y_pred = y_pred[weights.nonzero()[:-1]]
        masked_weights = weights[weights.nonzero()]
        masked_y_true *= masked_weights
        obj_output = fn(masked_y_true, masked_y_pred)
        return obj_output
    return weighted

@fchollet please amend the code. FYI, the simple correction will only work for cross-entropy style losses and will not work for something like mean squared error. Moreover, the convolutional.py layer with border_mode = 'same' breaks the GPU contiguity of the data due to cropping, which renders Theano's optimization ineffective. Here is the fix for that as well:

from theano.sandbox.cuda.basic_ops import gpu_contiguous

X = gpu_contiguous(X)  # restore a contiguous layout before the GPU op
conv_out = theano.sandbox.cuda.blas.GpuCorrMM(
    border_mode=border_mode,
    subsample=self.subsample)(X, self.W)

The fix only works with GpuCorrMM and not with cuDNN. But if you use cuDNN with the original code, it will mess things up internally and you will not get the right results because of this discontinuity, and you would not even know what is screwing up, at least with cuDNN v3.0 RC.
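The contiguity claim itself is easy to illustrate with NumPy, whose basic slicing behaves like Theano's here (a sketch with made-up shapes, not the actual Theano graph):

import numpy as np

x = np.zeros((8, 3, 32, 32), dtype='float32')  # NCHW batch
print(x.flags['C_CONTIGUOUS'])                  # True

cropped = x[:, :, 1:-1, 1:-1]                   # the 'same'-mode crop
print(cropped.flags['C_CONTIGUOUS'])            # False: a strided, non-contiguous view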

fchollet commented 9 years ago

Please...

> loss weighting blah blah

I'm afraid there is no issue to be fixed. Moreover, your proposed code is pretty outrageously incorrect (it weights the y_true target array that gets passed to the loss and is later used to compute e.g. categorical crossentropy...).

> border_mode = 'same' blah blah

That's not even syntactically correct code. So, is there an issue? If so, how do we fix it? You seem to have no clue what you are talking about, so it's hard for me to take this seriously.

bhokaal2k commented 9 years ago

@fchollet A little humility does not hurt, anyways. Here is the Convolutional2D code that runs, while yours does not when the border mode is set to 'same'. To understand why, read the comment properly and see how your slicing operation breaks the contiguous blocks on the GPU. Try your code with cuDNN v3 on the MNIST example from your website and see the results yourself: it simply does not train. Then disable cuDNN, run with conv_gemm, make at least one of the convolutions use border_mode = 'same', and convince yourself that it takes orders of magnitude more time than the code below.

def get_output(self, train):
    X = self.get_input(train)
    border_mode = self.border_mode
    if border_mode == 'same':
        # do a 'full' convolution, then crop back to the input size below
        border_mode = 'full'

    # gpu_contiguous lives in theano.sandbox.cuda.basic_ops
    X = gpu_contiguous(X)
    conv_out = theano.sandbox.cuda.blas.GpuCorrMM(
        border_mode=border_mode,
        subsample=self.subsample)(X, self.W)
    # original implementation:
    # conv_out = theano.tensor.nnet.conv.conv2d(X, self.W,
    #     border_mode=border_mode, subsample=self.subsample)

    if self.border_mode == 'same':
        shift_x = (self.nb_row - 1) // 2
        shift_y = (self.nb_col - 1) // 2
        conv_out = conv_out[:, :, shift_x:X.shape[2] + shift_x,
                            shift_y:X.shape[3] + shift_y]

    return self.activation(conv_out + self.b.dimshuffle('x', 0, 'x', 'x'))
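For reference, the shape bookkeeping behind the 'full'-then-crop trick in the snippet above (an illustrative sketch, not part of the patch):

# input with H rows, kernel with k rows (made-up numbers)
H, k = 28, 5
H_full = H + k - 1             # 'full' convolution output has 32 rows
shift = (k - 1) // 2           # shift_x in the code above: crop 2 rows at the top
assert H_full - (k - 1) == H   # keeping H rows from row `shift` recovers 'same'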

You are right about the weighted loss; it works as it is. I got confused by a careless perusal of the documentation of theano.tensor.nnet.categorical_crossentropy. My sincere apologies for the same, but do check the convolutional issue for your own satisfaction. I have changed the code on my side and it is working; I was only trying to help.
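For anyone following along, the shape question at the root of this exchange is quick to check. A minimal sketch (assuming a working Theano install; values made up) shows the loss is returned per sample, with the mean taken later in weighted():

import numpy as np
import theano
import theano.tensor as T

y_pred = T.matrix('y_pred')
y_true = T.matrix('y_true')
f = theano.function([y_pred, y_true],
                    T.nnet.categorical_crossentropy(y_pred, y_true))

p = np.array([[0.7, 0.3], [0.2, 0.8]], dtype='float32')
t = np.array([[1, 0], [0, 1]], dtype='float32')
print(f(p, t).shape)  # (2,): one loss per sample, not a scalar mean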

fchollet commented 9 years ago

Let's see. You come here stating things that are plain mistaken (which is fine, I'm willing to take the time to explain in such situations), in fact, mistaken in a way that exposes that you have barely given the code a cursory reading and do not understand what it does (which is annoying: if you want to come into somebody's repo yelling "this is all wrong", at least put some effort into it). And you state these things as if they were self-evident truths, which makes you come across as a huge jerk.

After you do this for problem A, I am much less likely to consider what you have to say about problem B. Surely you understand that.

Regarding border_mode = 'same'... the current implementation has been extensively tested, and not just on our side: Lasagne uses the exact same implementation and has for a while. Sure, it's possible that there's an issue with cuDNN v3 (subtle bugs crop up all the time). But reading your code snippet makes me think you don't understand anything about what is going on and what you are doing. You are aware that your code is not syntactically correct, right? You are aware that, after correction, your code does not call cuDNN (v3 or not), right?

You may or may not be right about the existence of a problem, but it is just impossible to take you seriously, and your "fix" is basically worthless. I am filing this under "ignore until somebody else independently points out a similar-looking issue".

bhokaal2k commented 9 years ago

@fchollet: Ignoring your derogatory remarks and apparent hostility, I once again request that you read the points below with an open mind and use the code snippet I have provided to test against your codebase:

  1. I am aware that my code snippet is SYNTACTICALLY CORRECT; perhaps you should take some time off from using Theano's basic APIs and look under the hood to educate yourself on its advanced usage. Find the usage of GpuCorrMM here and spot "called as GpuCorrMM(subsample=...)(image, filters)": http://deeplearning.net/software/theano/library/sandbox/cuda/op.html#theano.sandbox.cuda.blas.GpuCorrMM. Then replace your code with the code I have given and watch the training still run just fine.
  2. "Lasagne uses the exact same implementation and has been for a while" - That does not make it correct.
  3. Yes, I am aware that I am not calling cuDNN after the modification, because the fix does not work with cuDNN v3; I have not tested on v2.
  4. On a side note, I have already conceded, and apologized, that your weighted loss is correct, but the correct_weighted_objective() that I have written is also correct, albeit only for cross-entropy style losses. It does the same thing: read the math behind how categorical cross-entropy is computed and you will see it yourself. You will find it here: http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression
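The linearity claim in point 4 can be made explicit. For a one-hot target $y$ with true class $c$, prediction $p$, and per-sample weight $w$:

$$-\sum_j (w\,y_j)\log p_j = -w\log p_c = w \cdot \mathrm{CE}(y, p)$$

so scaling y_true by the weight scales the cross-entropy by the same factor. This is also why the trick does not carry over to something like mean squared error, where the target enters the loss nonlinearly, through $(y_j - p_j)^2$.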
jonilaserson commented 9 years ago

What is the assumed shape of 'weights'? Is it (NUM_SAMPLES, 1) or is it supposed to be the same shape as y_pred?

It appears from the code (https://github.com/fchollet/keras/blob/master/keras/models.py#L364) that it is supposed to be the same shape as y_pred: self.weights = T.ones_like(self.y_train). But what does that mean? Why would you expect more than one weight per sample (in the case where y_pred is not a scalar)?
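A quick sketch of the two conventions the question contrasts (hypothetical shapes, not Keras code):

import numpy as np

NUM_SAMPLES, NUM_CLASSES = 4, 3

per_sample = np.ones((NUM_SAMPLES, 1))             # one weight per sample
like_y_pred = np.ones((NUM_SAMPLES, NUM_CLASSES))  # the T.ones_like(y_train) convention

losses = np.random.rand(NUM_SAMPLES)               # one loss per sample
print(per_sample.flatten().shape)   # (4,): lines up with the loss vector one-to-one
print(like_y_pred.flatten().shape)  # (12,): one weight per output entry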
