keras-team / keras

Deep Learning for humans
http://keras.io/

How to implement my own loss function? #2662

Closed henry0312 closed 7 years ago

henry0312 commented 8 years ago

I'm going to implement RankNet, and I find I'll need my own loss function (cf. eq. (1) in the paper).

The loss function takes two arguments that are not y_true and y_pred (cf. http://keras.io/objectives/) and returns one output. How do I implement it? Is that possible at all?


fchollet commented 8 years ago

If you want to use a loss function that is not of the form of f(x_true, x_pred), then you have to implement your training routine outside of Keras.

Basically:

1) Define your model (typically using the functional API).
2) Define your custom cost.
3) Instantiate an optimizer and get the weight updates via: updates = optimizer.get_updates(model.trainable_weights, model.constraints, cost)
4) Take care of regularizers and batchnorm updates manually.
5) Create your own Keras functions based on the inputs, outputs, and updates.
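
For illustration, a minimal sketch of those steps (this assumes the Keras 1.x-era API, where optimizer.get_updates takes (params, constraints, loss); the toy shapes and the placeholder cost are not from this thread):

import numpy as np
from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import SGD

# 1) define the model with the functional API (toy sizes)
x = Input(shape=(10,))
h = Dense(32, activation='relu')(x)
score = Dense(1)(h)
model = Model(x, score)

# 2) define a custom cost directly on tensors (stand-in for your own cost)
target = K.placeholder(shape=(None, 1))
cost = K.mean(K.square(model.output - target))

# 3) instantiate an optimizer and get the weight updates
opt = SGD(lr=0.01)
updates = opt.get_updates(model.trainable_weights, model.constraints, cost)

# 5) build a Keras function from the inputs, outputs, and updates
# (step 4, regularizers and batchnorm updates, is covered in the reply below)
train_fn = K.function([model.input, target], [cost], updates=updates)

# one manual training step on random data
loss_value = train_fn([np.random.random((8, 10)), np.random.random((8, 1))])[0]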

henry0312 commented 8 years ago

Thank you for your quick reply. I'll try (3), (4), and (5).

henry0312 commented 8 years ago

3) Instantiate an optimizer and get the weight updates via: updates = optimizer.get_updates(model.trainable_weights, model.constraints, cost)

5) Create your own Keras functions based on the inputs, outputs, and updates.

I found that I should do something like https://github.com/fchollet/keras/blob/master/keras/engine/training.py#L649-670.

4) Take care of regularizers and batchnorm updates manually.

I don't understand what you mean. Could you give me an example?

fchollet commented 8 years ago

If you don't have regularizers or batchnorm layers you can ignore this. Otherwise, you need to add the regularization penalties to your cost yourself and include the layers' own updates (e.g. the batchnorm statistics) in your training function.
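
A hedged sketch of that step, continuing the snippet in the earlier reply and assuming a Keras version that collects the regularization penalties in model.losses and the layer updates (e.g. batchnorm moving averages) in model.updates:

# 4) fold the regularization penalties into the custom cost ...
cost = cost + sum(model.losses)
# ... and append the layers' own updates so batchnorm statistics keep moving
updates = opt.get_updates(model.trainable_weights, model.constraints, cost)
updates += model.updates
train_fn = K.function([model.input, target], [cost], updates=updates)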

henry0312 commented 8 years ago

Thank you! I'll check compile

yluo42 commented 8 years ago

@henry0312 Have you figured it out? I'm currently facing the same problem and haven't found a way to implement a cost function that takes multiple arguments (with different shapes) as inputs. Could you please provide an example of how you did it?

henry0312 commented 8 years ago

@ScartleRoy No, it's difficult for me to achieve this with fit or fit_generator 😔 I'll try more.

yluo42 commented 8 years ago

@henry0312 Maybe we need some additional official documentation. I think many people also need to design cost functions that differ from the default ones in Keras. @fchollet Sorry to disturb you again, but would it be possible for Keras to provide some documentation on how to write custom loss functions with different forms?

anewlearner commented 7 years ago

Any update on this?

indraforyou commented 7 years ago

(Copying the answer I posted on Stack Overflow: http://stackoverflow.com/questions/33859864/how-to-create-custom-objective-function-in-keras/40622302#40622302)

Here is my small snippet for writing new loss functions and testing them before use:

import numpy as np
from keras import backend as K

_EPSILON = K.epsilon()

def _loss_tensor(y_true, y_pred):
    # backend (Theano/TensorFlow) version of binary cross-entropy
    y_pred = K.clip(y_pred, _EPSILON, 1.0 - _EPSILON)
    out = -(y_true * K.log(y_pred) + (1.0 - y_true) * K.log(1.0 - y_pred))
    return K.mean(out, axis=-1)

def _loss_np(y_true, y_pred):
    # NumPy reference version of the same loss
    y_pred = np.clip(y_pred, _EPSILON, 1.0 - _EPSILON)
    out = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return np.mean(out, axis=-1)

def check_loss(_shape):
    if _shape == '2d':
        shape = (6, 7)
    elif _shape == '3d':
        shape = (5, 6, 7)
    elif _shape == '4d':
        shape = (8, 5, 6, 7)
    elif _shape == '5d':
        shape = (9, 8, 5, 6, 7)
    else:
        raise ValueError('unknown shape: %s' % _shape)

    y_a = np.random.random(shape)
    y_b = np.random.random(shape)

    out1 = K.eval(_loss_tensor(K.variable(y_a), K.variable(y_b)))
    out2 = _loss_np(y_a, y_b)

    # both versions must agree on shape, and their values should match closely
    assert out1.shape == out2.shape
    assert out1.shape == shape[:-1]
    print(np.linalg.norm(out1))
    print(np.linalg.norm(out2))
    print(np.linalg.norm(out1 - out2))

def test_loss():
    shape_list = ['2d', '3d', '4d', '5d']
    for _shape in shape_list:
        check_loss(_shape)
        print('======================')

if __name__ == '__main__':
    test_loss()

Here, as you can see, I am testing the binary_crossentropy loss and have two separate implementations: a NumPy version (_loss_np) and a tensor version (_loss_tensor). [Note: if you stick to the Keras backend functions it will work with both Theano and TensorFlow, but if you depend on one of them you can also reach that backend module directly through K.]

Then I compare the output shapes, the L2 norms of the two outputs (which should be almost equal), and the L2 norm of their difference (which should be close to 0).

Once you are satisfied that your loss function is working properly, you can use it as: model.compile(loss=_loss_tensor, optimizer=sgd)

janiteja commented 7 years ago

@indraforyou Thanks for the snippet. Your function gives loss values, but how can we specify the gradients of a custom loss function for backpropagation?

Edit: I got the answer from the keras-users group, thanks to "Klemen Grm": https://groups.google.com/forum/#!searchin/keras-users/loss$20gradients|sort:relevance/keras-users/9KHTdpQ_Rno/0p3tH_-FEgAJ

If you look at the source file for the built-in objective functions (https://github.com/fchollet/keras/blob/master/keras/objectives.py), notice they're all implemented as Theano functions, which enables automatic gradient calculation. The same must be true for any custom objective function you implement yourself.

indraforyou commented 7 years ago

@janiteja: Yes, that's one of the benefits of using Theano/TensorFlow and libraries built on top of them: they give you automatic gradient calculation for the mathematical functions and operations you compose.

Keras gets the gradients by calling:

# keras/theano_backend.py
def gradients(loss, variables):
    return T.grad(loss, variables)

# keras/tensorflow_backend.py
def gradients(loss, variables):
    '''Returns the gradients of `variables` (list of tensor variables)
    with regard to `loss`.
    '''
    return tf.gradients(loss, variables, colocate_gradients_with_ops=True)

which are in turn called by the optimizers (keras/optimizers.py) to build the update rules for the tensor graph.

The only time you need to write a new gradient yourself is when you are defining a new basic mathematical operation/function; see these links for that: http://deeplearning.net/software/theano/extending/extending_theano.html https://www.tensorflow.org/versions/r0.12/how_tos/adding_an_op/index.html
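
For reference, TensorFlow versions much newer than the r0.12 docs linked above also offer tf.custom_gradient for attaching a hand-written gradient to a Python-level op; a minimal sketch with a made-up operation:

import tensorflow as tf

@tf.custom_gradient
def clipped_square(x):
    def grad(dy):
        # hand-written gradient: the true derivative 2*x, clipped to [-1, 1]
        return dy * tf.clip_by_value(2.0 * x, -1.0, 1.0)
    return tf.square(x), grad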

htso commented 7 years ago

@henry0312 Have you figured out how to code this? If you have, can you pls post some code snippet. I have similar need for a custom loss that takes three inputs.

patyork commented 7 years ago

There is an example that covers multiple additional arguments in image_ocr.py; link to a relevant part of it (actual loss function is defined here though)

airalcorn2 commented 7 years ago

For anyone else who arrives here by searching for "keras ranknet", you don't need to use a custom loss function to implement RankNet in Keras. The cost function as described in the paper is simply the binary cross entropy where the predicted probability is the probability that the more relevant document will be ranked higher than the less relevant document. The "trick" for implementing RankNet in Keras is making the input to the final sigmoid layer (which generates the predicted probability) the difference between the scores of the two documents (scores that are generated by the same net). My (slightly modified) Keras implementation of RankNet can be found here.
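
A minimal sketch of that trick (not airalcorn2's actual code; the feature size and layer widths are made up, and the Keras 2 functional API is assumed):

from keras.layers import Input, Dense, Subtract, Activation
from keras.models import Model

n_features = 50  # assumed document feature dimension

# the same scoring net is applied to both documents
doc_a = Input(shape=(n_features,))
doc_b = Input(shape=(n_features,))
hidden = Dense(64, activation='relu')
score = Dense(1)
score_a = score(hidden(doc_a))
score_b = score(hidden(doc_b))

# sigmoid of the score difference gives the probability that doc_a outranks
# doc_b, so plain binary cross-entropy is exactly the RankNet loss
diff = Subtract()([score_a, score_b])
prob = Activation('sigmoid')(diff)

ranknet = Model(inputs=[doc_a, doc_b], outputs=prob)
ranknet.compile(optimizer='adam', loss='binary_crossentropy')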

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

xiaoleihuang commented 7 years ago

I read through the available loss functions in Keras (https://github.com/fchollet/keras/blob/master/keras/losses.py), but I am not sure how to extend a loss function, for example by adding a regularization term or extra inputs.

I also checked the model compile definition (https://github.com/fchollet/keras/blob/master/keras/models.py#L742), but I'm not sure how Keras handles the loss function.

Could anyone provide further references? Here is what I am thinking of:

def myloss(y_true, y_pred, weights):
    return K.mean(K.square(y_pred - y_true), axis=-1) + K.sum(l2 * K.square(weights))

ghost commented 7 years ago

@xiaoleihuang Did you find a solution for your custom loss implementation? I am trying to implement a similar custom loss and am not sure how to go about it.

ritalaezza commented 7 years ago

Hi all, I have been reading through this and other similar issues and still haven't been able to implement my custom loss function.

What I have is a multilabel problem, with 4 input time series and 7 possible labels at each time step. To attempt to solve the problem, I stacked a couple of LSTM layers followed by a TimeDistributed(Dense) layer, so that there is a classification for each time step. Input dimensions: (timesteps=200, features=4); output dimensions: (timesteps=200, n_outputs=7).

I want to implement the loss function used in this article, where the loss is a convex combination of the final loss (at time step 200) and the average of the losses over all time steps. I've tried quite a few approaches, but none have worked.

Essentially I would like something like this to work:

def custom_loss(y_true, y_pred):
    alpha = 0.1
    loss1 = K.sum(C.binary_cross_entropy(y_pred,y_true))/200
    loss2 = C.binary_cross_entropy(y_pred[200,:],y_true[200,:])
    loss = alpha*loss1 + (1-alpha)*loss2
    return loss

I appreciate any help I can get.
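
Not a verified solution, but for reference, a hedged sketch of how that convex combination could be written with Keras backend ops alone, assuming y_true/y_pred of shape (batch, timesteps=200, n_outputs=7) and the Keras 2 argument order for K.binary_crossentropy:

from keras import backend as K

def custom_loss(y_true, y_pred, alpha=0.1):
    # element-wise cross-entropy, shape (batch, timesteps, n_outputs)
    ce = K.binary_crossentropy(y_true, y_pred)
    per_step = K.mean(ce, axis=-1)        # (batch, timesteps)
    loss_avg = K.mean(per_step, axis=-1)  # average over all time steps
    loss_last = per_step[:, -1]           # final time step only
    return alpha * loss_avg + (1 - alpha) * loss_last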

eggie5 commented 6 years ago

As @airalcorn2 noted above, RankNet can be implemented with vanilla binary cross-entropy in Keras. See my example: http://www.eggie5.com/130-learning-to-rank-siamese-network-pairwise-data

michelleowen commented 6 years ago

@indraforyou @fchollet in the loss, you take K.mean on axis=-1. This mean is taken across what?

SuperKam91 commented 5 years ago

@indraforyou @fchollet in the loss, you take K.mean on axis=-1. This mean is taken across what?

It means you average over the last axis (the per-output dimension), so you get one loss value per sample.
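
A quick illustration of that reduction, with made-up shapes:

import numpy as np
from keras import backend as K

# a per-element loss of shape (batch=4, n_outputs=7) ...
out = K.variable(np.random.random((4, 7)))
# ... averaged over axis=-1 leaves one loss value per sample: shape (4,)
print(K.int_shape(K.mean(out, axis=-1)))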

FirminSun commented 5 years ago

@xiaoleihuang Did you solve your issue? I have the same issue as you.

Daniel7077 commented 5 years ago

@indraforyou Thanks for the snippet. What about losses.categorical_crossentropy, for multi-class classification rather than binary cross-entropy? Thanks.

edoardogiacomello commented 5 years ago

This issue is the only thing keeping me from switching from low-level TensorFlow to Keras. But I see that upcoming TF versions will move to Keras anyway, so I'm looking for a way to implement complex models without Keras assuming I'm doing plain classification (e.g. WGANs, or GANs that use other networks in their loss).

I found this article that could be helpful to some of you: https://towardsdatascience.com/advanced-keras-constructing-complex-custom-losses-and-metrics-c07ca130a618

Basically, you wrap your loss(y_true, y_pred) function in an outer function that takes an arbitrary number of parameters (which, I suppose, may be tensors or whatever you want) and returns a loss with the required signature.
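
A minimal sketch of that closure pattern (the names and the extra L2 penalty here are purely illustrative):

from keras import backend as K

def make_loss(extra_tensor, l2_weight=0.01):
    def loss(y_true, y_pred):  # the (y_true, y_pred) signature Keras requires
        mse = K.mean(K.square(y_pred - y_true), axis=-1)
        return mse + l2_weight * K.sum(K.square(extra_tensor))
    return loss

# model.compile(optimizer='adam', loss=make_loss(some_weight_tensor))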

I can't understand why Keras focuses its API only on "classical" classification problems. Just think of a WGAN discriminator, which has an unbounded output that you try to maximize: in the standard GAN formulation the target would be either zero or one, but what would the target be in the WGAN case?

I'm sure there is a workaround, but I can't see the point of complicating things just to have a "black box" training function with a simple signature.