keras-team / keras-contrib


Combined model with CRF layer: "An operation has `None` for gradient" #271

Open maxfriedrich opened 6 years ago

maxfriedrich commented 6 years ago

I'm chaining two Keras models so I can re-use part of one in a third model that is trained separately later on. When I use the CRF layer as the last layer of my combined model, I get the "An operation has `None` for gradient" error.

```
Traceback (most recent call last):
  File "crf_combined_model.py", line 82, in main
    model.fit(X, y)
  File "/Users/max/miniconda3/envs/ma/lib/python3.6/site-packages/keras/engine/training.py", line 1013, in fit
    self._make_train_function()
  File "/Users/max/miniconda3/envs/ma/lib/python3.6/site-packages/keras/engine/training.py", line 497, in _make_train_function
    loss=self.total_loss)
  File "/Users/max/miniconda3/envs/ma/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/Users/max/miniconda3/envs/ma/lib/python3.6/site-packages/keras/optimizers.py", line 445, in get_updates
    grads = self.get_gradients(loss, params)
  File "/Users/max/miniconda3/envs/ma/lib/python3.6/site-packages/keras/optimizers.py", line 80, in get_gradients
    raise ValueError('An operation has `None` for gradient. '
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
```


When inspecting the gradients list in Keras's `optimizers.py`, I see that the gradient becomes `None` at the model boundary.

If I instead build a single model directly from the same layers, this doesn't happen. It also doesn't happen if I use a `TimeDistributed` softmax layer instead of the CRF as the last layer.

Any ideas? Am I doing something wrong when chaining the models or is this a bug in the CRF layer or its loss function?

Here is a script that reproduces the error and shows that the same layers work fine both as a single model and with a softmax layer: https://gist.github.com/maxfriedrich/4d01b23d17ad67b8f7026bec25d51694
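For context, a condensed sketch of the failing setup (hypothetical names and dimensions, simplified from the gist):

```python
import numpy as np
from keras.layers import Input, Embedding, LSTM
from keras.models import Model
from keras_contrib.layers import CRF

# Inner model: produces sequence features that a third model will re-use later
inner_input = Input(shape=(10,))
features = LSTM(32, return_sequences=True)(Embedding(100, 16)(inner_input))
inner = Model(inner_input, features)

# Combined model: chains the inner model and puts a CRF on top
crf = CRF(5)  # 5 tag classes, default learn_mode='join'
outer_input = Input(shape=(10,))
combined = Model(outer_input, crf(inner(outer_input)))
combined.compile(optimizer='adam', loss=crf.loss_function)

X = np.random.randint(1, 100, size=(8, 10))
y = np.zeros((8, 10, 5))
y[:, :, 0] = 1  # dummy one-hot tags
combined.fit(X, y)  # raises ValueError: An operation has `None` for gradient
```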


maxfriedrich commented 6 years ago

Update: training works when I set `learn_mode='marginal'`, so I'm using that for now. I'd still be interested in why this breaks with `learn_mode='join'`, though.
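For anyone hitting the same wall, the workaround in code (assuming the keras_contrib CRF signature):

```python
from keras_contrib.layers import CRF

# 'marginal' optimizes per-timestep marginal likelihoods instead of the
# joint sequence likelihood used by the default learn_mode='join'
crf = CRF(5, learn_mode='marginal')
```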

johnc1231 commented 6 years ago

I suspect this happens because you are not using the CRF's own loss function. After constructing a CRF layer bound to an identifier `crf`, pass `crf.loss_function` as the loss when compiling the model.
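A minimal sketch of the wiring, assuming a single (non-chained) model and the keras_contrib API (hypothetical dimensions):

```python
from keras.layers import Embedding, LSTM
from keras.models import Sequential
from keras_contrib.layers import CRF

model = Sequential()
model.add(Embedding(100, 16, mask_zero=True))
model.add(LSTM(32, return_sequences=True))
crf = CRF(5)
model.add(crf)

# The CRF layer carries its own loss and metric; pass them to compile()
model.compile(optimizer='adam', loss=crf.loss_function, metrics=[crf.accuracy])
```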

mycal-tucker commented 6 years ago

Adding on to this (because I had a similar issue, even without RNNs)...

I found that creating a single model directly from the layers, as opposed to chaining models, resolved this problem for me as well.

I'm still not sure why this works in some cases and not in others, but at least we now have a way of getting things working.
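Roughly, the contrast looks like this (hypothetical layer names; the nested version is the one that fails for me):

```python
from keras.layers import Input, Dense
from keras.models import Model
from keras_contrib.layers import CRF

# Pattern that failed: an inner Model chained inside an outer Model
inner_input = Input(shape=(10, 16))
inner = Model(inner_input, Dense(32)(inner_input))

crf_a = CRF(5)
outer_input = Input(shape=(10, 16))
chained = Model(outer_input, crf_a(inner(outer_input)))
chained.compile(optimizer='adam', loss=crf_a.loss_function)

# Pattern that worked: one Model built directly from the layers
crf_b = CRF(5)
single_input = Input(shape=(10, 16))
single = Model(single_input, crf_b(Dense(32)(single_input)))
single.compile(optimizer='adam', loss=crf_b.loss_function)
```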