maxfriedrich opened this issue 6 years ago
Update: training works when I set `learn_mode='marginal'`, so I'm using that for now. I'd still be interested in why this breaks with `learn_mode='join'`, though.
I suspect this happens because you are not using the CRF's specific loss function. After constructing a CRF layer with identifier `crf`, specify the loss function as `crf.loss_function`.
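For reference, a minimal sketch of what that looks like with keras-contrib's CRF (the vocabulary size, tag count, and layer sizes below are made up for illustration):

```python
from keras.models import Sequential
from keras.layers import Embedding
from keras_contrib.layers import CRF

n_words, n_tags = 1000, 10  # hypothetical vocabulary and tag set sizes

model = Sequential()
model.add(Embedding(n_words, 32, input_length=50))
crf = CRF(n_tags, learn_mode='join')
model.add(crf)

# The important part: compile with the CRF's own loss (and accuracy),
# not a generic loss like categorical_crossentropy
model.compile(optimizer='rmsprop', loss=crf.loss_function, metrics=[crf.accuracy])
```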
Adding on to this (because I had a similar issue, even without RNNs)...
I found that creating a single model directly from the layers, as opposed to chaining models, resolved this problem for me as well.
I'm still not sure why this works in some cases and not in others, but at least we now have a way of getting things working.
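In other words, something like this worked for me (a sketch assuming the functional API; the BiLSTM architecture and sizes are just placeholders): pass the tensors straight through the layers and build one `Model` over the full graph, with no inner `Model` in between.

```python
from keras.layers import Input, Embedding, Bidirectional, LSTM
from keras.models import Model
from keras_contrib.layers import CRF

inputs = Input(shape=(50,))                            # hypothetical sequence length
x = Embedding(1000, 32)(inputs)                        # hypothetical sizes
x = Bidirectional(LSTM(32, return_sequences=True))(x)
crf = CRF(10)
outputs = crf(x)

# One Model over the whole layer graph -- no nested Model on the way to the CRF
model = Model(inputs, outputs)
model.compile(optimizer='rmsprop', loss=crf.loss_function, metrics=[crf.accuracy])
```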
I'm chaining two Keras models so I can re-use part of it in a third model that is trained separately later on. When I use the CRF layer as the last layer of my combined model, I get the "An operation has `None` for gradient." error. When inspecting the gradients list in Keras's `optimizers.py`, I see that the gradient becomes `None` at the model boundary.

If I instead build a single model from the same layers, this doesn't happen. If I use a `TimeDistributed` Softmax layer instead of the CRF as the last layer, this also doesn't happen.

Any ideas? Am I doing something wrong when chaining the models, or is this a bug in the CRF layer or its loss function?
Here is a script that reproduces the error and shows that it works fine with a single model as well as with a Softmax layer: https://gist.github.com/maxfriedrich/4d01b23d17ad67b8f7026bec25d51694
My setup: