BeWe11 opened this issue 5 years ago
The difference between using a CRF and not using one is pretty small, and the added computational cost (quadratic in the number of labels) is not always worth it.
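To make the quadratic-cost claim concrete, here is a minimal plain-Python sketch of Viterbi decoding for a linear-chain CRF (an illustration only, not Ludwig or tf.contrib code): the nested loop over previous and current labels touches all K² transition scores at every timestep, which is where the quadratic factor comes from.

```python
# Minimal Viterbi decoder for a linear-chain CRF (illustration only).
# emissions[t][k]: score of label k at step t; transitions[i][j]: score of i -> j.
# The nested loops over previous/current labels make each step O(K^2) in K labels.

def viterbi_decode(emissions, transitions):
    T, K = len(emissions), len(emissions[0])
    score = list(emissions[0])          # best score of a path ending in each label at step 0
    backptr = []
    for t in range(1, T):
        new_score, ptrs = [], []
        for j in range(K):              # current label
            best_i = max(range(K), key=lambda i: score[i] + transitions[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score, backptr = new_score, backptr + [ptrs]
    # follow back-pointers from the best final label
    best = max(range(K), key=lambda k: score[k])
    path = [best]
    for ptrs in reversed(backptr):
        best = ptrs[best]
        path.append(best)
    return path[::-1]
```

With strongly negative self-transition scores the decoder prefers alternating labels even when the emissions alone would not, which is exactly the kind of label-sequence constraint a CRF adds over per-step argmax.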
That said, adding a CRF decoder would be definitely useful as an option and pretty straightforward.
An example of how to do it using the CRF package from tf.contrib is here:
https://github.com/guillaumegenthial/tf_ner/blob/master/models/lstm_crf/main.py#L94
Would you consider contributing it? I can help by pointing you to the right parts of the codebase to modify; it should be pretty easy.
@w4nderlust @BeWe11 I can help in adding the CRF functionality to the decoder layer.
@w4nderlust I am interested in adding this feature. Can you please guide me towards the right files where we need to make changes?
Thanks!
Should I add a CRFTagger class in sequence_decoders?
Yes, that would be great. Consider that we are planning to move to TF2 soon, so at the moment you can use the tf.contrib package, but make sure that you'll be able to port it to the TensorFlow Addons package.
Working on it
I think we do not need to calculate the loss using tf.nn.sampled_softmax_loss. In that case we do not have to find a way to get class_weights and class_biases.
Please let me know if I am wrong.
Also, should I write a different loss function?
You should probably use tf.contrib.crf.crf_log_likelihood. You can refer to this as a reference implementation: https://github.com/guillaumegenthial/tf_ner
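For readers unfamiliar with what crf_log_likelihood computes: it returns the log-probability of the gold tag sequence under a linear-chain CRF, i.e. the gold path score minus the log-partition function computed with the forward algorithm. Here is a minimal single-sequence sketch in plain Python (no batching or masking, unlike the TensorFlow op), just to show the math the op implements:

```python
import math
from itertools import product

# Plain-Python sketch of what crf_log_likelihood computes for one sequence:
# log p(tags | emissions) = score(gold path) - log Z, where log Z is the
# log-partition function obtained with the forward algorithm.

def crf_log_likelihood(emissions, tags, transitions):
    # score of the gold path: emissions along the path plus transitions between tags
    gold = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        gold += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    # forward algorithm: alpha[j] is the log-sum of scores of all paths
    # ending in label j at the current step
    K = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            math.log(sum(math.exp(alpha[i] + transitions[i][j]) for i in range(K)))
            + emissions[t][j]
            for j in range(K)
        ]
    log_z = math.log(sum(math.exp(a) for a in alpha))
    return gold - log_z
```

A useful sanity check is that exponentiating the log-likelihoods of all possible tag sequences sums to 1, since the CRF defines a proper distribution over paths.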
The current example for Named Entity Recognition is given as the following model definition:
This works ok, but if Ludwig had a CRF decoding layer, one could build state-of-the-art Bi-LSTM+CRF models (like https://arxiv.org/pdf/1508.01991.pdf) in a single, simple Ludwig model definition. Any chance CRF decoding is coming to Ludwig any time soon?
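As a sketch of what that could look like: the crf option below is hypothetical, an illustration of the requested feature rather than an existing Ludwig parameter, while the remaining fields follow the usual Ludwig model-definition layout for a bidirectional LSTM encoder with a sequence tagger output.

```yaml
input_features:
  - name: utterance
    type: text
    level: word
    encoder: rnn
    cell_type: lstm
    bidirectional: true

output_features:
  - name: tag
    type: sequence
    decoder: tagger
    crf: true   # hypothetical flag: enable CRF decoding instead of per-step softmax
```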