Open matirojasg opened 3 years ago
Another question: I would like to incorporate Flair's contextualized embeddings into the model, since they have been widely used by the NLP community recently. As I don't use TensorFlow, do you have any recommendations for incorporating these embeddings? Thank you very much!
If you use JSON, you could simply output pred_ners, which is a list of mentions in the [sentence_id, start_index, end_index, ner_type] format. If you output the gold mentions in the same format, you can compare the two using any metric you propose.
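The comparison described here can be sketched roughly as follows. This is only an illustration, not code from the repository: the helper names (`to_mention_set`, `precision_recall_f1`, `dump_mentions`) are hypothetical, and it assumes both gold and predicted mentions are available as lists of [sentence_id, start, end, ner_type] entries. Exact-match precision, recall and F1 stand in for whatever metric you actually want to compute.

```python
import json
from typing import Iterable, Set, Tuple

# One mention: (sentence_id, start_index, end_index, ner_type)
Mention = Tuple[int, int, int, str]


def to_mention_set(mentions: Iterable[Iterable]) -> Set[Mention]:
    """Normalize lists/tuples of mentions into a set of hashable tuples."""
    return {(int(s), int(b), int(e), str(t)) for s, b, e, t in mentions}


def precision_recall_f1(gold: Iterable[Iterable], pred: Iterable[Iterable]):
    """Exact-match scores: a prediction counts only if all four fields match."""
    gold_set = to_mention_set(gold)
    pred_set = to_mention_set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def dump_mentions(path: str, mentions: Iterable[Iterable]) -> None:
    """Write mentions to a JSON file as a sorted list of 4-element lists."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(to_mention_set(mentions)), f)


# Made-up example data, only to show the expected shapes.
gold = [[0, 0, 1, "PER"], [0, 3, 5, "ORG"], [1, 2, 2, "LOC"]]
pred = [[0, 0, 1, "PER"], [0, 3, 4, "ORG"], [1, 2, 2, "LOC"], [1, 0, 0, "PER"]]
precision, recall, f1 = precision_recall_f1(gold, pred)
```

Because nested mentions are kept as independent tuples, overlapping spans in the same sentence need no special handling here; a nested-NER-specific metric would replace only the body of `precision_recall_f1`.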
I am not familiar with Flair, but if you can output its embeddings in the same way as I did for BERT, you can use the system directly, without any modification. Simply change lm_path to your Flair HDF5 file, and set lm_size and lm_layers to the corresponding configuration of Flair.
Thanks. Another question: I did not find any method to avoid overfitting in your training code. How can I know how many epochs to train a model for on my own corpus?
For most corpora I train the model for 40k steps (not epochs), but for larger corpora such as OntoNotes I train it for up to 200k steps.
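Since the question was asked in epochs and the answer is given in steps, a small conversion may help. The sketch below assumes one step processes one batch of `batch_size` training examples; the thread does not say what a step covers in this code base, so treat that as an assumption. The function names and the corpus size in the example are hypothetical.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Steps needed to see every training example once (last batch may be partial)."""
    return math.ceil(num_examples / batch_size)


def steps_to_epochs(total_steps: int, num_examples: int, batch_size: int) -> float:
    """Approximate number of passes over the corpus for a given step budget."""
    return total_steps / steps_per_epoch(num_examples, batch_size)


# Hypothetical corpus: 8,000 training examples, batches of 32.
per_epoch = steps_per_epoch(8000, 32)        # 250 steps per epoch
epochs = steps_to_epochs(40_000, 8000, 32)   # 160.0 epochs for 40k steps
```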
Thank you very much for your work; it is excellent. I am currently calculating new metrics specific to the nested NER task on different models. I would like to include yours in my experiments, but for that I need to generate a file of predictions on the test set, that is, a file showing the entities found by the model in the test file for each sentence and each batch. I would like to know how I could adapt your evaluation code to create this file, as plain text or JSON, perhaps in the same format as the inputs.