Open Dimiftb opened 3 years ago
You should make sure the first and second columns of your data are the tokens and labels, respectively. Based on the sample from https://github.com/yuchenlin/OntoNotes-5.0-NER-BIO/blob/master/onto.test.ner.sample , the generated files put the label in the last column. If you would rather not change your data file, you can instead change the code at https://github.com/yhcc/BARTNER/blob/5d562fde9ff4dfe5cd8df9e2b30a3d0fb7ae5917/data/pipe.py#L249 to
`super().__init__(headers=headers, indexes=[0, -1])`
This means the loader will treat the last column as the label column.
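To make the effect of `indexes=[0, -1]` concrete, here is a minimal plain-Python sketch of column selection on CoNLL-style lines. It is not the loader used in the repo: the function name `read_token_label_pairs` and the three-column sample rows are made up for illustration, and the real OntoNotes files may have a different number of columns.

```python
def read_token_label_pairs(lines, indexes=(0, -1)):
    """Group whitespace-separated CoNLL-style lines into sentences.

    `indexes` picks the token column and the label column from each row.
    (0, -1) means: first column is the token, last column is the label.
    This is a hypothetical helper, not part of BARTNER or fastNLP.
    """
    token_idx, label_idx = indexes
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        # Blank lines and -DOCSTART- markers separate sentences/documents.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[token_idx], columns[label_idx]))
    if current:
        sentences.append(current)
    return sentences


# Made-up sample rows: token, POS tag, NER label (label in the LAST column).
sample = [
    "John NNP B-PERSON",
    "lives VBZ O",
    "",
    "Paris NNP B-GPE",
]

# Reading the last column yields NER labels.
with_last_column = read_token_label_pairs(sample, indexes=(0, -1))

# Reading the second column would silently yield POS tags instead,
# which is the kind of mismatch that leads to label errors downstream.
with_second_column = read_token_label_pairs(sample, indexes=(0, 1))
```

With three or more columns, `indexes=(0, 1)` and `indexes=(0, -1)` select different label columns, which is why the data layout and the loader setting have to agree.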
Hi @yhcc,
Thank you very much for your reply. This easily fixed the issue. I managed to train the model; however, I was wondering how I can display metrics (F1, recall, precision) on the test set.
This is the current output that I have once execution has finished:
Following previous papers, we merge the dev and train sets to form the train set. Therefore, for the conll2003 dataset, the dev metric is the final test metric.
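For readers unsure what this merging amounts to, here is a minimal sketch using plain lists. It assumes that the split reported as "dev" during training is the original test set, which is my reading of the reply above and has not been checked against `train.py`. The function name `merge_for_final_training` and the placeholder sentence strings are hypothetical.

```python
def merge_for_final_training(train, dev, test):
    """Return (train_set, eval_set).

    Assumption: train and dev are concatenated for training, and the
    original test split is what gets evaluated (and reported as "dev").
    """
    merged_train = list(train) + list(dev)
    eval_set = list(test)
    return merged_train, eval_set


# Placeholder data standing in for parsed sentences.
train = ["train-sent-1", "train-sent-2"]
dev = ["dev-sent-1"]
test = ["test-sent-1"]

merged_train, eval_set = merge_for_final_training(train, dev, test)
```

Under that assumption, no separate test run is needed: the metric printed for the evaluation split is already the test metric.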
Hi @yhcc,
Thanks for your reply. How can I go about merging the train and the dev sets? Is there functionality for it already? Also how do I get the metric to display?
Thank you very much for helping me thus far
The merging happens at https://github.com/yhcc/BARTNER/blob/a42c3bb84f2bec09e02b30f26beae9a2b4d0b868/train.py#L220
The metric is displayed only after you have trained for several epochs (15 epochs for conll2003). We set it this way because, in our experiments, the best performance only occurs after this epoch, so to save evaluation time the code only evaluates after this epoch. You can change this behavior by changing https://github.com/yhcc/BARTNER/blob/a42c3bb84f2bec09e02b30f26beae9a2b4d0b868/train.py#L49 to 1
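As a rough illustration of why no metric shows up in early epochs, here is a sketch of epoch-gated evaluation. The names `should_evaluate`, `evaluated_epochs`, and `eval_start_epoch` are hypothetical and not taken from `train.py`. Epochs are assumed to be 1-indexed, and whether the real threshold is inclusive is also an assumption.

```python
def should_evaluate(epoch, eval_start_epoch):
    """Evaluate only once training has reached the start epoch (inclusive)."""
    return epoch >= eval_start_epoch


def evaluated_epochs(n_epochs, eval_start_epoch):
    """List the 1-indexed epochs at which evaluation would run."""
    return [
        epoch
        for epoch in range(1, n_epochs + 1)
        if should_evaluate(epoch, eval_start_epoch)
    ]


# With a start epoch of 15, the first 14 epochs print no metrics at all.
late_start = evaluated_epochs(n_epochs=20, eval_start_epoch=15)

# Setting the start epoch to 1 evaluates after every epoch, at the cost
# of extra evaluation time.
from_first_epoch = evaluated_epochs(n_epochs=20, eval_start_epoch=1)
```

So a run that stops before the start epoch finishes without printing F1, recall, or precision, even though training itself worked.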
Hi,
Thank you very much for your paper and your models. I'm attempting to replicate the experimental results in your paper on conll2003 and en-ontonotes. I'm currently faced with an error for both datasets, which I'm not sure how to go about solving. You can see the output of running `python train.py` below:
```
2021-07-08 14:43:47.895031: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
  File "BARTNER/train.py", line 131, in
```
I'm running on Colab.
As for conll2003, I've simply extracted the original English files and put them in the folder `data/conll2003`, as per your instructions.
As for ontonotes, to generate BIO tags I followed this repo: https://github.com/yuchenlin/OntoNotes-5.0-NER-BIO and put the files in `data/en-ontonotes/english/`, as per the instructions. Currently the folder contains `onto.development.ner`, `onto.train.ner`, and `onto.test.ner`.
Could you please advise what I am doing wrong? Thanks.