Open Hou-jing opened 1 year ago
@Hou-jing Did you reproduce these results by preprocessing the data as described here: https://github.com/microsoft/binder/tree/main/data_preproc#conll-2003 and then running
python run_ner.py conf/conll03.json
after installing the dependencies?
I'd like to also reproduce this. Thank you in advance.
Disregard my previous comment, I've managed to reproduce these results. However, I get the following:
***** predict metrics *****
epoch = 20.0
predict_samples = 358
test_f1 = 0.9327
test_precision = 0.9327
test_recall = 0.9327
Note in particular also the difference in prediction samples. These results are notably better.
Why do you get a better result? Can you share your processed dataset?
I either used sharelink1 or sharelink2 from here: https://github.com/juntaoy/biaffine-ner/issues/16#issue-728746824 I don't remember exactly which one, but it contained 3 JSON files that I used. I followed the instructions from the data preprocessing README of this repository.
I don't have access to my desktop right now, so I'm afraid I can't double-check which one I used yet.
I also used that link and chose sharelink1, then processed the data following this repository's instructions.
I believe I used sharelink2. At least, the tar contents seem to match what I used.
Note, a reviewer asked a similar question about CoNLL03, which led the authors to publish the official score of BINDER: https://openreview.net/forum?id=9EAQVEINuum&noteId=tzr2SOQADN They claim an F1 of 93.33, which is approximately what I achieved as well.
So there may be a problem with the dataset I used.
Is your code environment identical to the repository's?
I believe so, yes.
I encountered the following issue while running the Conll2003 dataset:
RuntimeError: shape '[16, 256]' is invalid for input of size 8192
I found that it seems to be caused by the inability to properly obtain 'token_type_ids' during the operation of the Conll2003 dataset. Do you know how to solve this problem?
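For anyone hitting the same error, the arithmetic behind it can be checked without PyTorch. This is only a sketch: reading 16 as the batch size and 256 as the sequence length is an assumption, and `can_view` is a hypothetical helper, not something from this repository.

```python
from math import prod

def can_view(numel, shape):
    """A tensor can be reshaped only if the element counts match."""
    return prod(shape) == numel

# Values copied from the error message above.
expected_shape = (16, 256)   # assumed to be (batch size, sequence length)
actual_numel = 8192          # "input of size 8192"

print(can_view(actual_numel, expected_shape))  # False: 16 * 256 = 4096

# How many rows of length 256 does the input actually hold?
rows = actual_numel // expected_shape[1]
print(rows)  # 32, i.e. twice the expected 16
```

The input holds exactly twice as many elements as the target shape, so the tensor looks like it was built for 32 rows rather than 16.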
I have solved the problem I encountered. I believe it is caused by parallel execution across multiple GPUs. Restricting run_ner.py to a single GPU avoids the issue, for example by adding os.environ["CUDA_VISIBLE_DEVICES"] = "0"
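A minimal sketch of that workaround, assuming the assignment is placed at the very top of the script, before `torch` or anything else that initializes CUDA is imported (otherwise the setting may have no effect). `visible_gpu_ids` is a hypothetical helper added only to show what the process will see.

```python
import os

# Restrict this process to the first GPU. This has to run before torch
# (or any other CUDA-initializing library) is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

def visible_gpu_ids():
    """Return the GPU ids exposed to this process via CUDA_VISIBLE_DEVICES."""
    raw = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [part.strip() for part in raw.split(",") if part.strip()]

print(visible_gpu_ids())  # ['0']

# import torch  # imports that touch CUDA go below this point
```

The same variable can also be set from the shell when launching the script, which avoids editing the code.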
I see that you use CoNLL03 as the example dataset in the repository, but I do not see CoNLL03 results in your paper, and I do not understand why. I tried the dataset myself and got the following results. So I guess the reason may be that the results on CoNLL03 are not good. Is that it?