microsoft / binder


Why does the EXPERIMENTS section of your paper not give results for CoNLL03? #3

Open Hou-jing opened 1 year ago

Hou-jing commented 1 year ago

I see that you use CoNLL-2003 as an example dataset for your paper in the repository, but I do not see results for CoNLL03 in the article, and I do not understand why. I tried the dataset myself and got the results below. So my guess is that the results on CoNLL03 are not good enough; is that the reason?


Hou-jing commented 1 year ago

(screenshot of the CoNLL03 evaluation results)

tomaarsen commented 1 year ago

@Hou-jing Did you reproduce these results by preprocessing the data as described here: https://github.com/microsoft/binder/tree/main/data_preproc#conll-2003 and then running

python run_ner.py conf/conll03.json

after installing the dependencies?

I'd like to also reproduce this. Thank you in advance.

tomaarsen commented 1 year ago

Disregard my previous comment; I've managed to reproduce this myself. However, I get the following:

***** predict metrics *****
  epoch           =   20.0
  predict_samples =    358
  test_f1         = 0.9327
  test_precision  = 0.9327
  test_recall     = 0.9327

Note in particular the difference in the number of prediction samples. These results are notably better.
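
If it helps narrow down where the difference in predict_samples comes from, a quick check is to count the examples in the processed test split directly. A minimal sketch, assuming the preprocessing wrote JSON-lines files and that the test split is named test.json (both assumptions; adjust the path to your actual output):

```python
import json

# Hypothetical path to the processed CoNLL03 test split; adjust to wherever
# the data_preproc script wrote its output on your machine.
path = "data/conll03/test.json"

with open(path, encoding="utf-8") as f:
    examples = [json.loads(line) for line in f if line.strip()]

print(f"{path}: {len(examples)} examples")
```

If our counts differ here, the discrepancy is in the processed data rather than in the training configuration.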

Hou-jing commented 1 year ago

Why do you get a better result? Can you share your processed dataset?

tomaarsen commented 1 year ago

I used either sharelink1 or sharelink2 from here: https://github.com/juntaoy/biaffine-ner/issues/16#issue-728746824 I don't remember exactly which one, but it contained the three JSON files that I used. I followed the instructions from the data preprocessing README of this repository.

I don't have access to my desktop right now, so I'm afraid I can't yet double-check exactly which one I used.

Hou-jing commented 1 year ago

I also used that link; I chose sharelink1 and then processed the data following the repository's instructions.

tomaarsen commented 1 year ago

I believe I used sharelink2. At least, the tar contents seem to match what I used.
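
One way to settle this is to compare checksums of the extracted JSON files: if your digests match mine, we started from the same raw data. A minimal sketch (the file names are assumptions; substitute the three files from your tar):

```python
import hashlib

# Assumed file names from the extracted tar; replace with the actual three JSON files.
files = ["conll03_train.json", "conll03_dev.json", "conll03_test.json"]

for name in files:
    digest = hashlib.md5()
    with open(name, "rb") as f:
        # Hash in 1 MiB chunks to avoid loading the whole file into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    print(f"{name}: md5={digest.hexdigest()}")
```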

tomaarsen commented 1 year ago

Note: a reviewer asked a similar question about CoNLL03, which led the authors to publish the official BINDER score: https://openreview.net/forum?id=9EAQVEINuum&noteId=tzr2SOQADN They report an F1 of 93.33, which is approximately what I achieved as well.

Hou-jing commented 1 year ago

So there may be a problem with the dataset I used.

Hou-jing commented 1 year ago

Is the environment you ran the code in identical to the one specified in the repository?

tomaarsen commented 1 year ago

I believe so, yes.

NNroc commented 1 year ago

I encountered the following issue while running on the CoNLL2003 dataset:

RuntimeError: shape '[16, 256]' is invalid for input of size 8192

It seems to be caused by 'token_type_ids' not being obtained correctly while processing the CoNLL2003 dataset. Do you know how to solve this problem?

NNroc commented 1 year ago


I have solved the problem I encountered. I believe it is caused by parallel execution on multiple GPUs; restricting run_ner.py to a single GPU avoids the issue, e.g. by adding os.environ["CUDA_VISIBLE_DEVICES"] = "0".
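
For reference, a minimal sketch of that workaround (not the exact patch to run_ner.py). The important detail is that CUDA_VISIBLE_DEVICES has to be set before CUDA is initialized, so setting it before importing torch is the safe option:

```python
import os

# Restrict this process to a single GPU so the training run does not
# parallelize across every visible device. Set this before CUDA is
# initialized, i.e. before importing torch.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

print(torch.cuda.device_count())  # should now report 1
```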