Hi!
The numbers you see show the LEA metric rather than the CoNLL-2012 metric reported in the paper. You need to run `calculate_conll.py` on the produced files to get the CoNLL metrics. Please refer to the Evaluation section of the README.
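For context, the CoNLL-2012 score is the unweighted average of the MUC, B³ and CEAFe F1 scores that the official scorer reports, so the aggregation amounts to something like this (a sketch with made-up numbers, not the script's exact code):

```python
# Hypothetical per-metric F1 scores as reported by the official scorer.
muc_f1, bcub_f1, ceafe_f1 = 85.0, 80.0, 78.0

# The CoNLL-2012 score is the unweighted mean of MUC, B-cubed and CEAF-e F1.
conll_f1 = (muc_f1 + bcub_f1 + ceafe_f1) / 3
print(f"CoNLL-2012 F1: {conll_f1:.2f}")
```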
Thanks. But I encountered another issue when I tried to run `calculate_conll.py`:
```
====== TOTALS =======
Identification of Mentions: Recall: (17754 / 19764) 89.82%  Precision: (17754 / 21181) 83.82%  F1: 86.72%
--------------------------------------------------------------------------
Coreference: Recall: (13264 / 15232) 87.07%  Precision: (13264 / 16394) 80.9%  F1: 83.88%
--------------------------------------------------------------------------
None
Traceback (most recent call last):
  File "calculate_conll.py", line 42, in <module>
    extract_f1(subprocess.run(part_a + [metric] + part_b, **kwargs)))
  File "calculate_conll.py", line 17, in extract_f1
    return float(re.search(r"F1:\s*([0-9.]+)%", prev_line).group(1))
AttributeError: 'NoneType' object has no attribute 'group'
```
Hm. You can try running the evaluation scripts manually. I am using the official scorer from here: https://github.com/conll/reference-coreference-scorers
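For example, something along these lines (paths and file names here are placeholders for wherever you cloned the scorer and wrote the gold/predicted CoNLL files):

```python
import subprocess

# Hypothetical locations: adjust to your checkout and output files.
scorer = "reference-coreference-scorers/scorer.pl"
key, response = "key.conll", "response.conll"  # gold and predicted files

# Official scorer usage: perl scorer.pl <metric> <key> <response>
for metric in ("muc", "bcub", "ceafe"):
    subprocess.run(["perl", scorer, metric, key, response], check=True)
```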
@vdobrovolskii I was able to solve the issue in `calculate_conll.py`: you have to pass `"capture_output": True` to `subprocess.run` as well:

```python
kwargs = {"check": True, "text": True, "capture_output": True}
extract_f1(subprocess.run(part_a + [metric] + part_b, **kwargs))
```
Without it, the captured output is `None` (that is the `None` printed in the log above).
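For anyone who hits the same `AttributeError`, here is a minimal, self-contained sketch of the parsing side (not the repo's exact `extract_f1`; the regex comes from the traceback above):

```python
import re
import subprocess

# "capture_output" was added in Python 3.7; on older versions pass
# stdout=subprocess.PIPE and stderr=subprocess.PIPE instead.
kwargs = {"check": True, "text": True, "capture_output": True}

def extract_f1(proc: subprocess.CompletedProcess) -> float:
    """Find an 'F1: xx.x%' value in the scorer's output (the first one;
    the real script targets a specific line of the output)."""
    # proc.stdout is None unless the output was captured; searching an
    # empty string then finds no match, which is why .group(1) was
    # called on None in the traceback above.
    match = re.search(r"F1:\s*([0-9.]+)%", proc.stdout or "")
    if match is None:
        raise ValueError("no F1 score found in the scorer output")
    return float(match.group(1))
```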
I guess it is related to a difference in Python versions... Thanks for posting the solution!
I wanted to replicate your results on OntoNotes 5.0 with the provided pretrained model, `roberta_(e20_2021.05.02_01.16)_release.pt`, but my results on the dev and test sets are not the same as those reported in the paper. Here are the results:
I used the default config file, `config.toml`. Can you please tell me why it is not producing the same results?