gagaein opened this issue 3 years ago
I also found this issue when running the code. Based on my observation, it is because the annotation quality of Twitter-2015 is not very high: many entities start with an 'I-PER' or 'I-LOC' tag. For example, at line 2257 of the data file, Seuss is labeled 'I-PER', but its preceding token is labeled 'O'. The evaluation script still counts these spans as entities, whereas in the paper only spans that start with a 'B-type' tag are counted as entities. In contrast, the annotation quality of Twitter-2017 is relatively higher, so it does not have this issue.
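To make the difference concrete, here is a minimal sketch (not necessarily the evaluation script used in this repo) that contrasts the two counting conventions with seqeval: the default, conlleval-style mode treats a lone 'I-PER' after 'O' as an entity, while strict IOB2 mode only accepts spans opened by a 'B-' tag.

```python
# Sketch of the two counting conventions; the tag sequence below mimics the
# Twitter-2015 case where 'Seuss' is tagged I-PER right after an 'O' token.
from seqeval.metrics import classification_report
from seqeval.scheme import IOB2

y_true = [["O", "I-PER", "O", "B-LOC", "I-LOC"]]
y_pred = [["O", "I-PER", "O", "B-LOC", "I-LOC"]]

# Lenient (default) counting: the lone I-PER span counts as a PER entity,
# so PER shows up with support 1 here.
print(classification_report(y_true, y_pred))

# Strict IOB2 counting: only spans opened by a B- tag are entities, so the
# dangling I-PER is no longer counted and only LOC appears in the report,
# which moves the totals toward the paper's statistics.
print(classification_report(y_true, y_pred, mode="strict", scheme=IOB2))
```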
Thank you for your reply! I see the annotation problem in Twitter-2015 now. That's really strange :( I will try your evaluation script to get the correct performance scores. Thanks again for taking your valuable time!
First, thank you for your excellent work! When I ran your model on Twitter-2015, I noticed the eval result below:

precision recall f1-score support
Please look at the support column: the number of entities does not match the description of the Twitter-2015 dataset. For instance, the eval result reports 1873 PER entities in the dev set, while the dataset description says the dev set contains 1816 PER entities. I cannot understand why more entities are reported in the eval result than exist in the dataset, and I sincerely ask for your help. Thanks again :)
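If it helps to see where the extra entities could come from, a rough counting sketch like the following (the file path and column layout are my assumptions about a CoNLL-style dev split) compares the two conventions; the gap between the strict and lenient PER counts would explain a 1873 vs. 1816 difference if the extra spans all start with 'I-PER'.

```python
# Rough sketch: count entities per type under a strict convention
# (only B- tags open an entity) and under a lenient, conlleval-style one
# (an I- tag that doesn't continue the same type also opens an entity).
from collections import Counter

def count_entities(path, strict=True):
    counts = Counter()
    prev_type = None                      # entity type of the previous token, or None
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                  # blank line = sentence boundary
                prev_type = None
                continue
            tag = line.split()[-1]        # assume the BIO tag is the last column
            if tag.startswith("B-"):
                counts[tag[2:]] += 1
                prev_type = tag[2:]
            elif tag.startswith("I-"):
                ent = tag[2:]
                if ent != prev_type and not strict:
                    counts[ent] += 1      # lenient: a dangling I- opens a new entity
                prev_type = ent
            else:                         # 'O'
                prev_type = None
    return counts

# Hypothetical usage; "twitter2015/dev.txt" is a placeholder path:
# print(count_entities("twitter2015/dev.txt", strict=True)["PER"])   # paper-style count
# print(count_entities("twitter2015/dev.txt", strict=False)["PER"])  # eval-script-style count
```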