thunlp / ERNIE

Source code and dataset for ACL 2019 paper "ERNIE: Enhanced Language Representation with Informative Entities"
MIT License

Inconsistent number of entity types of the OpenEntity dataset? #79

Closed: XingLuxi closed this issue 3 years ago

XingLuxi commented 3 years ago

The number of entity types in the OpenEntity dataset is inconsistent with the statistics reported in Table 1 of the ACL paper. We find that OpenEntity has 1,998/1,998/1,998 examples (train/dev/test) and 9 entity types (i.e., ['entity', 'event', 'group', 'location', 'object', 'organization', 'person', 'place', 'time']). However, Table 1 of the ACL 2019 paper (ERNIE: Enhanced Language Representation with Informative Entities) reports OpenEntity with 6 types and train/dev/test sizes of 2,000/2,000/2,000. What is the difference between the provided downstream datasets (downloaded from the link in README.md, https://cloud.tsinghua.edu.cn/f/6ec98dbd931b4da9a7f0/) and the datasets used in the paper?
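Counts like the ones above can be checked with a short script. The following is only a minimal sketch, not the script used for this report: it assumes each split is a single JSON array of example objects, each carrying a `labels` list of type names. That layout, the helper names `summarize_split` and `load_split`, and the sample records are all assumptions made for illustration.

```python
import json
from pathlib import Path


def summarize_split(records):
    """Return (number of examples, sorted list of distinct labels)."""
    labels = set()
    for record in records:
        # Assumption: each example stores its entity types under "labels".
        labels.update(record.get("labels", []))
    return len(records), sorted(labels)


def load_split(path):
    """Load one split, assuming the file is a single JSON array of examples."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Tiny made-up records, only to show the assumed shape of the data.
sample = [
    {"sent": "She visited Paris.", "labels": ["person"]},
    {"sent": "The meeting is in March.", "labels": ["event", "time"]},
]

count, types = summarize_split(sample)
print(count, types)  # 2 ['event', 'person', 'time']
```

For the real data, `summarize_split(load_split(path))` would be called once per split, with `path` pointing at wherever the downloaded train/dev/test files were unpacked.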

zzy14 commented 3 years ago

Sorry for the inconsistency. We reported the wrong statistics in the paper. The provided downstream datasets are consistent with those used in the paper.