veronica320 / Zeroshot-Event-Extraction

Repository for ACL2021 paper: <Zero-shot Event Extraction via Transfer Learning: Challenges and Insights>.

About the number of samples after data processing #4

Closed by gnodgnodtonmi 1 year ago

gnodgnodtonmi commented 2 years ago

Hello! Thank you for your excellent work.

I am trying to run the data processing script source/preprocessing/process_ace.py to process and split the ACE2005 dataset. Using the file lists in the directory data/splits/ACE05-E, I get 19216, 901 and 676 samples for the train, dev and test splits respectively, which is inconsistent with the sample numbers in your paper (17172, 923 and 832).

I checked many possible details and still don't know what caused this result. Could you please tell me what went wrong? Thanks!

veronica320 commented 2 years ago

Hi, thanks for your interest and for pointing out this issue! You are right, 19216, 901 and 676 are the correct numbers. We are actually using ACE05-E+ (a new version of ACE05 created by Lin et al. (2020) by "adding back the order of relation arguments, pronouns, and multi-token event triggers") in our final experiments. 17172, 923 and 832 are from an older version, ACE05-E, and we haven't updated them in the paper yet. You can find the data stats for both versions in Lin et al. (2020). We'll see if it's possible to update the paper on the ACL Anthology.
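
For anyone who wants to sanity-check their own preprocessing output, the split sizes can be compared against the two sets of numbers quoted in this thread. This is only a minimal sketch: the `identify_version` helper and the `KNOWN_SPLIT_SIZES` table are hypothetical names for illustration and are not part of the repository.

```python
# Split sizes (train, dev, test) as quoted in this thread.
KNOWN_SPLIT_SIZES = {
    "ACE05-E": (17172, 923, 832),
    "ACE05-E+": (19216, 901, 676),
}


def identify_version(train, dev, test):
    """Return the dataset version whose split sizes match, or None.

    Hypothetical helper for illustration; not part of the repository.
    """
    for name, sizes in KNOWN_SPLIT_SIZES.items():
        if sizes == (train, dev, test):
            return name
    return None


# The counts reported in the original question.
print(identify_version(19216, 901, 676))
```

If the counts match neither version, the helper returns `None`, which suggests the preprocessing or the split file lists differ from both.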