yubochen / NBTNGMA4ED

Can you tell me how to split the dataset? #3

Open unikcc opened 5 years ago

unikcc commented 5 years ago

This is a closed issue

DorianKodelja commented 5 years ago

Hi @nkhouse, I've uploaded the reference split. It does matter, since the test set contains only nw documents while the other sets contain documents from every document type. train.txt valid.txt test.txt

xixy commented 3 years ago

> Hi @nkhouse, I've uploaded the reference split. It does matter, since the test set contains only nw documents while the other sets contain documents from every document type. train.txt valid.txt test.txt

Just out of curiosity, does everyone use this split for ACE 2005 in their experiments? I found that Liu Zhiyuan's group uses a different split (https://github.com/thunlp/HMEAE/issues/5). When I use their split for ED, the performance is higher than with your split. (I'm not saying your split is incorrect; the differing splits just confuse me.) Researchers generally claim in their papers to use the same split as Grishman and Li (2013), yet the actual data splits differ to some extent. For example, the dev set may have been randomly selected, which would make comparisons unfair. Thanks. @DorianKodelja
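
One way to make the difference between two splits concrete is to compare their document lists directly. Below is a minimal sketch, assuming each split file (such as train.txt, valid.txt, test.txt) lists one document ID per line; the real file format may differ, and the helper names `read_ids`, `compare_splits`, and `leaked_into_test` are hypothetical, not part of this repository.

```python
from pathlib import Path


def read_ids(path):
    """Read document IDs from a file, one per line (assumed format), skipping blanks."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def compare_splits(split_a, split_b):
    """For each partition present in both splits, count shared and unique document IDs.

    Each split is a dict mapping a partition name ("train", "valid", "test")
    to a set of document IDs.
    """
    report = {}
    for part in sorted(split_a.keys() & split_b.keys()):
        a, b = split_a[part], split_b[part]
        report[part] = {
            "shared": len(a & b),
            "only_a": len(a - b),
            "only_b": len(b - a),
        }
    return report


def leaked_into_test(split_a, split_b):
    """Return IDs that are test documents in one split but training documents in the other."""
    return (split_a["test"] & split_b["train"]) | (split_b["test"] & split_a["train"])


if __name__ == "__main__":
    # Toy document IDs for illustration only; these are not real ACE 2005 IDs.
    ours = {"train": {"d1", "d2", "d3"}, "valid": {"d4"}, "test": {"d5"}}
    theirs = {"train": {"d1", "d2", "d5"}, "valid": {"d3"}, "test": {"d4"}}
    print(compare_splits(ours, theirs))
    print(leaked_into_test(ours, theirs))
```

With real files, each split could be loaded as `{"train": read_ids("train.txt"), "valid": read_ids("valid.txt"), "test": read_ids("test.txt")}`. A non-empty result from `leaked_into_test` would mean that scores obtained under the two splits are not directly comparable.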