arademaker closed this issue 2 years ago
Those should be the splits used in the CoNLL-2012 shared task, and yes, they are split by entire files (as is necessary for coreference shared tasks). I'll yield to @sameer-pradhan regarding how the splits were made. My impression is that the splits were randomly allocated, and that "conll12-test" was then created by filtering out files from "test" that didn't have coreference annotations, but that was before my time.
Starting from the ontonotes-{dev,test,train}-list.txt files, which contain one file name per line, I expanded each list to obtain the sentences for each set. However, I found many missing files, all of them inside the wb/sel collection. Below is the number of missing files for each directory inside wb/sel. Does anyone have an explanation for this?
100 data/ontonotes/wb/sel/87
100 data/ontonotes/wb/sel/80
99 data/ontonotes/wb/sel/89
99 data/ontonotes/wb/sel/81
99 data/ontonotes/wb/sel/45
99 data/ontonotes/wb/sel/41
98 data/ontonotes/wb/sel/66
98 data/ontonotes/wb/sel/65
97 data/ontonotes/wb/sel/74
97 data/ontonotes/wb/sel/44
97 data/ontonotes/wb/sel/40
96 data/ontonotes/wb/sel/88
96 data/ontonotes/wb/sel/79
96 data/ontonotes/wb/sel/63
96 data/ontonotes/wb/sel/52
96 data/ontonotes/wb/sel/51
95 data/ontonotes/wb/sel/71
95 data/ontonotes/wb/sel/69
94 data/ontonotes/wb/sel/73
94 data/ontonotes/wb/sel/70
94 data/ontonotes/wb/sel/62
94 data/ontonotes/wb/sel/39
93 data/ontonotes/wb/sel/77
93 data/ontonotes/wb/sel/61
93 data/ontonotes/wb/sel/58
93 data/ontonotes/wb/sel/56
93 data/ontonotes/wb/sel/49
92 data/ontonotes/wb/sel/78
92 data/ontonotes/wb/sel/75
90 data/ontonotes/wb/sel/72
90 data/ontonotes/wb/sel/57
89 data/ontonotes/wb/sel/64
88 data/ontonotes/wb/sel/82
88 data/ontonotes/wb/sel/37
85 data/ontonotes/wb/sel/43
84 data/ontonotes/wb/sel/94
84 data/ontonotes/wb/sel/83
83 data/ontonotes/wb/sel/67
83 data/ontonotes/wb/sel/53
82 data/ontonotes/wb/sel/76
82 data/ontonotes/wb/sel/60
81 data/ontonotes/wb/sel/38
79 data/ontonotes/wb/sel/97
79 data/ontonotes/wb/sel/68
79 data/ontonotes/wb/sel/42
78 data/ontonotes/wb/sel/86
78 data/ontonotes/wb/sel/48
77 data/ontonotes/wb/sel/84
76 data/ontonotes/wb/sel/46
71 data/ontonotes/wb/sel/36
69 data/ontonotes/wb/sel/95
66 data/ontonotes/wb/sel/50
62 data/ontonotes/wb/sel/26
59 data/ontonotes/wb/sel/96
58 data/ontonotes/wb/sel/59
58 data/ontonotes/wb/sel/35
58 data/ontonotes/wb/sel/33
57 data/ontonotes/wb/sel/92
56 data/ontonotes/wb/sel/93
56 data/ontonotes/wb/sel/90
55 data/ontonotes/wb/sel/23
54 data/ontonotes/wb/sel/47
52 data/ontonotes/wb/sel/34
51 data/ontonotes/wb/sel/31
51 data/ontonotes/wb/sel/27
48 data/ontonotes/wb/sel/32
46 data/ontonotes/wb/sel/91
44 data/ontonotes/wb/sel/28
43 data/ontonotes/wb/sel/25
42 data/ontonotes/wb/sel/54
40 data/ontonotes/wb/sel/85
40 data/ontonotes/wb/sel/30
39 data/ontonotes/wb/sel/22
37 data/ontonotes/wb/sel/24
34 data/ontonotes/wb/sel/29
33 data/ontonotes/wb/sel/98
31 data/ontonotes/wb/sel/55
28 data/ontonotes/wb/sel/09
20 data/ontonotes/wb/sel/11
2 data/ontonotes/wb/sel/05
1 data/ontonotes/wb/sel/18
1 data/ontonotes/wb/sel/10
1 data/ontonotes/wb/sel/04
1 data/ontonotes/wb/sel/03
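For anyone who wants to reproduce a per-directory count like the one above, here is a minimal Python sketch. It is not the script that produced the listing. It assumes that each line of a list file is a path relative to some data root and that a file is "missing" when that path does not exist on disk. The function names `missing_by_directory` and `format_report` are hypothetical, and the real list files may need an extension or prefix added before the check.

```python
from collections import Counter
from pathlib import Path


def missing_by_directory(list_file, root):
    """Count listed files that do not exist under `root`, grouped by parent directory.

    Assumption: each non-empty line of `list_file` is a path relative to `root`.
    """
    counts = Counter()
    for line in Path(list_file).read_text().splitlines():
        name = line.strip()
        if not name:
            continue  # skip blank lines
        if not (Path(root) / name).exists():
            # Group by the directory that should have contained the file.
            counts[Path(name).parent.as_posix()] += 1
    return counts


def format_report(counts):
    """Return 'count directory' lines, largest count first."""
    return [f"{n} {directory}" for directory, n in counts.most_common()]
```

Calling `format_report(missing_by_directory(list_path, data_root))` for each of the three list files, with `list_path` and `data_root` pointing at a local checkout, yields lines in the same "count directory" shape as the listing.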
answered in issue #2
Do we have any information about the ontonotes-{dev,test,train}-list.txt files in https://github.com/propbank/propbank-release/tree/master/docs/evaluation ? What criteria were used to make this split? Each line names a file, so my understanding is that all sentences in a file should be assigned to the corresponding set, right?