I found that when loading the corpus, JMEE uses the keep_events=1 option to filter out sentences that contain no events, which dramatically decreases the size of the training set.
Is this step necessary? Why not keep all the sentences in the training set?
# sentences in the train set must contain at least 1 event
#
train_set = ACE2005Dataset(self.a.train,
                           fields={"words": ("WORDS", WordsField),
                                   "pos-tags": ("POSTAGS", PosTagsField),
                                   "golden-entity-mentions": ("ENTITYLABELS", EntityLabelsField),
                                   "stanford-colcc": ("ADJM", AdjMatrixField),
                                   "golden-event-mentions": ("LABEL", LabelField),
                                   "all-events": ("EVENT", EventsField),
                                   "all-entities": ("ENTITIES", EntitiesField)},
                           keep_events=1)
# sentences in the dev set can have no events
#
dev_set = ACE2005Dataset(self.a.dev,
                         fields={"words": ("WORDS", WordsField),
                                 "pos-tags": ("POSTAGS", PosTagsField),
                                 "golden-entity-mentions": ("ENTITYLABELS", EntityLabelsField),
                                 "stanford-colcc": ("ADJM", AdjMatrixField),
                                 "golden-event-mentions": ("LABEL", LabelField),
                                 "all-events": ("EVENT", EventsField),
                                 "all-entities": ("ENTITIES", EntitiesField)},
                         keep_events=0)
# sentences in the test set can have no events
#
test_set = ACE2005Dataset(self.a.test,
                          fields={"words": ("WORDS", WordsField),
                                  "pos-tags": ("POSTAGS", PosTagsField),
                                  "golden-entity-mentions": ("ENTITYLABELS", EntityLabelsField),
                                  "stanford-colcc": ("ADJM", AdjMatrixField),
                                  "golden-event-mentions": ("LABEL", LabelField),
                                  "all-events": ("EVENT", EventsField),
                                  "all-entities": ("ENTITIES", EntitiesField)},
                          keep_events=0)
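To make the question concrete, here is a minimal sketch of what I assume keep_events does: it treats the value as a minimum number of event mentions per sentence and drops sentences below it. I have not checked this against the actual ACE2005Dataset implementation; the helper name filter_by_event_count and the toy sentences are hypothetical, and only the "golden-event-mentions" key is taken from the snippet above.

```python
from typing import Any, Dict, List

Sentence = Dict[str, Any]


def filter_by_event_count(sentences: List[Sentence], keep_events: int = 0) -> List[Sentence]:
    """Hypothetical stand-in for the keep_events option.

    Assumption: a sentence is kept only if it has at least
    `keep_events` entries under "golden-event-mentions".
    """
    return [
        s for s in sentences
        if len(s.get("golden-event-mentions", [])) >= keep_events
    ]


# Toy data (made up for illustration): one sentence with an event, one without.
sentences = [
    {"words": ["He", "was", "arrested"],
     "golden-event-mentions": [{"trigger": "arrested"}]},
    {"words": ["It", "rained", "today"],
     "golden-event-mentions": []},
]

train = filter_by_event_count(sentences, keep_events=1)  # event-free sentences dropped
dev = filter_by_event_count(sentences, keep_events=0)    # every sentence kept
```

If that reading is right, training with keep_events=0 would simply add the event-free sentences back as negative examples, which is what the dev and test sets already contain.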