lx865712528 / EMNLP2018-JMEE

This is the code for our EMNLP 2018 paper "Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation"

Train on sentences without containing event #8

Closed mikelkl closed 5 years ago

mikelkl commented 5 years ago

Hi there,

I found that when loading the corpus, JMEE uses the keep_events=1 option to filter out sentences that contain no events, which dramatically decreases the size of the training set.

Is this step necessary? Why not keep all the sentences of the training set?

# sentences in the train set must contain at least 1 event
#
train_set = ACE2005Dataset(self.a.train,
                           fields={"words": ("WORDS", WordsField),
                                   "pos-tags": ("POSTAGS", PosTagsField),
                                   "golden-entity-mentions": ("ENTITYLABELS", EntityLabelsField),
                                   "stanford-colcc": ("ADJM", AdjMatrixField),
                                   "golden-event-mentions": ("LABEL", LabelField),
                                   "all-events": ("EVENT", EventsField),
                                   "all-entities": ("ENTITIES", EntitiesField)},
                           keep_events=1)

# sentence in dev set can have no event
#
dev_set = ACE2005Dataset(self.a.dev,
                         fields={"words": ("WORDS", WordsField),
                                 "pos-tags": ("POSTAGS", PosTagsField),
                                 "golden-entity-mentions": ("ENTITYLABELS", EntityLabelsField),
                                 "stanford-colcc": ("ADJM", AdjMatrixField),
                                 "golden-event-mentions": ("LABEL", LabelField),
                                 "all-events": ("EVENT", EventsField),
                                 "all-entities": ("ENTITIES", EntitiesField)},
                         keep_events=0)

# sentence in test set can have no event
#
test_set = ACE2005Dataset(self.a.test,
                          fields={"words": ("WORDS", WordsField),
                                  "pos-tags": ("POSTAGS", PosTagsField),
                                  "golden-entity-mentions": ("ENTITYLABELS", EntityLabelsField),
                                  "stanford-colcc": ("ADJM", AdjMatrixField),
                                  "golden-event-mentions": ("LABEL", LabelField),
                                  "all-events": ("EVENT", EventsField),
                                  "all-entities": ("ENTITIES", EntitiesField)},
                          keep_events=0)
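
To make the effect of the option concrete, here is a minimal sketch of the kind of filtering described above. It assumes that keep_events=N means "keep only sentences with at least N event mentions"; this reading is an assumption based on the comments in the snippet, not taken from the JMEE loader itself. The function name `filter_by_event_count` and the toy records are hypothetical; only the `golden-event-mentions` key comes from the field mapping above.

```python
from typing import Any, Dict, List

Sentence = Dict[str, Any]


def filter_by_event_count(sentences: List[Sentence], keep_events: int = 0) -> List[Sentence]:
    """Keep sentences that have at least `keep_events` event mentions.

    Hypothetical helper: assumes keep_events is a minimum-count threshold.
    """
    return [
        s for s in sentences
        if len(s.get("golden-event-mentions", [])) >= keep_events
    ]


# Toy corpus (invented for illustration): one sentence with an event, two without.
sentences = [
    {"words": ["He", "was", "arrested"],
     "golden-event-mentions": [{"trigger": "arrested"}]},
    {"words": ["It", "rained"], "golden-event-mentions": []},
    {"words": ["Nothing", "happened"], "golden-event-mentions": []},
]

train = filter_by_event_count(sentences, keep_events=1)  # drops event-free sentences
dev = filter_by_event_count(sentences, keep_events=0)    # keeps everything

print(len(train), len(dev))
```

Under this assumption, keep_events=1 removes every event-free sentence from the training split, while keep_events=0 leaves the dev and test splits untouched, which is the size difference the question is about.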