simongray / StatementAnnotator

Custom annotator for Stanford CoreNLP that annotates sentences with the underlying statements contained within them.

Ignore components found within a certain scope #37

Closed simongray closed 8 years ago

simongray commented 8 years ago

Currently, this example does not produce the desired output:

String example = "We also got two Xiaomi air purifiers that work quite well";

... which should be:

[
    {Subject: "we"},
    {Verb: "also got"},
    {DirectObject: "two Xiaomi air purifiers that work quite well"}
]

This happens because components are being found with roots inside the body of an acl relation. An acl relation modifies a noun, and any paths leading out of it should not be treated as separate components. In this case, "that work quite well" is the acl body: "that" is being picked up as a subject and "work" as a verb, when both should really be ignored.

It would make sense to generalise this inside an AbstractFinder class and keep track of the relevant relations in the Relations class, so that all finder classes (SubjectFinder, etc.) share a common method that returns a list of ignored relations (similar to the ones in the component classes). These ignored relations are then taken into account when finding components, so that anything inside a scope created by one of them is ignored completely.
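A rough sketch of what that shared method could look like. Everything here is hypothetical: the method names, the constant in Relations, and the use of plain strings for relation names are assumptions for illustration, not the existing StatementAnnotator code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical: a central place for the relation names the finders care about.
final class Relations {
    static final String ACL = "acl";

    private Relations() {}
}

// Hypothetical base class shared by all finders.
abstract class AbstractFinder {
    // Relations whose dependent side opens a scope that finders should skip.
    private static final Set<String> IGNORED_RELATIONS =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList(Relations.ACL)));

    /** Returns the relations that create an ignored scope. */
    public Set<String> getIgnoredRelations() {
        return IGNORED_RELATIONS;
    }

    /** True if an edge with this relation name starts an ignored scope. */
    public boolean opensIgnoredScope(String relation) {
        return getIgnoredRelations().contains(relation);
    }
}

// Concrete finders inherit the same ignored set and could override it if needed.
class SubjectFinder extends AbstractFinder {
}
```

A finder that needs a different set of scopes could override getIgnoredRelations(), which keeps the default in one place.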

simongray commented 8 years ago

Perhaps the simplest solution is to just call findCompoundComponents() on the dep() end of each acl relation right at the beginning of the finding process and build a set of ignored IndexedWords from the result. Whenever the find() method runs into one of the ignored IndexedWords, it simply does not consider it a component.
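To make the idea concrete, here is a minimal standalone sketch. It does not use CoreNLP: words are plain token indices, edges are a small hypothetical Edge class, and a simple graph traversal stands in for findCompoundComponents(). The dependency edges for the example sentence are written out by hand as an assumption about what the parser produces.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IgnoredScopeSketch {
    // Hypothetical stand-in for a dependency edge: governor index, dependent index, relation name.
    static final class Edge {
        final int gov;
        final int dep;
        final String rel;

        Edge(int gov, int dep, String rel) {
            this.gov = gov;
            this.dep = dep;
            this.rel = rel;
        }
    }

    // Collects every word at or below the dependent end of an ignored relation.
    static Set<Integer> ignoredWords(List<Edge> edges, Set<String> ignoredRelations) {
        Map<Integer, List<Integer>> children = new HashMap<>();
        for (Edge e : edges) {
            children.computeIfAbsent(e.gov, k -> new ArrayList<>()).add(e.dep);
        }
        Deque<Integer> stack = new ArrayDeque<>();
        for (Edge e : edges) {
            if (ignoredRelations.contains(e.rel)) {
                stack.push(e.dep);
            }
        }
        Set<Integer> ignored = new HashSet<>();
        while (!stack.isEmpty()) {
            int word = stack.pop();
            if (ignored.add(word)) { // the visited check also guards against cycles
                for (int child : children.getOrDefault(word, Collections.<Integer>emptyList())) {
                    stack.push(child);
                }
            }
        }
        return ignored;
    }

    // A find() in miniature: nsubj dependents, skipping anything in the ignored set.
    static List<Integer> findSubjects(List<Edge> edges, Set<Integer> ignored) {
        List<Integer> subjects = new ArrayList<>();
        for (Edge e : edges) {
            if (e.rel.equals("nsubj") && !ignored.contains(e.dep)) {
                subjects.add(e.dep);
            }
        }
        return subjects;
    }

    // Hand-written parse (an assumption) with 1-based token indices:
    // We=1 also=2 got=3 two=4 Xiaomi=5 air=6 purifiers=7 that=8 work=9 quite=10 well=11
    static List<Edge> exampleEdges() {
        return Arrays.asList(
                new Edge(3, 1, "nsubj"), new Edge(3, 2, "advmod"), new Edge(3, 7, "dobj"),
                new Edge(7, 4, "nummod"), new Edge(7, 5, "compound"), new Edge(7, 6, "compound"),
                new Edge(7, 9, "acl"), new Edge(9, 8, "nsubj"),
                new Edge(9, 11, "advmod"), new Edge(11, 10, "advmod"));
    }
}
```

With these edges, the ignored set covers "that", "work", "quite" and "well", so only "We" is left as a subject candidate. The same set can be computed once and shared by every finder.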