simongray / StatementAnnotator

Custom annotator for Stanford CoreNLP that annotates sentences with the underlying statements contained within them.

Connecting components is buggy #49

simongray closed this issue 8 years ago

simongray commented 8 years ago

The new method for connecting components seems to have some issues; I will review them in this issue.

```java
String example = "I sing songs and write words in English.";
```

```
[main] INFO statements.core.StatementFinder - componentSets: [[{Verb: "sing"}, {Verb: "write"}, {DirectObject: "words"}, {IndirectObject: "in English"}, {Subject: "I"}, {DirectObject: "songs"}]]
[main] INFO statements.core.StatementFinder - {Verb: "sing"} is the parent of {Subject: "I"}
[main] INFO statements.core.StatementFinder - {Verb: "sing"} is the parent of {DirectObject: "songs"}
[main] INFO statements.core.StatementFinder - added connected component set: [{Verb: "sing"}, {Subject: "I"}, {DirectObject: "songs"}]
[main] INFO statements.core.StatementFinder - {Verb: "write"} is the parent of {DirectObject: "words"}
[main] INFO statements.core.StatementFinder - {Verb: "write"} is the parent of {Subject: "I"}
[main] INFO statements.core.StatementFinder - added connected component set: [{Verb: "write"}, {DirectObject: "words"}, {Subject: "I"}]
[main] INFO statements.core.StatementFinder - {DirectObject: "words"} is the parent of {IndirectObject: "in English"}
[main] INFO statements.core.StatementFinder - added connected component set: [{DirectObject: "words"}, {IndirectObject: "in English"}]
[main] INFO statements.core.StatementFinder - added connected component set: [{IndirectObject: "in English"}]
[main] INFO statements.core.StatementFinder - added connected component set: [{Subject: "I"}]
[main] INFO statements.core.StatementFinder - added connected component set: [{DirectObject: "songs"}]
[main] INFO statements.core.StatementFinder - connectedComponentSets: [[{DirectObject: "words"}, {IndirectObject: "in English"}], [{IndirectObject: "in English"}], [{Verb: "write"}, {DirectObject: "words"}, {Subject: "I"}], [{Verb: "sing"}, {Subject: "I"}, {DirectObject: "songs"}], [{Subject: "I"}], [{DirectObject: "songs"}]]
```

What I want to happen is this: keep connecting component sets for as long as either of these holds:

  1. one set is entirely contained in another set
  2. one of its components is connected to one of the components in the other set
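The two rules can be read as a fixpoint loop: keep merging any pair of sets that satisfies either rule until no pair does. The Java sketch below is only an illustration under stated assumptions. Components are modelled as plain strings such as `"Verb:sing"`, the parent/child links printed in the log are assumed to be available as a map, and `ComponentMerger`, `merge`, `connected` and `linksInto` are hypothetical names, not part of StatementAnnotator. It also assumes one particular reading of rule 2: a parent/child link only counts if it reaches a component the first set does not already contain.

```java
import java.util.*;

// Hypothetical sketch: components are plain strings like "Verb:sing",
// not the real StatementFinder component types.
final class ComponentMerger {

    // Hypothetical stand-in for the "is the parent of" relation in the log.
    static boolean isParent(Map<String, Set<String>> children, String parent, String child) {
        return children.getOrDefault(parent, Set.of()).contains(child);
    }

    // True if some component of `from` is the parent of a component of `to`
    // that `from` does not already contain. Skipping shared components is an
    // assumption: without it, the shared {Subject: "I"} would link both verbs.
    static boolean linksInto(Set<String> from, Set<String> to, Map<String, Set<String>> children) {
        for (String parent : from) {
            for (String child : to) {
                if (!from.contains(child) && isParent(children, parent, child)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Rule 2: the two sets are connected in either direction.
    static boolean connected(Set<String> a, Set<String> b, Map<String, Set<String>> children) {
        return linksInto(a, b, children) || linksInto(b, a, children);
    }

    // Merges one qualifying pair at a time until neither rule applies to any pair.
    static List<Set<String>> merge(List<Set<String>> input, Map<String, Set<String>> children) {
        List<Set<String>> sets = new ArrayList<>();
        for (Set<String> s : input) {
            sets.add(new LinkedHashSet<>(s));
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            search:
            for (int i = 0; i < sets.size(); i++) {
                for (int j = 0; j < sets.size(); j++) {
                    if (i == j) continue;
                    Set<String> a = sets.get(i);
                    Set<String> b = sets.get(j);
                    // Rule 1 (a is contained in b, so addAll is a no-op) or rule 2.
                    if (b.containsAll(a) || connected(a, b, children)) {
                        b.addAll(a);
                        sets.remove(i);
                        changed = true;
                        break search;
                    }
                }
            }
        }
        return sets;
    }

    // The parent/child links and connectedComponentSets from the log.
    static List<Set<String>> demo() {
        Map<String, Set<String>> children = Map.of(
            "Verb:sing", Set.of("Subject:I", "DirectObject:songs"),
            "Verb:write", Set.of("DirectObject:words", "Subject:I"),
            "DirectObject:words", Set.of("IndirectObject:in English"));
        List<Set<String>> logged = List.of(
            Set.of("DirectObject:words", "IndirectObject:in English"),
            Set.of("IndirectObject:in English"),
            Set.of("Verb:write", "DirectObject:words", "Subject:I"),
            Set.of("Verb:sing", "Subject:I", "DirectObject:songs"),
            Set.of("Subject:I"),
            Set.of("DirectObject:songs"));
        return merge(logged, children);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Run on the sets and links from the log above, this reading leaves two sets, one per verb: {sing, I, songs} and {write, words, I, in English}. Each merge removes one set from the list, so the loop always terminates.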

This should hopefully subsume the broken merging operation I'm currently using, and possibly also the splitting based on duplicates further down.