simongray / StatementAnnotator

Custom annotator for Stanford CoreNLP that annotates sentences with the underlying statements contained within them.

How to deal with informal self-referential language with missing subject? #42

Open simongray opened 8 years ago

simongray commented 8 years ago

Here's an example:

String example = "Recently moved here with my girlfriend and we have found that it is quite manageable.";

with output:

Recently moved here with my girlfriend and we have found that it is quite manageable.
    |_ statement: {Statement: "we have found that it is quite manageable", components: 3}
        |_ component: {Subject: "it"}
        |_ component: {Verb: "have found"}
        |_ component: {Subject: "we"}

It is missing the first statement and has mangled the other one ("it" shows up as a second top-level subject and "quite manageable" is lost), but this can be remedied by inserting the implied "I" as the first word of the sentence before calculating the dependencies. Then you get this (correct) output:

I recently moved here with my girlfriend and we have found that it is quite manageable.
  |_ statement: {Statement: "we have found that it is quite manageable", components: 3}
  |  |_ component: {Subject: "we"}
  |  |_ component: {Verb: "have found"}
  |  |_ component: [{DirectObject: "quite manageable"}, {Subject: "it"}]
  |_ statement: {Statement: "I recently moved here with my girlfriend", components: 3}
     |_ component: {Subject: "I"}
     |_ component: {Verb: "recently moved here"}
     |_ component: {IndirectObject: "with my girlfriend"}
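
The insertion step itself is small. Here is a minimal sketch, assuming the input is a plain-text sentence and the missing subject is always first-person singular. `ImpliedSubject` and `prependI` are hypothetical names for illustration, not part of StatementAnnotator:

```java
import java.util.Locale;

// Hypothetical helper, not part of StatementAnnotator.
final class ImpliedSubject {
    private ImpliedSubject() {}

    /**
     * Prepends the implied "I" to a sentence and lowercases the original
     * first letter, so "Recently moved ..." becomes "I recently moved ...".
     * Assumption: the sentence starts with an ordinary word (adverb or verb),
     * not a proper noun, which would be lowercased incorrectly.
     */
    static String prependI(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return trimmed; // nothing to fix
        }
        String first = trimmed.substring(0, 1).toLowerCase(Locale.ROOT);
        return "I " + first + trimmed.substring(1);
    }
}
```

Calling `ImpliedSubject.prependI(example)` on the string above yields the "I recently moved here ..." sentence, which can then be passed through the pipeline as usual.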

One solution could be to detect when a sentence has been parsed wrongly like this, insert a fake "I", and compute the statements again, but how can this be detected?
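
One possible heuristic, sketched here purely as an assumption and not as something the annotator does: look at the Penn Treebank part-of-speech tags of the sentence, skip any leading adverbs, and treat the sentence as subjectless if the next token is already a past-tense, participle, or present-tense verb. `MissingSubjectHeuristic` and `looksSubjectless` are hypothetical names, and the tags are assumed to come from whatever tagger runs earlier in the pipeline:

```java
import java.util.List;
import java.util.Set;

// Hypothetical heuristic, not part of StatementAnnotator.
final class MissingSubjectHeuristic {
    // Penn Treebank adverb tags that may precede the verb ("Recently moved ...").
    private static final Set<String> ADVERB_TAGS = Set.of("RB", "RBR", "RBS");
    // Verb tags that suggest a dropped subject. Base-form "VB" is left out
    // on purpose so that imperatives ("Move here") are not flagged.
    private static final Set<String> VERB_TAGS = Set.of("VBD", "VBN", "VBP");

    private MissingSubjectHeuristic() {}

    /** Returns true if the tag sequence starts with (adverbs +) a verb. */
    static boolean looksSubjectless(List<String> posTags) {
        int i = 0;
        while (i < posTags.size() && ADVERB_TAGS.contains(posTags.get(i))) {
            i++; // skip leading adverbs
        }
        return i < posTags.size() && VERB_TAGS.contains(posTags.get(i));
    }
}
```

This is deliberately crude: questions that open with an auxiliary ("Did you move here?") would be flagged too, and it says nothing about which pronoun is missing. A stricter variant could instead check whether the root verb of the dependency parse has no subject relation.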