Open simongray opened 8 years ago
Here's an example:
// TODO: double subjects, confusion caused by parentheses
String example = "I sure was (I come from Copenhagen, Denmark).";
creating this output:
I sure was (I come from Copenhagen, Denmark).
|_ statement: {Statement: "I sure was -LRB- I come from Copenhagen, Denmark -RRB-", components: 4}
   |_ component: {Subject: "I"}
   |_ component: {IndirectObject: "from Copenhagen, Denmark"}
   |_ component: {Verb: "sure"}
   |_ component: {Subject: "I"}
The dependency graph is not very helpful in this case, unfortunately:
[sure/RB
  nsubj>I/PRP
  dep>[was/VBD
    dep>[come/VBP
      punct>-LRB-/-LRB-
      nsubj>I/PRP
      nmod:from>[Copenhagen/NNP case>from/IN punct>,/, appos>Denmark/NNP]
      punct>-RRB-/-RRB-]]
  punct>./.]
It seems like I could create a custom annotator after the tokenisation stage which
Perhaps the above is too complex for the time I have left, and simply preprocessing the input to remove parentheses and emoticons is the better solution.
I honestly have no idea; I will need to think about it.
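For reference, the preprocessing option could look roughly like the sketch below. It is only an illustration using plain `java.util.regex`, run on the raw string before it reaches the parser. The class name `ParenthesisPreprocessor`, the method `strip`, and the tiny emoticon pattern are all made up for this example and are not part of the project or of CoreNLP. It assumes emoticons are whitespace-delimited and that parentheses are balanced, so something like `(hello :))` is not handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes emoticons and parenthesised asides from raw text.
final class ParenthesisPreprocessor {

    // An innermost parenthesised span (no nested parentheses inside it),
    // together with any whitespace directly in front of it.
    private static final Pattern PARENTHETICAL =
            Pattern.compile("\\s*\\([^()]*\\)");

    // A deliberately small emoticon pattern such as ":)", ";-)" or "=D".
    // Only matches when surrounded by whitespace or the string boundaries.
    private static final Pattern EMOTICON =
            Pattern.compile("(?<!\\S)[:;=]-?[()DPp](?!\\S)");

    static String strip(String text) {
        // Remove emoticons first so that the ")" in ":)" is not
        // mistaken for the end of a parenthesised span.
        String result = EMOTICON.matcher(text).replaceAll("");

        // Repeat until nothing changes so nested parentheses are
        // removed from the inside out.
        String previous;
        do {
            previous = result;
            result = PARENTHETICAL.matcher(result).replaceAll("");
        } while (!result.equals(previous));

        // Collapse the whitespace left behind by the removals.
        return result.replaceAll("\\s{2,}", " ").trim();
    }

    public static void main(String[] args) {
        String example = "I sure was (I come from Copenhagen, Denmark).";
        System.out.println(strip(example)); // prints: I sure was.
    }
}
```

The obvious downside is that the content of the parentheses ("I come from Copenhagen, Denmark") is thrown away entirely, so any statements inside it are lost rather than analysed separately.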