simongray / StatementAnnotator

Custom annotator for Stanford CoreNLP that annotates sentences with the underlying statements contained within them.

Add support for tenses #60

Open simongray opened 8 years ago

simongray commented 8 years ago

Can be implemented by looking at the part-of-speech tag used for the verb.

Ref: http://stackoverflow.com/questions/22139866/finding-tense-of-a-sentence-using-stanford-nlp
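The core of this idea fits in a few lines. Below is a minimal Python sketch, not the project's code (`tense_from_tag` is a hypothetical name). It assumes the input is the Penn Treebank tag of the main verb, as produced by CoreNLP's English tagger. Only the finite tags VBD, VBZ and VBP encode tense by themselves; VB, VBG, VBN and MD do not, which is why auxiliaries come up later in this thread.

```python
from typing import Optional

# Finite Penn Treebank verb tags encode tense on their own.
PAST_TAGS = {"VBD"}            # "ate"
PRESENT_TAGS = {"VBZ", "VBP"}  # "eats" (3rd person singular), "eat" (other persons)


def tense_from_tag(tag: str) -> Optional[str]:
    """Return 'past' or 'present' for a finite verb tag, else None.

    Hypothetical helper: VB, VBG, VBN and MD carry no tense by themselves,
    so the surrounding auxiliaries would be needed to decide.
    """
    if tag in PAST_TAGS:
        return "past"
    if tag in PRESENT_TAGS:
        return "present"
    return None
```

For "She has eaten", the main verb "eaten" is tagged VBN, so the function returns `None`; the tense lives on the auxiliary "has".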

simongray commented 8 years ago

http://www.talkenglish.com/grammar/simple-tense.aspx

I think I should make an initial separation into three base tenses: present, future, and past. The corresponding method would be getTense(), which would return e.g. Tense.present. Perhaps this could later be extended with more elaborate tenses; however, that would also make the feature less generalisable across languages, e.g. to Chinese.
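A sketch of the proposed three-way split, written in Python for illustration (the project itself would presumably use a Java enum). `Tense`, `Verb` and `get_tense` are hypothetical stand-ins for the `Tense.present` / `getTense()` names above, and the rules assume the verb's Penn Treebank tag and any auxiliary words are available.

```python
from enum import Enum
from typing import List, Optional


class Tense(Enum):
    """The three base tenses proposed above (hypothetical Python mirror of a Java enum)."""
    PAST = "past"
    PRESENT = "present"
    FUTURE = "future"


class Verb:
    """Hypothetical stand-in for the annotator's Verb object."""

    def __init__(self, word: str, tag: str, auxiliaries: Optional[List[str]] = None):
        self.word = word
        self.tag = tag                        # Penn Treebank tag of the verb
        self.auxiliaries = auxiliaries or []  # auxiliary words attached to the verb

    def get_tense(self) -> Optional[Tense]:
        # Lexical cue: a future modal among the auxiliaries.
        if any(aux.lower() in ("will", "shall") for aux in self.auxiliaries):
            return Tense.FUTURE
        # Grammatical cue: the verb's own finite tag.
        if self.tag == "VBD":
            return Tense.PAST
        if self.tag in ("VBZ", "VBP"):
            return Tense.PRESENT
        return None  # non-finite forms such as VBN or VBG are left undecided
```

This deliberately covers only simple forms; a perfect like "has eaten" gets `None` because the VBN tag carries no tense.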

simongray commented 8 years ago

Tense as a concept seems to be a bit more complex, since it must be determined from several inputs: the POS tag of the verb and the presence of certain auxiliary verbs. I guess it makes sense to match tense on the StatementPattern rather than on the VerbPattern, although the StatementPattern would of course need access to information in the Verb object.
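One way to read this: compute tense from the statement's root verb group (auxiliaries first, since the first word of the group is the finite one) and let a statement-level pattern filter on it. The Python sketch below assumes hypothetical `Verb`, `Statement` and `StatementPattern` classes with these fields; it is not the annotator's actual object model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Tense(Enum):
    PAST = "past"
    PRESENT = "present"
    FUTURE = "future"


@dataclass
class Verb:
    word: str
    tag: str
    # Auxiliaries as (word, Penn Treebank tag) pairs, in sentence order.
    auxiliaries: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Statement:
    root_verb: Verb


def statement_tense(statement: Statement) -> Optional[Tense]:
    verb = statement.root_verb
    # The first word of the verb group is the finite one that carries tense.
    word, tag = verb.auxiliaries[0] if verb.auxiliaries else (verb.word, verb.tag)
    if tag == "MD":
        return Tense.FUTURE if word.lower() in ("will", "shall") else None
    if tag == "VBD":
        return Tense.PAST
    if tag in ("VBZ", "VBP"):
        return Tense.PRESENT
    return None


@dataclass
class StatementPattern:
    tense: Optional[Tense] = None  # None means "any tense"

    def matches(self, statement: Statement) -> bool:
        # The pattern reads tense through the statement, which reaches into its Verb.
        return self.tense is None or statement_tense(statement) == self.tense
```

Putting the check on the statement pattern keeps verb patterns tense-agnostic, while the tense logic still only needs the root `Verb`.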

What needs to be done

simongray commented 8 years ago

Some clear complications arise from how I handle xcomp relations: the future tense "going to X" is split into verb + xcomp, while "will X" is a single verb compound. Should I make a special rule? Or simply drop the future and leave it to patterns?

Edit: actually, no! If tense is attached to the statement, it doesn't matter whether the xcomp further divides the statement, since the part relevant for tense is the Verb object of the root.
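The edit's argument can be checked with a small sketch. It assumes the typical CoreNLP dependency analysis of "She is going to leave": the root is "going" (VBG) with "is" as `aux` and "leave" as its `xcomp`, whereas in "She will leave" the root "leave" carries "will" as `aux`. In both cases only the root verb needs inspecting. The `Verb` class and `is_future` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

FUTURE_MODALS = {"will", "shall"}
BE_PRESENT = {"am", "is", "are"}


@dataclass
class Verb:
    word: str
    tag: str
    auxiliaries: List[Tuple[str, str]] = field(default_factory=list)  # (word, tag)
    has_xcomp: bool = False  # whether an xcomp dependent hangs off this verb


def is_future(root: Verb) -> bool:
    """Decide future reference by looking only at the statement's root verb."""
    aux_words = [word.lower() for word, _ in root.auxiliaries]
    # "will X": the modal is an aux of the root verb X itself.
    if any(word in FUTURE_MODALS for word in aux_words):
        return True
    # "be going to X": the root is "going"; X is merely its xcomp and is never inspected.
    return (
        root.word.lower() == "going"
        and root.tag == "VBG"
        and root.has_xcomp
        and any(word in BE_PRESENT for word in aux_words)
    )
```

The `has_xcomp` check keeps "She is going to the store" (no xcomp) from being read as future.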

simongray commented 8 years ago

To future-proof the system: determining tense is clearly language-dependent, since it involves both grammatical features (POS tags) and lexical features ("shall", "will", "going to"), and both kinds differ from language to language. Chinese is an obvious example of a language entirely lacking tense, though it has other, somewhat comparable features instead. Since this is a lexico-syntactic approach, both can coexist without problem, although the implementation of tense (and certain other features) will have to be language-dependent.
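A sketch of keeping tense language-dependent: one resolver function per language, selected by language code. All names here (`resolve_tense`, `RESOLVERS`, the `(word, tag)` verb-group representation) are hypothetical. The English resolver combines a lexical cue (future modals) with Penn Treebank tags, and the Chinese one returns `None`, leaving its comparable features to a separate annotation.

```python
from typing import Callable, Dict, List, Optional, Tuple

VerbGroup = List[Tuple[str, str]]  # (word, POS tag) pairs in sentence order
TenseResolver = Callable[[VerbGroup], Optional[str]]


def english_tense(group: VerbGroup) -> Optional[str]:
    if not group:
        return None
    word, tag = group[0]  # the finite word carries the tense
    if tag == "MD":       # lexical cue: future modals
        return "future" if word.lower() in ("will", "shall") else None
    if tag == "VBD":      # grammatical cues: Penn Treebank tags
        return "past"
    if tag in ("VBZ", "VBP"):
        return "present"
    return None


def chinese_tense(group: VerbGroup) -> Optional[str]:
    # Chinese has no grammatical tense; comparable features need their own annotation.
    return None


RESOLVERS: Dict[str, TenseResolver] = {"en": english_tense, "zh": chinese_tense}


def resolve_tense(language: str, group: VerbGroup) -> Optional[str]:
    resolver = RESOLVERS.get(language)
    return resolver(group) if resolver is not None else None
```

Adding a language means registering one more function; unknown languages fall back to `None` rather than guessing.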

simongray commented 8 years ago

Hm... maybe I should just implement every possible tense correctly from the beginning: https://www.ego4u.com/en/cram-up/grammar/tenses
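If every English tense is the goal, one common way to model them is as a time (past, present, future) combined with an aspect (simple, progressive, perfect, perfect progressive). The Python sketch below derives both from a contiguous verb group of `(word, tag)` pairs. It assumes contractions have been expanded and ignores "be going to", which the parse splits via xcomp; the function name and representation are hypothetical.

```python
from typing import List, Optional, Tuple

HAVE = {"have", "has", "had", "having"}
BE = {"be", "am", "is", "are", "was", "were", "been", "being"}
FUTURE_MODALS = {"will", "shall"}


def english_tense_aspect(group: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Map a verb group to (time, aspect), e.g. ('past', 'perfect progressive')."""
    if not group:
        return None
    first_word, first_tag = group[0][0].lower(), group[0][1]
    # Time comes from the first (finite) word of the group.
    if first_tag == "MD":
        if first_word not in FUTURE_MODALS:
            return None  # can, may, would, ...: not a tense in this scheme
        time = "future"
    elif first_tag == "VBD":
        time = "past"
    elif first_tag in ("VBZ", "VBP"):
        time = "present"
    else:
        return None
    # Aspect comes from auxiliary + following-form pairs.
    perfect = progressive = False
    for (word, _), (_, next_tag) in zip(group, group[1:]):
        word = word.lower()
        if word in HAVE and next_tag == "VBN":    # have + past participle
            perfect = True
        elif word in BE and next_tag == "VBG":    # be + present participle
            progressive = True
    if perfect and progressive:
        aspect = "perfect progressive"
    elif perfect:
        aspect = "perfect"
    elif progressive:
        aspect = "progressive"
    else:
        aspect = "simple"
    return time, aspect
```

For example, "will have been eating" yields `('future', 'perfect progressive')`, and the passive "has been eaten" yields `('present', 'perfect')` because "been" is followed by a VBN rather than a VBG.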