maebert closed 8 years ago
We will likely need several different representations of sentences that might be FRDs for various features:
_TERM_
Since computing POS tags is rather CPU intensive, I'd store POS tags on the sentence attribute of each message, too.
The sentence annotation should take a sentence as a string, and the term as a string:
def annotate(sentence, term): ...
and return a dictionary with at least the following:
{
    "s": "A kalyptic culture is typified by peacefulness, tolerance and individualism.",
    "s_clean": "a _TERM_ culture is typified by peacefulness tolerance and individualism",
    "pos_tags": "A/DT _TERM_/JJ culture/NN is/VBZ typified/VBN by/IN peacefulness/NN ,/, tolerance/NN and/CC individualism/NN ./.",
    "features": { ... }
}
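For illustration, a minimal sketch of how annotate could produce that dictionary. Everything beyond the annotate(sentence, term) signature and the four keys is an assumption: the cleaning rules (lowercase, strip punctuation, swap the term for _TERM_) are inferred from the example above, the regex tokenizer is a stand-in, and the optional tagger argument is a hypothetical hook for whatever POS tagger ends up being used. It expects a callable that maps a list of tokens to (token, tag) pairs, so the expensive tagging step can be skipped or cached.

```python
import re

TERM_PLACEHOLDER = "_TERM_"


def _clean(sentence, term):
    """Lowercase, drop punctuation, and replace the term with the placeholder.

    Assumes the term is a single alphanumeric word.
    """
    lowered = sentence.lower()
    no_punct = re.sub(r"[^\w\s]", "", lowered)
    collapsed = re.sub(r"\s+", " ", no_punct).strip()
    # Replace after stripping punctuation so the placeholder stays intact.
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.sub(pattern, TERM_PLACEHOLDER, collapsed)


def annotate(sentence, term, tagger=None):
    """Return the sentence representations described above.

    `tagger` is a hypothetical hook: a callable taking a list of tokens and
    returning (token, tag) pairs. Without it, "pos_tags" is left as None.
    """
    result = {
        "s": sentence,
        "s_clean": _clean(sentence, term),
        "pos_tags": None,
        "features": {},  # left empty; feature extraction is out of scope here
    }
    if tagger is not None:
        # Naive tokenizer: runs of word characters, or single punctuation marks.
        tokens = re.findall(r"\w+|[^\w\s]", sentence)
        result["pos_tags"] = " ".join(
            "{}/{}".format(
                TERM_PLACEHOLDER if token.lower() == term.lower() else token,
                tag,
            )
            for token, tag in tagger(tokens)
        )
    return result
```

Tagging the original tokens first and substituting _TERM_ afterwards keeps the tagger working on real words, which is what the pos_tags example above appears to do.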
Rm bold/italic. These will be captured with a list of tokens for each at the document level.
@clarecorthell Thanks, updated this and #12 accordingly
Fixed in #44