wordnik / serapis

Serapis is a sentence identifier and modeling pipeline / built for Wordnik
http://wordnik.com
MIT License

Sentence Annotation #11

Closed maebert closed 8 years ago

maebert commented 8 years ago

We will likely need several different representations of sentences that might be FRDs for various features:

Since computing POS tags is rather CPU-intensive, I'd store the POS tags on the sentence attribute of each message, too.

The sentence annotation should take a sentence as a string, and the term as a string:

def annotate(sentence, term):
    ...

and return a dictionary with at least the following:

{
    "s": "A kalyptic culture is typified by peacefulness, tolerance and individualism.",
    "s_clean": "a _TERM_ culture is typified by peacefulness tolerance and individualism",
    "pos_tags": "A/DT _TERM_/JJ culture/NN is/VBZ typified/VBN by/IN peacefulness/NN ,/, tolerance/NN and/CC individualism/NN ./.",
    "features" : {
      ...
    }
}
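
For illustration, here is a minimal sketch of how `annotate` could produce the dictionary above. It is not the implementation that ended up in the repo. The `tagger` parameter, the `tokenize` helper and the `TERM_PLACEHOLDER` constant are hypothetical names introduced here; `tagger` is assumed to be any callable that maps a list of tokens to `(token, tag)` pairs (for example `nltk.pos_tag`), so the expensive tagging step stays optional and swappable.

```python
import re

# Hypothetical constant: the placeholder shown in the example output above.
TERM_PLACEHOLDER = "_TERM_"


def tokenize(text):
    """Split text into word tokens and single punctuation marks.

    Hypothetical helper. The placeholder survives as one token because
    the underscore counts as a word character in Python regexes.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def annotate(sentence, term, tagger=None):
    """Return the annotation dictionary for one sentence and one term.

    `tagger` is an assumption of this sketch: a callable taking a list of
    tokens and returning (token, tag) pairs. If it is omitted, "pos_tags"
    is None and no tagging cost is paid.
    """
    # Mask whole-word, case-insensitive occurrences of the term.
    pattern = r"\b{}\b".format(re.escape(term))
    masked = re.sub(pattern, TERM_PLACEHOLDER, sentence, flags=re.IGNORECASE)
    tokens = tokenize(masked)

    # s_clean: drop punctuation tokens, lowercase all but the placeholder.
    clean_tokens = [
        tok if tok == TERM_PLACEHOLDER else tok.lower()
        for tok in tokens
        if re.match(r"\w", tok)
    ]

    pos_tags = None
    if tagger is not None:
        pos_tags = " ".join(
            "{}/{}".format(tok, tag) for tok, tag in tagger(tokens)
        )

    return {
        "s": sentence,
        "s_clean": " ".join(clean_tokens),
        "pos_tags": pos_tags,
        "features": {},  # left empty; the features are not specified here
    }
```

Called with the example sentence and the term `"kalyptic"`, this yields the `s_clean` string shown above. Passing the tagger in, rather than calling it unconditionally, also makes it easy to reuse POS tags that were already stored on the message.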

clarecorthell commented 8 years ago

Remove bold/italic. These will be captured as a list of tokens for each at the document level.

maebert commented 8 years ago

@clarecorthell Thanks, updated this and #12 accordingly.

maebert commented 8 years ago

Fixed in #44