moos / wordpos

Part-of-speech utilities for node.js based on the WordNet database.

Confusion on Output -- 'is' isn't a verb? #8

Closed SkittishSloth closed 9 years ago

SkittishSloth commented 9 years ago

I know you don't have a ton of control over how the results are generated (i.e., they come from Natural and/or the WordNet db), so I don't know if this is the right place to talk about this. Basically, using a very simple input sentence, "the ball is red", I get the following output:

{ nouns: [ 'ball', 'red' ],
  verbs: [ 'ball' ],
  adjectives: [ 'red' ],
  adverbs: [],
  rest: [ 'is', 'the' ] }

I did a search on WordNet's site, and it does show "is" as a verb (after a few other entries that seem to be based on acronyms or whatever). Again, I'm not sure if this is on wordpos or Natural or whatever, but figured I'd throw it out for you.

moos commented 9 years ago

"A ton of control" is a huge understatement: I have zero control! You're quite right, and I get the same results. The word "is" is not in the raw WNdb files. If you look closely at the results on the WordNet site, the word "is" itself isn't actually shown either, only its relation to other words, so I believe they must be doing an additional lookup in other sources.

A word of caution: both "is" and "the" are also stopwords, which are excluded by default. To stop excluding them, pass {stopwords: false} to the constructor. See the main page for an example.
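
To make the stopword point concrete, here is a rough, self-contained sketch of what such a filter does to the sentence from this issue. It is not wordpos's actual code: the `STOPWORDS` set and the `candidateWords` helper are hypothetical stand-ins made up for illustration, and only the `{stopwords: false}` option name comes from the comment above.

```javascript
// Hypothetical, tiny stopword list for illustration only;
// the library's real list is much longer.
const STOPWORDS = new Set(['is', 'the', 'a', 'an']);

// Hypothetical helper: split a sentence into unique lowercase words and,
// unless {stopwords: false} is passed, drop the stopwords before any lookup.
function candidateWords(text, options = {}) {
  const { stopwords = true } = options;
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  const unique = [...new Set(words)];
  return stopwords ? unique.filter((w) => !STOPWORDS.has(w)) : unique;
}

// Default behaviour: "the" and "is" never reach the lookup step.
console.log(candidateWords('the ball is red'));
// → [ 'ball', 'red' ]

// With filtering disabled, every word is kept.
console.log(candidateWords('the ball is red', { stopwords: false }));
// → [ 'the', 'ball', 'is', 'red' ]
```

Note that keeping "is" as a candidate would still not tag it as a verb, since, as said above, the word is not in the raw WNdb files.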