moos / wordpos

Part-of-speech utilities for node.js based on the WordNet database.

incorrect results #31

Closed: grugknuckle closed this issue 4 years ago

grugknuckle commented 4 years ago

I am running wordpos (v2.0.0) in a Node.js (v12.16.1 LTS) environment.

It seems like this library is returning unexpected results. The code below ...

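const WordPOS = require('wordpos')
const wordpos = new WordPOS()

const text = 'The quick brown fox jumped over the lazy dog.'

// Top-level await is not available in CommonJS modules on Node 12,
// so wrap the promise-returning getPOS() call in an async function.
;(async () => {
  const results = await wordpos.getPOS(text)
  console.log(results)
})()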

returns the following (incorrect) result.

{
  nouns: [ 'quick', 'brown', 'fox', 'dog' ],
  verbs: [ 'brown', 'fox', 'dog' ],
  adjectives: [ 'quick', 'brown', 'lazy' ],
  adverbs: [ 'quick' ],
  rest: [ 'The', 'jumped', '' ]
}

I mean, obviously 'quick' and 'brown' are adjectives, not nouns. The verbs array [ 'brown', 'fox', 'dog' ] contains one adjective and two nouns, and it is missing 'jumped', the only verb in the sentence.

Am I missing something, or is there a big problem here?

EDIT: Do I need to tokenize and lemmatize the sentence first?

moos commented 4 years ago

You're missing something! 😜 I don't make the rules, I just report them (per WordNet), and these seem to be correct. Check for yourself: http://wordnetweb.princeton.edu/perl/webwn

EDIT: Do I need to tokenize and lemmatize the sentence first?

No.

PS: You can also run it from the command line:

λ wordpos get the quick brown fox jumped over the lazy dog
# Noun 4:
quick
brown
fox
dog

# Adjective 3:
quick
brown
lazy

# Verb 3:
brown
fox
dog

# Adverb 1:
quick

grugknuckle commented 4 years ago

Ok... after some research, I've determined that the getPOS function isn't doing what I thought it was. It simply returns arrays of nouns, verbs, adjectives, and adverbs (plus a rest array for the words it couldn't classify). For each word token in the string I pass, wordpos checks WordNet to see whether that token can be used as a noun, verb, adjective, or adverb. It does NOT tell me how the word is being used in the sentence I passed.
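One way to see this in the output reported above is to invert the getPOS() result into a per-word list of candidate parts of speech. The sketch below is a plain JavaScript helper; the name candidatesByWord is hypothetical and not part of the wordpos API. It only assumes an object shaped like the result shown in this issue ({ nouns, verbs, adjectives, adverbs, rest }).

```javascript
// Hypothetical helper, NOT part of wordpos: inverts a getPOS()-style result
// into a Map of word -> every POS array that word appears in.
function candidatesByWord (result) {
  const map = new Map()
  for (const pos of ['nouns', 'verbs', 'adjectives', 'adverbs']) {
    for (const word of result[pos] || []) {
      if (!map.has(word)) map.set(word, [])
      map.get(word).push(pos)
    }
  }
  return map
}

// The result reported in this issue for 'The quick brown fox jumped over the lazy dog.'
const reported = {
  nouns: ['quick', 'brown', 'fox', 'dog'],
  verbs: ['brown', 'fox', 'dog'],
  adjectives: ['quick', 'brown', 'lazy'],
  adverbs: ['quick'],
  rest: ['The', 'jumped', '']
}

for (const [word, parts] of candidatesByWord(reported)) {
  // e.g. "quick: nouns, adjectives, adverbs"
  console.log(`${word}: ${parts.join(', ')}`)
}
```

Each word comes back with every use WordNet allows; nothing in the result says which one applies in this particular sentence.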

For example, the word 'quick' has 8 entries in WordNet: one as an adverb, one as a noun, and six as adjectives.
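A per-sense breakdown like this can be tallied from WordNet sense records. The sketch below assumes each sense object carries a one-letter WordNet pos code (n noun, v verb, a adjective, s adjective satellite, r adverb). That matches how I understand wordpos's lookup() results to be shaped, but treat the field name as an assumption and check it against your installed version. The helper countSensesByPos is hypothetical.

```javascript
// WordNet one-letter POS codes; 's' (adjective satellite) is counted as an adjective.
const POS_NAMES = { n: 'noun', v: 'verb', a: 'adjective', s: 'adjective', r: 'adverb' }

// Hypothetical helper: counts senses per part of speech.
// Assumption: every sense object has a `pos` field holding a WordNet code.
function countSensesByPos (senses) {
  const counts = {}
  for (const { pos } of senses) {
    const name = POS_NAMES[pos] || 'unknown'
    counts[name] = (counts[name] || 0) + 1
  }
  return counts
}
```

With wordpos installed, the input would presumably come from something like wordpos.lookup('quick'), which in v2 returns a promise of sense objects; the exact counts depend on the WordNet data bundled with your version.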

What I expected was the part of speech each word actually has in the sentence. For example, if I passed

'He is trying to fish for fish in the lake.'

I would expect the first instance of 'fish' to be a verb and the second to be a noun. But that's not what the getPOS function does. It seems that your code is working as designed; I just didn't understand from your documentation what to expect.
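The gap here is that a dictionary lookup sees each token in isolation, so both occurrences of 'fish' get the same candidates; choosing between them needs the neighbouring words. As a toy illustration only (this is not what wordpos does, and it is not a real tagger), the hypothetical pickPos below applies two hand-written rules: a word right after "to" is read as a verb, and a word right after a word from a tiny hard-coded preposition/determiner list is read as a noun. The CANDIDATES table is likewise a hypothetical stand-in for a dictionary lookup. A real solution would use a trained part-of-speech tagger.

```javascript
// Toy context rules; NOT a real POS tagger and NOT part of wordpos.
// Hypothetical stand-in for a dictionary lookup: same candidates wherever the word appears.
const CANDIDATES = { fish: ['noun', 'verb'] }

// Tiny hand-picked set of words that are usually followed by a noun.
const NOUN_CONTEXT = new Set(['for', 'in', 'the', 'a', 'an'])

function pickPos (tokens, i) {
  const options = CANDIDATES[tokens[i]]
  if (!options) return null // word not in the toy dictionary
  const prev = (tokens[i - 1] || '').toLowerCase()
  if (prev === 'to' && options.includes('verb')) return 'verb'
  if (NOUN_CONTEXT.has(prev) && options.includes('noun')) return 'noun'
  return options[0] // no rule applies: fall back to the first candidate
}

const tokens = 'He is trying to fish for fish in the lake'.split(' ')
tokens.forEach((word, i) => {
  const pos = pickPos(tokens, i)
  if (pos) console.log(`${i}: ${word} -> ${pos}`)
})
```

Running it prints "4: fish -> verb" and "6: fish -> noun": the same word gets different tags purely because of the preceding token, which is exactly the information getPOS never looks at.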

You can close this issue.

moos commented 4 years ago

Ah, thanks for clarifying. I can see how this would be confusing. There is a mention in the readme:

This has no relation to correct grammar of given sentence, where here only 'bear' and 'squirrel' would be considered nouns.

Maybe I'll make that more prominent in the next update. Thanks.