spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License
11.31k stars 645 forks

[Issue]: "My favorite time of the year" in .nouns() response #1096

Open MarketingPip opened 3 months ago

MarketingPip commented 3 months ago

Just playing around again and saw that this gets tagged as all nouns.

import nlp from "https://esm.sh/compromise"

let doc = nlp("my favorite time of the year")

console.log(doc.nouns().out("array"))
// ["my favorite time of the year"]

I am sure there are various other sentences that need more rules added. Do you have any ideas for POS-tagged DBs we could throw at this to identify some other patterns?

After all, I am sure we could open issues like this one all day long.

spencermountain commented 3 months ago

hey Jared, good catch. The tags are correct, but this is a case of .nouns() getting overly-excited.

.nouns() has always been noun-phrasey, and I'm not sure how best to tokenize this phrase, if we were to split it up further. it does seem awkward though, I agree. cheers

MarketingPip commented 3 months ago

@spencermountain maybe split by compound nouns? Those should be grouped together, but other nouns shouldn't...?

PS: I think I have a list of compound nouns from a while back somewhere that I can throw at you too!
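
For reference, the compound-noun split suggested here could look roughly like the sketch below. It is plain JavaScript and does not call compromise at all: the tags are hand-assigned for illustration and may differ from what the library actually produces, and `splitByNounRuns` is a hypothetical helper, not an existing API. Runs of consecutive noun-tagged terms stay together (a rough stand-in for compound nouns), and any other term acts as a boundary.

```javascript
// Hand-assigned tags for illustration only; this is NOT output from compromise.
const taggedPhrase = [
  { text: "my", tags: ["Possessive"] },
  { text: "favorite", tags: ["Adjective"] },
  { text: "time", tags: ["Noun"] },
  { text: "of", tags: ["Preposition"] },
  { text: "the", tags: ["Determiner"] },
  { text: "year", tags: ["Noun"] },
];

// Hypothetical helper: keep consecutive Noun-tagged terms together and
// treat every other term as a boundary between groups.
function splitByNounRuns(terms) {
  const groups = [];
  let current = [];
  for (const term of terms) {
    if (term.tags.includes("Noun")) {
      current.push(term.text);
    } else if (current.length > 0) {
      groups.push(current.join(" "));
      current = [];
    }
  }
  // Flush the last run if the phrase ends on a noun.
  if (current.length > 0) groups.push(current.join(" "));
  return groups;
}

console.log(splitByNounRuns(taggedPhrase)); // [ 'time', 'year' ]
```

With this approach "my favorite time of the year" yields two separate nouns, while a run of adjacent nouns would still come back as one group.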

MarketingPip commented 2 months ago

@spencermountain - I don't know if the best approach is writing a rule for this, such as "[#PossessiveNoun] #Adjective" > tag group (0) as possessive determiner.

This response from GPT might help write the rule:
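
To make the idea concrete, here is a minimal sketch of that rule in plain JavaScript. It does not use compromise: the tag names ("Possessive", "Adjective", "Determiner") and the hand-tagged terms are assumptions for illustration, and `tagPossessiveDeterminers` is a hypothetical helper rather than anything in the library. It only shows the intended logic: a possessive directly followed by an adjective also gets a determiner tag.

```javascript
// Hand-assigned tags for illustration only; this is NOT output from compromise.
const sampleTerms = [
  { text: "my", tags: ["Possessive"] },
  { text: "favorite", tags: ["Adjective"] },
  { text: "time", tags: ["Noun"] },
];

// Hypothetical helper: when a Possessive term is immediately followed by an
// Adjective term, add a Determiner tag to the possessive (the first term of
// the two-term match). Returns a new array; the input is not mutated.
function tagPossessiveDeterminers(terms) {
  return terms.map((term, i) => {
    const next = terms[i + 1];
    const matches =
      term.tags.includes("Possessive") &&
      next !== undefined &&
      next.tags.includes("Adjective");
    if (!matches || term.tags.includes("Determiner")) return term;
    return { ...term, tags: [...term.tags, "Determiner"] };
  });
}

const retagged = tagPossessiveDeterminers(sampleTerms);
console.log(retagged[0].tags); // [ 'Possessive', 'Determiner' ]
```

A noun splitter could then treat anything tagged as a determiner as a boundary instead of pulling it into the noun phrase.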

In English grammar, possessive pronouns typically do not directly modify adjectives. Instead, they typically modify nouns. For example:

Possessive pronoun modifying a noun: "That is my favorite book."
Adjective modifying a noun: "That is a beautiful book."

However, there are cases where possessive pronouns can indirectly modify adjectives through the noun they are associated with:

"That is my favorite red book."

Here, "my" is a possessive pronoun modifying the noun "book," and "red" is an adjective modifying "book." So indirectly, "my" can influence the adjective "red" by modifying the noun "book."

Then, regardless, still tokenize all nouns out as single nouns or compound nouns. This doesn't occur with just this phrase; there were countless phrases being tokenized almost as chunks. (Maybe we have to peek into the library and see if something is going on...?)

As always too, hope you had an awesome weekend.