spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License

Incorrect toSingular for bottles when moving from 11.14.3 to 14.7.1 #982

Closed MichaelLeeHobbs closed 1 year ago

MichaelLeeHobbs commented 1 year ago

Related to #309

// good example
nlp('seven swans a swimming').nouns().toSingular().all().text() // seven swan a swimming
nlp('swans').nouns().toSingular().all().text() // swan
nlp('the bottles').nouns().toSingular().all().text() // the bottle

// bad example
nlp('bottles').nouns().toSingular().all().text() // bottles - should be bottle

This broke somewhere between 11.14.3 and 14.7.1. Yes, I was way overdue for a package update.

Workaround

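    // Singularize `word` on its own and after several context words,
    // then return the most frequent result.
    pluralToSingular(word = '') {
        const results = new Map() // candidate singular -> number of votes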
        const handler = (prefix, word) => {
            const result = this.compromise(`${prefix} ${word}`).nouns().toSingular().all().text().replace(`${prefix} `, '')
            if (!results.has(result)) {
                results.set(result, 0)
            }
            results.set(result, results.get(result) + 1)
        }

        handler('', word)
        handler('the', word)
        handler('his', word)
        handler('her', word)
        handler('give', word)

        // sort candidates by vote count, highest first
        const sorted = [...results.entries()].sort((a, b) => b[1] - a[1])
        return sorted[0][0]
    }

This works the same as 11.14.3 and still can't handle the edge cases listed in #309. For example:

nlp('I gave John the scissors.').nouns().toSingular().all().text() // I gave John the scissor. - which is incorrect

On an interesting side note, I noticed the following:

  console.log
    pluralToSingular: his thanks -> thank

  console.log
    pluralToSingular: her thanks -> thanks

Generally, the results are correct more often for a word inside a phrase than for a single word on its own.

spencermountain commented 1 year ago

hey Michael, apologies for the delay

yeah - when you enter an ambiguous word, with no neighbours, compromise will do its best to predict its part of speech. Here it guessed 'bottles' is a verb. If you know what it is, you can coerce it by doing this:

nlp('bottles').tag('Plural').nouns().toSingular().all().text()

bit verbose, i know. let me respond properly to the other issues once i get some time. cheers

MichaelLeeHobbs commented 1 year ago

That works much better but still fails on these examples: trousers, scissors, pants, shorts, panties

I wonder how many words there are like this? It might be easier to build an exceptions list that is checked first, before falling back to compromise.
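A minimal sketch of that exceptions-list idea, under stated assumptions: `PLURAL_ONLY` and `makeSingularizer` are hypothetical names, the list holds only the five words from this thread and is not exhaustive, and `fallback` stands in for whatever compromise call you already use.

```javascript
// Nouns that look plural but have no everyday singular form.
// Illustrative only: these are just the words mentioned in this thread.
const PLURAL_ONLY = new Set(['trousers', 'scissors', 'pants', 'shorts', 'panties'])

// Build a singularizer that checks the exceptions list before delegating.
// `fallback` is any (word) => string function, e.g. a wrapper around compromise.
function makeSingularizer(fallback, exceptions = PLURAL_ONLY) {
  return function singularize(word = '') {
    const key = word.trim().toLowerCase()
    if (exceptions.has(key)) {
      return word // leave plural-only nouns untouched
    }
    return fallback(word)
  }
}

// Hypothetical wiring, using the coercion suggested earlier in this thread:
// const nlp = require('compromise')
// const singularize = makeSingularizer(w => nlp(w).tag('Plural').nouns().toSingular().all().text())
```

Passing the fallback in as a function keeps the exceptions check independent of the compromise version, so the list can be tested and extended on its own.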

As a test I tried this.compromise(word).tag('Singular').nouns().toSingular().all().text() which works for all the failures but not bottles.

If I understand your code correctly, you're assuming the word passed in is plural, which may not be the case.

I'm digging deeper into our code base. The pluralToSingular function is only used in one place, to set a property on an object, and it appears that property is never used in a way where the plurality of the object would matter. I'm going to take a closer look and see if I can just remove this code. If I find it's not needed, would you like me to just close this issue?

spencermountain commented 1 year ago

yeah, maybe i'll need to better understand your project to help more.

in early versions of compromise, we had an nlp.noun('trouser').toWhatever() api - where it assumed you had a list of nouns to process. This quickly got messy - as things were often 'chocolate cookies', or 'captain of the football team'. We concluded it was easier to assume all input was natural language that we should pull apart before processing.

There are a few things you can do

let doc = nlp('walks') // assumed to be a verb
doc.tag('Noun') // tag it as a noun (neither plural nor singular)
// this method will do some plural/singular guesswork
// isPlural() returns a view, which is always truthy, so check .found
if (doc.nouns().isPlural().found) {
  doc.tag('Plural')
}
doc.debug()

or if you're frustrated with the compromise api, you can call the inflection methods directly:

let world = nlp.world()
const { toSingular, toPlural } = world.methods.two.transform.noun

console.log(toSingular('cookies', world.model))
console.log(toPlural('cookie', world.model))

cheers

spencermountain commented 1 year ago

lastly, if you run:

nlp('trousers').debug()

you'll see those examples have an #Uncountable tag. This prevents them from becoming singular. You can adjust the uncountable lexicon, just like any other tag.