spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License

Numeric ranges fail to properly assign units #446

Open mwillbanks opened 6 years ago

mwillbanks commented 6 years ago

Overview

When parsing a range of numbers, compromise finds each value but only attaches the unit to the last number, leaving the earlier ones with an empty unit. Each number in the range should be assigned the unit of the last number.

Example

const nlp = require('compromise');
const doc = nlp('3 to 5 years old');

console.log(doc.values());

Output

[ { number: 3,
    nice: '3',
    ordinal: '3rd',
    niceOrdinal: '3rd',
    text: 'three',
    textOrdinal: 'third',
    unit: '' },
  { number: 5,
    nice: '5',
    ordinal: '5th',
    niceOrdinal: '5th',
    text: 'five',
    textOrdinal: 'fifth',
    unit: 'years old' } ]

Expected Output

[ { number: 3,
    nice: '3',
    ordinal: '3rd',
    niceOrdinal: '3rd',
    text: 'three',
    textOrdinal: 'third',
    unit: 'years old' },
  { number: 5,
    nice: '5',
    ordinal: '5th',
    niceOrdinal: '5th',
    text: 'five',
    textOrdinal: 'fifth',
    unit: 'years old' } ]
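
Until the parser handles this, the expected behavior can be approximated as a post-processing step over the parsed values. This is only a rough sketch: `propagateRangeUnits` is a hypothetical helper, not part of compromise, and it assumes every value in the array belongs to the same range (it does not check for a connector such as "to" between the numbers). The input objects are trimmed-down copies of the output shown above.

```javascript
// Hypothetical helper (not part of compromise): back-fill empty units
// from the nearest later value that has one.
// Assumes all values passed in belong to a single range like "3 to 5 years old".
function propagateRangeUnits(values) {
  // Copy each object so the caller's data is left untouched.
  const result = values.map((v) => ({ ...v }));
  let unit = '';
  // Walk right to left so each unit-less number inherits the unit that follows it.
  for (let i = result.length - 1; i >= 0; i--) {
    if (result[i].unit) {
      unit = result[i].unit;
    } else {
      result[i].unit = unit;
    }
  }
  return result;
}

// Trimmed-down version of the output from the example above.
const parsed = [
  { number: 3, text: 'three', unit: '' },
  { number: 5, text: 'five', unit: 'years old' },
];

console.log(propagateRangeUnits(parsed).map((v) => v.unit));
// prints [ 'years old', 'years old' ]
```

Something like "3 apples and 5 years old" would be handled incorrectly by this sketch, so a real fix would need to confirm the numbers are actually joined as a range first.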

spencermountain commented 6 years ago

Hey Mike, yeah, you're right. The units parsing is really poor right now. There are all sorts of situations like the one you mentioned. There are some others over here: https://github.com/nlp-compromise/compromise/issues/423

I'd really like us to improve this though, and all the parts are there to do it, so PRs are welcome if you're eager.