spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License
11.31k stars 645 forks

Is `toNumbers` supposed to mutate data? #1113

Open spencermountain opened 1 month ago

spencermountain commented 1 month ago

Discussed in https://github.com/spencermountain/compromise/discussions/1112

Originally posted by **scpedicini** June 3, 2024

I think I'm doing something wrong with the newer `compromise/three`. `termList`/`out`/etc. works great, but if I use them in conjunction with `toNumbers` the data seems to mutate in place. Trying to do this:

```ts
const text_1 = "There are twenty-four apples and 12000 oranges on the table.";
const nat = nlp(text_1);
const number_of_numerical_values = nat.values().length; // shows a length of two which is what we would expect so far so good

const termList_1 = nat.values(0).toNumber().termList();
console.log(termList_1);
// termList_1 has converted the number "twenty-four" to "24" but also has
// `"apples", "and", "12000", "oranges", "on", "the", "table"` in array

// rerun exact same command
const termList_2 = nat.values(0).toNumber().termList();
console.log(termList_2);
// termList_2 now has the expected array of just "3"
// converted to numerical
```

I basically want to go through each numerically recognized "group" hence using the `.values()` command and use the `toNumbers()` command to reduce them down to the numerical number.

Picture to help indicate the issue:

![CleanShot 2024-06-03 at 21 55 07@2x](https://github.com/spencermountain/compromise/assets/2040540/225b6821-9033-49d5-a1d6-9d2ccba89d59)

scpedicini commented 1 month ago

@spencermountain Thanks for taking a look. Just wanted to add some information around this as well: as near as I can tell, it happens pretty consistently with `out`, `json`, `termList`, etc.

const nat = nlp('There were 24 apples and 12000 oranges on the table.');

[Screenshot: CleanShot 2024-06-04 at 12 29 44@2x]
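
For anyone following along, here is a rough, library-free sketch of what "mutating in place" means in this kind of API. `ToyDoc`, `ToyView` and `wordToDigits` are hypothetical names made up for illustration, and nothing here is taken from compromise's source. It only assumes that a selection shares its terms with the parent document, so converting through the selection rewrites the parent as well, while converting through a copy does not. compromise's docs do list a `.clone()` method for copying a document, but whether calling it first sidesteps the behaviour reported above is not confirmed in this thread.

```ts
// Toy model only: these classes are hypothetical and are NOT compromise's implementation.
const WORD_VALUES: Record<string, number> = { twenty: 20, four: 4 };

// Convert a small hyphenated number word such as "twenty-four" to digits.
// Anything it does not recognise is returned unchanged.
function wordToDigits(word: string): string {
  let total = 0;
  for (const part of word.toLowerCase().split("-")) {
    const value = WORD_VALUES[part];
    if (value === undefined) return word;
    total += value;
  }
  return String(total);
}

class ToyView {
  private shared: string[];
  private indexes: number[];

  constructor(shared: string[], indexes: number[]) {
    this.shared = shared; // same array object as the parent document
    this.indexes = indexes;
  }

  // Rewrites the shared array in place, which is the kind of mutation the issue asks about.
  toNumber(): this {
    for (const i of this.indexes) {
      const term = this.shared[i];
      if (term !== undefined) this.shared[i] = wordToDigits(term);
    }
    return this;
  }

  // Only the terms covered by this view.
  termList(): string[] {
    return this.indexes.map((i) => this.shared[i] ?? "");
  }
}

class ToyDoc {
  terms: string[];

  constructor(terms: string[]) {
    this.terms = terms;
  }

  // A view remembers which indexes it covers but does not copy the terms.
  view(indexes: number[]): ToyView {
    return new ToyView(this.terms, indexes);
  }

  // A copy of the terms, so edits made through its views cannot reach this document.
  clone(): ToyDoc {
    return new ToyDoc([...this.terms]);
  }
}

const doc = new ToyDoc(["There", "are", "twenty-four", "apples"]);

// Converting through a clone leaves the original untouched.
const safe = doc.clone().view([2]).toNumber().termList(); // ["24"]
const untouched = [...doc.terms]; // still contains "twenty-four"

// Converting through a view of the original changes the original too.
const direct = doc.view([2]).toNumber().termList(); // ["24"]
const mutated = [...doc.terms]; // now contains "24"

console.log({ safe, untouched, direct, mutated });
```

In this toy both paths return `["24"]` for the selection itself; the difference is only in what happens to `doc.terms` afterwards, which matches the "second call behaves differently" symptom in spirit but is not a diagnosis of the actual bug.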