spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License
11.31k stars 645 forks

Get .terms() but keep hyphenated strings (similar to .hyphenated()) #1076

Closed by PuneetKohli 4 months ago

PuneetKohli commented 6 months ago

Is there a way to achieve this?

spencermountain commented 6 months ago

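Hey Puneet, good question. It's a little weird, but you could do .splitAfter('!@hasHyphen'), which splits after every term that doesn't match @hasHyphen, so a hyphenated pair stays in one piece. Like this: https://runkit.com/spencermountain/659822ebdfb7e500085838fd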

Alternatively, you could shim-in a custom tokenizer, like:

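import nlp from 'compromise'

// swap the built-in term splitter for one that only splits on single spaces,
// so hyphens never start a new term (this changes it for every later nlp() call)
nlp.world().methods.one.tokenize.splitTerms = function (str) {
  return str.split(/ /)
}
nlp('one two-three four five').debug()
// one, two-three, four, five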

that one is obviously simplified, but let me know if you'd like some more help. cheers
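
For anyone who wants something a bit less simplified, here is a rough sketch of a splitter that also copes with double spaces, tabs, and trailing whitespace. The name `splitOnWhitespace` is made up for illustration and is not part of compromise. It is plain JavaScript, so it can be checked on its own before being shimmed in.

```javascript
// Hypothetical helper (not part of compromise): split a sentence on runs of
// whitespace only, so hyphenated words like 'two-three' stay as one term.
function splitOnWhitespace(str) {
  return String(str || '')
    .split(/\s+/) // any run of spaces, tabs, or newlines
    .filter(Boolean) // drop empty strings left by leading/trailing whitespace
}

console.log(splitOnWhitespace('one  two-three\tfour five '))
// [ 'one', 'two-three', 'four', 'five' ]
```

It could presumably be assigned to the same hook as above (`nlp.world().methods.one.tokenize.splitTerms = splitOnWhitespace`), though that is an assumption based only on the snippet in this thread. Like the version above, it throws away the separators themselves.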