spencermountain / compromise

modest natural-language processing
http://compromise.cool

JSON Speed #1081

Closed · richfwebb closed 5 months ago

richfwebb commented 5 months ago

I'm getting ~1k words/second on the .json() step. Is this expected, or am I using the library incorrectly?

import nlp from 'compromise'

// fast
const nouns = nlp(doc).nouns() // `doc` is the input text string

// slow
console.time('json')
nouns.json()
console.timeEnd('json')

Why is the .json() part slow? I haven't seen any benchmarks on POS tagging, in case that's related.

spencermountain commented 5 months ago

hey @richfwebb yep - this part is slow, and I'm sure there are things we can do to optimize it. It has to do some analysis on the fly. Two things you can do:

  1. nouns() returns the Noun superclass, which adds some extra analysis. If you don't need this, you can call .json() on a normal View instead, using nouns.toView().json()
  2. You can turn off any on-the-fly analysis with an options object:
    .json({ terms: false, normal: false }) // etc.

    you can see the options here: https://observablehq.com/@spencermountain/compromise-json

Both tips can be combined, as in the sketch below.
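A minimal sketch putting both suggestions together (the input string and variable names here are placeholders, not from the thread; the .json options are the ones documented at the link above):

import nlp from 'compromise'

const text = 'Some long input text goes here...' // placeholder input
const doc = nlp(text)

// 1. drop the Noun superclass: serialize a plain View instead
const plainNouns = doc.nouns().toView()

// 2. skip the per-term output analysis via the options object
console.time('json')
const out = plainNouns.json({ terms: false, normal: false })
console.timeEnd('json')

console.log(out.length, 'noun phrases serialized')

Both steps cut down the on-the-fly analysis that .json() would otherwise do.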

cheers

richfwebb commented 5 months ago

Thanks for responding so quickly, I appreciate the advice! I now see that .nouns() returns a "view", rather than being the actual analysis step.