spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License

Using .freeze() in nlp.plugin()? #1080

Closed thegoatherder closed 5 months ago

thegoatherder commented 5 months ago

The documentation for .freeze() shows adding a word to the lexicon and freezing it using addWords().

For projects that use the nlp.plugin() option to initialise the lexicon at program start, an object is passed to the words property with a name-value pair syntax for each tag.

Is there a way to specify these tags to be frozen from within the plugin object?

thegoatherder commented 5 months ago

Can it also be specified in a match-object?

{
  match: "#Diagnostic+ (preassessment|assessment|reassessment|pre-assessment|re-assessment)",
  reason: "",
  tag: "Diagnostic",
}
spencermountain commented 5 months ago

Hey Adam, good idea - I’ll look at adding both of these features over the next few days. Let me know if there’s anything else you can find. Cheers

thegoatherder commented 5 months ago

Thanks, Spence! Once these are implemented we can fully integrate it into our solution and get it tested thoroughly. We will let you know if we find any anomalies! Looking forward to getting this feature rolling…!

spencermountain commented 5 months ago

hey Adam, both features are implemented on dev, and i will test, document, and release them, hopefully tomorrow. cheers

thegoatherder commented 5 months ago

Hey Spence, that’s great! I forgot to mention that the reason we need the match-object is for buildNet() and sweep() - will these support frozen tags too?

spencermountain commented 5 months ago

yep - { match: 'foo', tag: 'Bar', freeze: true } will lock in #Bar, and it will be unchangeable. Will that do the trick for your purpose?
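
For anyone with an existing list of match-objects, here is a minimal sketch of switching them all to frozen in one pass. It is plain JavaScript and does not call compromise itself; `withFreeze` is a hypothetical helper name, not part of the library, and the only library-facing assumption is the `freeze: true` property described above.

```javascript
// Hypothetical helper (not part of compromise): returns copies of the given
// match-objects with `freeze: true` set, leaving the originals untouched.
function withFreeze(rules) {
  return rules.map((rule) => ({ ...rule, freeze: true }))
}

// The match-object from earlier in this thread
const rules = [
  {
    match: '#Diagnostic+ (preassessment|assessment|reassessment|pre-assessment|re-assessment)',
    reason: '',
    tag: 'Diagnostic',
  },
]

const frozenRules = withFreeze(rules)
console.log(frozenRules[0].freeze) // true
console.log(rules[0].freeze) // undefined (the original is not mutated)
```

The resulting array should be usable wherever the original match-objects were passed, for example to nlp.buildNet().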

thegoatherder commented 5 months ago

Yep absolutely perfect, thank you!

spencermountain commented 5 months ago

both features have been released in 14.2.0:

frozen lexicon via plugin:

nlp.plugin({
  // normal lexicon
  words:{
    foo:'Bar'
  },
  // frozen lexicon
  frozen: {
    'juicy fruit': 'Singular',
    'front steps': 'Plural',
  },
})
let doc = nlp(`i ate juicy fruit on the front steps`)
doc.debug()

and freeze inside sweep:

let matches = [
  { match: 'juicy fruit', tag: 'Singular', freeze: true },
  { match: 'front steps', tag: 'Plural', freeze: true },
]
let doc = nlp(`i ate juicy fruit on the front steps`)
let net = nlp.buildNet(matches)
doc.sweep(net)
doc.debug()

note that in both cases the words don't stay frozen after this process. You can do doc.sweep(net).freeze() to re-freeze them for further analysis. cheers
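
Since the frozen lexicon and the frozen match-objects above carry the same term-to-tag pairs, one option is to keep a single lexicon object and derive the match list from it. This is only a sketch in plain JavaScript: `lexiconToFrozenMatches` is a hypothetical helper, not a compromise function, and it assumes each term is safe to use verbatim as a match string (terms containing match-syntax characters such as `#` or `?` would likely need escaping).

```javascript
// Hypothetical helper (not part of compromise): converts a { term: tag }
// lexicon into an array of match-objects that each carry `freeze: true`.
function lexiconToFrozenMatches(lexicon) {
  return Object.entries(lexicon).map(([term, tag]) => ({
    match: term,
    tag: tag,
    freeze: true,
  }))
}

// Same entries as the `frozen` lexicon in the plugin example
const frozen = {
  'juicy fruit': 'Singular',
  'front steps': 'Plural',
}

const derivedMatches = lexiconToFrozenMatches(frozen)
console.log(derivedMatches)
```

The same `frozen` object could then be given to nlp.plugin({ frozen }) and the derived array to nlp.buildNet(), so the two stay in sync.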

thegoatherder commented 5 months ago

Superb, thanks Spence. @Fdawgs is going to integrate this on our end in the coming days and we will report back on any anomalies found during testing. Thanks again.

Fdawgs commented 5 months ago

"both features have been released in 14.2.0:"

Did you mean 14.11.2, @spencermountain?

spencermountain commented 5 months ago

Oops, yes