spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License

`.prepend()` removes frozen tags for acronyms #1097

Closed. Fdawgs closed this issue 3 months ago.

Fdawgs commented 3 months ago

Node version: 20.10.0
Compromise version: 14.11.*; 14.12.0

As the title states, previously frozen tags are removed from a term that is also tagged #Acronym when text is prepended to the document using .prepend().

Note in the examples below that 'audiology' and 'assessment' retain the frozen #Diagnostic tag, whilst 'ECG' does not.

Reproduction:

/** @type {import('compromise').default}*/
const nlp = require("compromise");
nlp.plugin({
  frozen: {
    ecg: "Diagnostic",
    "audiology assessment": "Diagnostic",
  },
});

const prependingText = "For the upcoming visit, the patient will need an ";

const result1 = nlp("ECG");
result1.debug();
// Outputs:
// ┌─────────
// │ 'ECG'      - Diagnostic

result1.prepend(prependingText);
result1.debug();
// Outputs:
// ┌─────────
// │ 'For'      - Preposition
// │ 'the'      - Determiner
// │ 'upcoming'  - Gerund, Verb, PresentTense
// │ 'visit'    - Noun, Singular
// │ 'the'      - Determiner
// │ 'patient'  - Noun, Singular
// │ 'will'     - Verb, Modal
// │ 'need'     - Verb, PresentTense, Infinitive
// │ 'an'       - Determiner
// │ 'ECG'      - Acronym, Noun <------------------ Should be Diagnostic as frozen

// Non-acronym
const result2 = nlp("audiology assessment");
result2.debug();
// Outputs:
// ┌─────────
// │ 'audiology'  - Diagnostic
// │ 'assessment'  - Diagnostic

result2.prepend(prependingText);
result2.debug();
// Outputs:
// ┌─────────
// │ 'For'      - Preposition
// │ 'the'      - Determiner
// │ 'upcoming'  - Gerund, Verb, PresentTense
// │ 'visit'    - Noun, Singular
// │ 'the'      - Determiner
// │ 'patient'  - Noun, Singular
// │ 'will'     - Verb, Modal
// │ 'need'     - Verb, PresentTense, Infinitive
// │ 'an'       - Determiner
// │ 'audiology'  - Diagnostic
// │ 'assessment'  - Diagnostic

spencermountain commented 3 months ago

ah, interesting! i can take a look, thanks Frazer

spencermountain commented 3 months ago

hey Frazer, good find. This happens because frozen words are naturally un-frozen after the tagger runs. Here, 'ECG' is silently tagged a second time when new words are prepended to the sentence.
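The mechanism described above can be illustrated with a toy model in plain JavaScript. This is a sketch under stated assumptions, not compromise's actual tagger: the term objects, the `FROZEN` lexicon and every helper name (`build`, `tagger`, `computeFrozen`, `unfreezeAll`, `prepend`, `tagsOf`) are hypothetical. It assumes the tagger skips frozen terms, that the freeze flag is cleared once tagging finishes, and, purely to mirror the behaviour reported in this issue, that an all-caps acronym rule overwrites existing tags while other terms keep theirs.

```javascript
// Toy model only: NOT compromise's internals. All names are hypothetical.
const FROZEN = { ecg: 'Diagnostic', audiology: 'Diagnostic' } // hypothetical single-word frozen lexicon

const words = (text) => text.split(/\s+/).filter(Boolean)
const makeTerm = (text) => ({ text, tags: new Set(), frozen: false })

// Stand-in tagger. Frozen terms are skipped. Assumption: the acronym rule
// overwrites existing tags; other terms only get a default tag if they have none.
function tagger(terms) {
  for (const t of terms) {
    if (t.frozen) continue
    if (/^[A-Z]{2,}$/.test(t.text)) t.tags = new Set(['Acronym', 'Noun'])
    else if (t.tags.size === 0) t.tags = new Set(['Noun'])
  }
}

// Apply the frozen lexicon: set the tag and lock the term.
function computeFrozen(terms) {
  for (const t of terms) {
    const tag = FROZEN[t.text.toLowerCase()]
    if (tag) {
      t.tags = new Set([tag])
      t.frozen = true
    }
  }
}

// Models words being "naturally un-frozen after the tagger runs".
const unfreezeAll = (terms) => terms.forEach((t) => { t.frozen = false })

// Initial document construction: freeze, tag, then release the freeze.
function build(text) {
  const terms = words(text).map(makeTerm)
  computeFrozen(terms)
  tagger(terms)
  unfreezeAll(terms)
  return terms
}

// Prepending re-runs the tagger over the whole sentence.
function prepend(terms, text) {
  const all = [...words(text).map(makeTerm), ...terms]
  tagger(all)
  unfreezeAll(all)
  return all
}

const tagsOf = (terms, word) => [...terms.find((t) => t.text === word).tags]

// Without re-freezing, the acronym rule clobbers the frozen tag.
const plain = prepend(build('ECG'), 'the patient will need an')
console.log(tagsOf(plain, 'ECG')) // [ 'Acronym', 'Noun' ]

// Re-freezing before prepending keeps the frozen tag.
let refrozen = build('ECG')
computeFrozen(refrozen)
refrozen = prepend(refrozen, 'the patient will need an')
console.log(tagsOf(refrozen, 'ECG')) // [ 'Diagnostic' ]
```

In this model, calling `computeFrozen` again before `prepend` plays the same role as the `.compute('frozen')` call described in the reply.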

in 14.13.0 you can now call .compute('frozen'), which re-applies any frozen tags:

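  const nlp = require('compromise')

  // register 'ECG' as a frozen word with the tag #Frozen
  nlp.plugin({
    frozen: {
      ecg: 'Frozen',
    },
  })
  const doc = nlp('ECG')
  // re-apply frozen tags before changing the sentence (14.13.0+)
  doc.compute('frozen')
  doc.prepend('For the upcoming visit, the patient will need an ')
  doc.match('ecg').has('#Frozen') // true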

cheers
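If the `.compute('frozen')` workaround is needed at several call sites, it could be wrapped in a small helper. This is a sketch, not part of compromise: `prependKeepingFrozen` is a hypothetical name, and re-running `.compute('frozen')` before every prepend is a conservative assumption based only on the 14.13.0 note in this thread. The helper takes the document as a parameter, so it needs no import of its own.

```javascript
// Hypothetical helper, not part of compromise's API.
// Assumes compromise >= 14.13.0, where doc.compute('frozen') is available.
function prependKeepingFrozen(doc, text) {
  doc.compute('frozen') // re-apply frozen tags before the sentence is re-tagged
  doc.prepend(text)
  return doc
}

// Usage (requires compromise and the `frozen` plugin registered as above):
//   const doc = prependKeepingFrozen(nlp('ECG'), 'For the upcoming visit, the patient will need an ')
//   doc.match('ecg').has('#Frozen')
```

Passing the document in, rather than building it inside the helper, keeps the helper usable with documents that were created elsewhere.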