spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License

Plugin hook methods overriding each other when multiple plugins share the same hook name #1125

Open frederik opened 2 months ago

frederik commented 2 months ago

Thanks for this awesome library!

I am running a simple setup with two plugins. Following the examples on https://observablehq.com/@spencermountain/compromise-constructor-methods I named my hooks 'postProcess'. However, it seems that a later hook overrides the behavior of the first one.

The example below produces [ { word: [ 'Value', 'TestB' ] } ] where [ { word: [ 'Value', 'TestA', 'TestB' ] } ] is expected. nlp.world() shows that both hooks are in the array, but only the last one is applied.

nlp.world() output (for version 14.13.0):

hooks: [
    'contractions', 'alias',
    'machine',      'index',
    'id',           'freeze',
    'typeahead',    'lexicon',
    'preTagger',    'contractionTwo',
    'postTagger',   'chunks',
    'postProcess',  'postProcess'
  ]

const nlp = require('compromise');
nlp.verbose(true);
console.log(nlp.version);

const text = 'Word';

nlp.plugin({
    tags: {
        TestA: {
            isA: 'Value',
        }
    },
    compute: {
        postProcess: (doc) => {
            doc.tag('TestA');
        }
    },
    hooks: ['postProcess']
});

nlp.plugin({
    tags: {
        TestB: {
            isA: 'Value',
        }
    },
    compute: {
        postProcess: (doc) => {
            doc.tag('TestB');
        }
    },
    hooks: ['postProcess']
});

const test = nlp(text);
console.log(test.out('tags'), nlp.world())

If this is expected behavior, maybe a mention in the docs would be good. In any case the hooks should probably not be stored twice then.

spencermountain commented 2 months ago

hey Frederik, yeah the hooks are 'stringly-typed' so they must be unique - we just loop through them and run .compute(str) on each one, so you'll have to make the names unique.
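To make the collision concrete, here is a minimal sketch of the behaviour described above. It does not use compromise itself: `createRegistry` and `makePlugin` are hypothetical stand-ins, written on the assumption that compute functions are stored under their string name and that hook names are simply appended to a list.

```javascript
// Hypothetical stand-in for the plugin registry; NOT compromise's actual source.
// Assumption: compute functions are keyed by name, hook names are appended as-is.
function createRegistry() {
  const compute = {};
  const hooks = [];
  return {
    hooks,
    plugin(p) {
      Object.assign(compute, p.compute); // same key: the later function replaces the earlier one
      hooks.push(...p.hooks);            // the name is stored again, duplicates included
    },
    run(doc) {
      for (const name of hooks) {
        compute[name](doc);              // lookup by string name
      }
      return doc;
    },
  };
}

// Hypothetical helper that builds a plugin whose hook records one tag.
const makePlugin = (hookName, tag) => ({
  compute: { [hookName]: (doc) => { doc.tags.push(tag); } },
  hooks: [hookName],
});

// Two plugins sharing one hook name: the second function wins and runs twice.
const clashing = createRegistry();
clashing.plugin(makePlugin('postProcess', 'TestA'));
clashing.plugin(makePlugin('postProcess', 'TestB'));
const clashingTags = clashing.run({ tags: [] }).tags;

// Unique hook names: both functions survive and each runs once.
const unique = createRegistry();
unique.plugin(makePlugin('tagTestA', 'TestA'));
unique.plugin(makePlugin('tagTestB', 'TestB'));
const uniqueTags = unique.run({ tags: [] }).tags;

console.log(clashing.hooks, clashingTags); // [ 'postProcess', 'postProcess' ] [ 'TestB', 'TestB' ]
console.log(unique.hooks, uniqueTags);     // [ 'tagTestA', 'tagTestB' ] [ 'TestA', 'TestB' ]
```

Under these assumptions the name is listed twice but resolves to the same function both times, which would match the `hooks` array and the missing TestA tag reported in the original post.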

I'm curious about your application - why make the multiple plugins? I'm happy to help talk-through it. I also love to see how the plugin stuff is received in the wild. Let me know what you're trying to achieve, if you'd like. cheers

frederik commented 2 months ago

Hi Spencer, thanks for the kind reply.

I'll explain my reasoning. I was doing a POC on identifying parts of a text document (example below). Since these were different problem domains, I called the first plugin authors and the second one references, to detect strings that are likely a list of authors or a list of references. I used two plugins so that I could test each one individually and swap either one out independently.

The person and organization detection came in quite handy, since I will extract the entities in a second step. It's not a problem for me to name the hooks differently; I probably just misunderstood the concept behind them.

Firstname Lastname ^1^, Firstname Lastname ^1^, Firstname Lastname ^2*^

1 Organization A
2 Organization B

# Introduction

...

# References

[1] Reference string
[2] Reference string
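
Coming back to the point about naming the hooks differently: one way to keep the two plugins separately testable is to give each hook a plugin-specific name and check for clashes before registering. The sketch below is only an illustration. The tag names (`AuthorList`, `ReferenceList`), the hook names, and the `findDuplicateHooks` helper are all hypothetical; the plugin shape (`tags`, `compute`, `hooks`) and `doc.tag()` are taken from the reproduction script in the original post.

```javascript
// Hypothetical plugins for the two problem domains; tag and hook names are made up.
const authorsPlugin = {
  tags: { AuthorList: { isA: 'Noun' } },
  compute: {
    authorsPostProcess: (doc) => { doc.tag('AuthorList'); },
  },
  hooks: ['authorsPostProcess'],
};

const referencesPlugin = {
  tags: { ReferenceList: { isA: 'Noun' } },
  compute: {
    referencesPostProcess: (doc) => { doc.tag('ReferenceList'); },
  },
  hooks: ['referencesPostProcess'],
};

// Hypothetical helper: returns every hook name declared by more than one plugin.
// It only compares the plugins passed in, not the library's built-in hook names.
function findDuplicateHooks(plugins) {
  const seen = new Set();
  const duplicates = new Set();
  for (const { hooks = [] } of plugins) {
    for (const name of hooks) {
      if (seen.has(name)) duplicates.add(name);
      seen.add(name);
    }
  }
  return [...duplicates];
}

const plugins = [authorsPlugin, referencesPlugin];
const duplicates = findDuplicateHooks(plugins);
if (duplicates.length > 0) {
  throw new Error(`Duplicate hook names: ${duplicates.join(', ')}`);
}

// With compromise installed, each plugin could then be registered as in the original post:
// const nlp = require('compromise');
// plugins.forEach((p) => nlp.plugin(p));
```

Each plugin can still be loaded and tested on its own, and the check fails early if a new plugin reuses a name.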