entity extraction question

Wow. First, thank you for trying out my library, and second, sorry that I didn't see your issue until now.

Here's the basic issue: wildcard patterns with * are very greedy, and they are genuinely ambiguous, even from a human interpretation standpoint.
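To make the ambiguity concrete, here is a small sketch using Python regular expressions as a stand-in (this is not Lucy, and the sample text is made up for illustration). With two open-ended captures around a separator, a greedy and a non-greedy reading both match the whole string but split it differently, and neither split is obviously the "right" one.

```python
import re

# Regex stand-in (not Lucy): two open-ended captures around a separator.
# The sample text is hypothetical and chosen to contain the separator twice.
text = "topic a when discussing topic b when discussing topic c"

# Greedy first capture: grabs as much as it can before the last separator.
greedy = re.fullmatch(r"(?P<x>.+) when discussing (?P<y>.+)", text)
# Non-greedy first capture: stops at the first separator instead.
lazy = re.fullmatch(r"(?P<x>.+?) when discussing (?P<y>.+)", text)

greedy_split = (greedy["x"], greedy["y"])
lazy_split = (lazy["x"], lazy["y"])

print(greedy_split)
print(lazy_split)
```

Both results are valid matches of the same pattern shape, which is why stacking several open-ended wildcards in one pattern tends to produce surprising captures.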

So what do you do? I think what you want is to make two recognition calls to Lucy, where the second call runs over the wildcard text captured by the first to check whether there are additional matches.

entities:
  - name: '@whatSpeaker'
    patterns:
      - what did (the)? (speaker:___)+3 think (of|about) (topic:___)+*

Sample statement:

what did jon smith think of topic xyz when discussing topic pdq

The first call returns:

speaker = jon smith

topic = topic xyz when discussing topic pdq

==== whatSpeaker (3)
what did jon smith think of topic xyz when discussing topic pdq
         ^_______^                                              @speaker
                            ^_________________________________^ @topic
^_____________________________________________________________^ @whatSpeaker

@whatSpeaker [0,63] @speaker,@topic
=>  @speaker [9,18] 'jon smith' Resolution:"jon smith"
=>  @topic [28,63] 'topic xyz when discussing topic pdq' Resolution:"topic xyz when discussing topic pdq"
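As a quick sanity check on the offsets above, the bracketed pairs can be read as a start index and an exclusive end index into the input string. The Python snippet below only illustrates that reading by slicing the sample statement. It does not call Lucy, and the variable names are made up for the example.

```python
# Illustration only: checks the [start,end] offsets printed above by slicing
# the sample statement. This does not call Lucy; all names are hypothetical.
text = "what did jon smith think of topic xyz when discussing topic pdq"

# (start, end) pairs copied from the recognizer output above.
spans = {
    "whatSpeaker": (0, 63),
    "speaker": (9, 18),
    "topic": (28, 63),
}

# Treat each pair as a half-open range, the same way Python slices work.
extracted = {name: text[start:end] for name, (start, end) in spans.items()}

for name, value in extracted.items():
    print(f"@{name}: {value!r}")
```

The @topic slice is the string that gets handed to the second model.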

Now the trick: you want to further interpret the open-ended wildcard entity @topic, so you run its text through a different Lucy model to disambiguate further.

entities:
  - name: '@subtopics'
    patterns:
      - (x:___)* when discussing (y:___)+*

giving

==== subtopics (3)
topic xyz when discussing topic pdq
^_______^                           @x
                          ^_______^ @y
^_________________________________^ @subtopics

 @subtopics [0,35] @y,@x
    =>  @x [0,9] 'topic xyz' Resolution:"topic xyz"
    =>  @y [26,35] 'topic pdq' Resolution:"topic pdq"

In short, here's the rule of thumb:

You should probably model only one open-ended wildcard at a time. If you need further disambiguation, run the captured text through a separate model.
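The two-pass idea can be sketched roughly as follows. This is a stand-in written with Python regular expressions, not Lucy's API: the two regexes only approximate the @whatSpeaker and @subtopics patterns above (for example, "up to three words" for the speaker is my assumption about what `+3` means), and `recognize` is a hypothetical helper name.

```python
import re

# Stand-in for the first model: a bounded speaker and ONE open-ended
# wildcard (topic) at the end. Assumption: this only approximates the
# Lucy pattern; the speaker is limited to at most three words.
FIRST_PASS = re.compile(
    r"what did (?:the )?(?P<speaker>\w+(?: \w+){0,2}?) "
    r"think (?:of|about) (?P<topic>.+)"
)

# Stand-in for the second model, applied only to the captured topic text.
SECOND_PASS = re.compile(r"(?P<x>.*?) when discussing (?P<y>.+)")


def recognize(statement: str) -> dict:
    """Run the broad pattern first, then refine its wildcard capture."""
    first = FIRST_PASS.fullmatch(statement)
    if first is None:
        return {}
    result = {"speaker": first["speaker"], "topic": first["topic"]}
    # Second call: look inside the wildcard text for further structure.
    second = SECOND_PASS.fullmatch(first["topic"])
    if second is not None:
        result["subtopics"] = {"x": second["x"], "y": second["y"]}
    return result


result = recognize(
    "what did jon smith think of topic xyz when discussing topic pdq"
)
print(result)
```

Keeping the second pattern separate means a statement without "when discussing" still resolves cleanly in the first pass, and the refinement is simply skipped.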
