richardpaulhudson / holmes-extractor

Information extraction from English and German texts based on predicate logic
MIT License
135 stars, 12 forks

Issue with the "somebody" keyword #13

Closed by riccardopinosio 1 year ago

riccardopinosio commented 1 year ago

Hello,

I am having trouble matching patterns that combine the "somebody" keyword with the passive voice. I have a custom spaCy model with the entity types PERSON and ROLE (ROLE is a custom entity type). The following search phrase seems to work:

manager.register_search_phrase("somebody elects an ENTITYPERSON as an ENTITYROLE")

It correctly matches the sentence "the board elected John as CEO". However, it does not match the sentence "John was elected as CEO". The passive form is matched correctly when the pattern is, for example, "somebody elects an ENTITYPERSON". Is this a limitation of the "somebody" approach? At the moment I am working around the issue by also registering passive search phrases ("an ENTITYPERSON was elected as CEO").

richardpaulhudson commented 1 year ago

The problem here is that in the structure "somebody elects an ENTITYPERSON as an ENTITYROLE" there is a possible semantic dependency between ENTITYPERSON and ENTITYROLE that is missing in the passive construction. Holmes can deal with a lot of variation in syntactic structure, but there are typically cases like this one, especially with search phrases (rather than topic matching), where a single idea needs multiple search phrases. I normally recommend giving all search phrases for a given idea a common label and then using that label in downstream processing. This is described here (sorry that this is confusingly handled in the "chatbot" section when you are probably not building a chatbot).
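The common-label recommendation could be sketched roughly as follows. This is an illustration, not code from the Holmes documentation: `register_idea` and `group_by_label` are hypothetical helpers, the `label` keyword of `register_search_phrase` and the `search_phrase_label` key on each match are assumptions that should be checked against the installed Holmes version, and the two phrases are the ones from the question above.

```python
from collections import defaultdict

# Both phrasings of the same idea, taken from the question above.
ELECTION_PHRASES = [
    "somebody elects an ENTITYPERSON as an ENTITYROLE",
    "an ENTITYPERSON was elected as CEO",
]


def register_idea(manager, label, phrases):
    """Register every phrase for one idea under a single shared label.

    Assumption: `manager.register_search_phrase` accepts a `label` keyword.
    """
    for phrase in phrases:
        manager.register_search_phrase(phrase, label=label)


def group_by_label(matches):
    """Group match dictionaries by label for downstream processing.

    Assumption: each match is a dict carrying a "search_phrase_label" key.
    """
    grouped = defaultdict(list)
    for match in matches:
        grouped[match["search_phrase_label"]].append(match)
    return dict(grouped)
```

With a real `manager`, one would call `register_idea(manager, "election", ELECTION_PHRASES)` once at setup and then pass the list of matches returned by the manager to `group_by_label`, so that downstream code only looks at the `"election"` label and does not care which phrasing fired.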