richardpaulhudson / holmes-extractor

Information extraction from English and German texts based on predicate logic
MIT License
134 stars, 12 forks

Embedding based matching #4

Closed (dimidloc closed this issue 2 years ago)

dimidloc commented 2 years ago

Hi, thank you for this library. Perhaps I'm missing something, but I've tried the following:

import holmes_extractor as holmes

holmes_manager = holmes.Manager(
    model='en_core_web_trf',
    number_of_workers=1,
    overall_similarity_threshold=0.2,
    perform_coreference_resolution=True,
    embedding_based_matching_on_root_words=True,
)
holmes_manager.register_search_phrase("someone likes brooklyn")

holmes_manager.start_chatbot_mode_console()

I expected the sentence "I love brooklyn" to match, but it didn't.

richardpaulhudson commented 2 years ago

At some point in the past I found that embedding similarities between pairs of verbs weren't very accurate, so I restricted embedding-based matching to certain parts of speech, and verbs are not among them. Sorry about this. It should at least be better documented, and the public API should probably let you change it. I'll fix this in the next version.

In the meantime, you can work around the problem by adding the following before your code:

from holmes_extractor.lang.en.language_specific_rules import LanguageSpecificSemanticMatchingHelper
LanguageSpecificSemanticMatchingHelper.permissible_embedding_pos.append("VERB")
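
To show why appending "VERB" changes the outcome, here is a minimal, self-contained sketch of a part-of-speech gate in front of an embedding comparison. It does not use Holmes or spaCy. The vectors, the `embedding_match` function, the default list contents, and applying the threshold directly to a single word pair are all assumptions made for illustration, not the library's actual implementation.

```python
import math

# Hypothetical toy vectors; in practice these would come from the language model.
TOY_VECTORS = {
    "like": (0.9, 0.1, 0.3),
    "love": (0.8, 0.2, 0.4),
}

# Illustrative default only; the library's real list may contain other tags.
permissible_embedding_pos = ["NOUN", "PROPN", "ADJ", "ADV"]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embedding_match(lemma_a, lemma_b, pos, threshold):
    """Return the similarity if the pair is eligible and similar enough, else None.

    Hypothetical helper: the part-of-speech check runs first, so an
    ineligible pair is rejected before any vectors are compared.
    """
    if pos not in permissible_embedding_pos:
        return None
    similarity = cosine(TOY_VECTORS[lemma_a], TOY_VECTORS[lemma_b])
    return similarity if similarity >= threshold else None


# With the illustrative default list, the verb pair is never compared.
before = embedding_match("like", "love", "VERB", threshold=0.2)

# Same idea as the workaround: make verbs eligible for embedding matching.
permissible_embedding_pos.append("VERB")
after = embedding_match("like", "love", "VERB", threshold=0.2)

print(before, after)
```

In this sketch `before` is `None` because the gate rejects verbs outright, however low the threshold is set. `after` is a similarity score, because the pair is now compared.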

dimidd commented 2 years ago

Thank you! Will try.