Embedding based matching

dimidloc commented 2 years ago

Hi, thank you for this library. Perhaps I'm missing something, but I've tried the following:

import holmes_extractor as holmes

holmes_manager = holmes.Manager(
    model='en_core_web_trf',
    number_of_workers=1,
    overall_similarity_threshold=0.2,
    perform_coreference_resolution=True,
    embedding_based_matching_on_root_words=True,
)
holmes_manager.register_search_phrase("someone likes brooklyn")

holmes_manager.start_chatbot_mode_console()

I expected the sentence "I love brooklyn" to match, but it didn't.

richardpaulhudson commented 2 years ago

At some point in the past I noted that the accuracy of embedding similarities between pairs of verbs wasn't very good, so I restricted embedding-based matching to certain parts of speech. Sorry about this: it should at least be better documented, and the public API should probably allow you to change it. I'll rectify the problem in the next version.

In the meantime, you can work around the problem by adding the following before your code:

from holmes_extractor.lang.en.language_specific_rules import LanguageSpecificSemanticMatchingHelper
LanguageSpecificSemanticMatchingHelper.permissible_embedding_pos.append("VERB")
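For anyone wondering why a single append is enough, here is a minimal sketch of the mechanism. It does not use Holmes itself: MatchingHelperSketch is a hypothetical stand-in for LanguageSpecificSemanticMatchingHelper, its default part-of-speech list is an assumption made up for illustration, and allow_embedding_pos is a hypothetical helper that only adds a guard against appending the same tag twice. The point is that the list lives on the class, so changing it affects every lookup made afterwards.

```python
class MatchingHelperSketch:
    """Hypothetical stand-in for LanguageSpecificSemanticMatchingHelper."""

    # Class-level list shared by all instances.
    # These default values are assumed for illustration only.
    permissible_embedding_pos = ["NOUN", "PROPN", "ADJ"]

    def embedding_matching_permitted(self, pos: str) -> bool:
        # Embedding-based matching is only attempted for listed parts of speech.
        return pos in self.permissible_embedding_pos


def allow_embedding_pos(helper_class, pos: str) -> None:
    """Append pos to the class-level list unless it is already present."""
    if pos not in helper_class.permissible_embedding_pos:
        helper_class.permissible_embedding_pos.append(pos)


helper = MatchingHelperSketch()
before = helper.embedding_matching_permitted("VERB")

# Same idea as the workaround: mutate the list on the class itself.
allow_embedding_pos(MatchingHelperSketch, "VERB")
after = helper.embedding_matching_permitted("VERB")

print(before, after)
```

With the real library the safest order is still the one given above: apply the change before constructing the Manager, so that anything created afterwards sees the extended list.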

dimidd commented 2 years ago

Thank you! Will try.

richardpaulhudson / holmes-extractor

Embedding based matching #4