writer / replaCy

spaCy match and replace, maintaining conjugation
https://pypi.org/project/replacy/
MIT License

Allow custom match_hooks #13

Closed sam-writer closed 4 years ago

sam-writer commented 4 years ago

Currently, the only allowed match_hooks are provided by replaCy, here. Since it's early days, this is good, and if someone thinks of one they want, we may as well add it. However, at some point, users will want to be able to use custom hook functions without waiting for a new replaCy version.

We import the custom_hooks here, and load them here. What gets loaded as custom_patterns is a module. So this line

template = getattr(custom_patterns, hook["name"])

gets the match_hook function. (BTW, this raises AttributeError if you reference an undefined hook. I think this is the correct behaviour, but it might have to change slightly.)
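To make the current behaviour concrete, here is a minimal sketch of that lookup. The module is a stand-in built with types.ModuleType, and the hook name always_match is hypothetical; neither is taken from replaCy itself.

```python
import types

# Stand-in for the built-in hooks module (hypothetical contents, not replaCy's real hooks).
custom_patterns = types.ModuleType("custom_patterns")


def always_match():
    """Hypothetical hook factory: returns a predicate that accepts every match."""
    return lambda doc, start, end: True


custom_patterns.always_match = always_match

# A hook reference as it might appear in the match dict.
hook = {"name": "always_match"}
template = getattr(custom_patterns, hook["name"])  # resolves to the function above

# Referencing an undefined hook raises AttributeError.
try:
    getattr(custom_patterns, "no_such_hook")
except AttributeError as err:
    missing_error = str(err)
```

With a single module this is all the lookup needs; the AttributeError names the missing attribute, which is a reasonable error for a typo in a hook name.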

I think the way we want to handle this is:

  1. In ReplaceMatcher's __init__, we add a new parameter, custom_match_hooks, which defaults to [] but can be set to a list of Python modules. Usage would look like:
    from replacy import ReplaceMatcher
    from replacy.db import load_json
    import spacy

    import my.custom_hooks as ch
    import my.other.custom_hooks as och

    nlp = spacy.load("en_core_web_sm")

    rmatch_dict = load_json("./resources/match_dict.json")
    rmatcher = ReplaceMatcher(nlp, rmatch_dict, custom_match_hooks=[ch, och])
  2. Then we init the new patterns:
    # in __init__
    # custom_patterns is a single module, so wrap it in a list before concatenating
    self.custom_patterns = [custom_patterns] + custom_match_hooks  # built-in + user supplied
  3. Then the getattr lookup shown earlier has to change, since we now have to search each hook module for every hook name we find.

  4. Once this is working, part of the ticket is DOCUMENTING how to use custom hooks!
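The lookup across several hook modules could be sketched roughly as below. This is only an illustration of the idea, not replaCy's implementation: resolve_hook, the stand-in modules, and the hook names are all hypothetical, and it assumes the built-in module is searched first so that built-in hooks win on a name clash.

```python
import types

_MISSING = object()  # sentinel, so an attribute that happens to be falsy still counts as found


def resolve_hook(name, hook_modules):
    """Return the first attribute called `name` found in `hook_modules` (hypothetical helper)."""
    for module in hook_modules:
        found = getattr(module, name, _MISSING)
        if found is not _MISSING:
            return found
    searched = ", ".join(m.__name__ for m in hook_modules)
    # Keep raising AttributeError, matching the single-module behaviour.
    raise AttributeError(f"match hook {name!r} not found in: {searched}")


# Stand-ins for the built-in module and one user-supplied module.
builtin_hooks = types.ModuleType("builtin_hooks")
builtin_hooks.succeeded_by = lambda: "builtin succeeded_by"

user_hooks = types.ModuleType("user_hooks")
user_hooks.succeeded_by = lambda: "user succeeded_by"
user_hooks.my_hook = lambda: "user my_hook"

# Built-in module first, then user-supplied modules.
all_hook_modules = [builtin_hooks] + [user_hooks]

template = resolve_hook("my_hook", all_hook_modules)
```

Whether built-in or user modules should take precedence is a design choice; reversing the list order would let users override built-in hooks instead.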