sam-writer closed this 4 years ago.
spacy.util has a function for this, filter_spans. If a project is using replaCy, it is already using spaCy, so calling this function takes two lines. I don't think we need to offer it to the user; we can't make it any more convenient than it already is.
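Per the spaCy docs, spacy.util.filter_spans takes a sequence of Span objects, removes duplicates and overlaps, and prefers the (first) longest span when spans overlap. To show that rule without loading a model, here is a minimal pure-Python sketch over plain (start, end) token offsets. The name longest_first_filter and the tuple representation are assumptions for illustration only; they are not part of spaCy or replaCy, and in a real project you would just call filter_spans on the Span objects.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: (start, end) token offsets, end exclusive, like spaCy spans.
Span = Tuple[int, int]


def longest_first_filter(spans: Iterable[Span]) -> List[Span]:
    """Keep the longest non-overlapping spans; ties go to the earlier start."""
    # Longest spans first; among equal lengths, the smaller start comes first.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept: List[Span] = []
    taken: Set[int] = set()  # token indices already claimed by a kept span
    for start, end in ordered:
        tokens = set(range(start, end))
        # Reject any span that shares even one token with a span already kept.
        if tokens.isdisjoint(taken):
            kept.append((start, end))
            taken |= tokens
    # Return the survivors in document order.
    return sorted(kept)
```

For example, longest_first_filter([(0, 2), (1, 4), (3, 5), (6, 7)]) returns [(1, 4), (6, 7)]: (1, 4) wins as the longest span, and both (0, 2) and (3, 5) are dropped because they partially overlap it. Note that this rule discards partial overlaps too, not just spans nested inside another.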
This is very easy now with custom pipeline components:
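import en_core_web_sm
from replacy import ReplaceMatcher
from replacy.db import load_json
from spacy.util import filter_spans

# Load the spaCy pipeline that replaCy matches against
nlp = en_core_web_sm.load()
# 'path to match dict(s)' is a placeholder: point it at your match dictionary JSON
replaCy = ReplaceMatcher(nlp, load_json('path to match dict(s)'))
# Drop duplicate/overlapping matches (longest span wins) before the "joiner" component runs
replaCy.add_pipe(filter_spans, name="filter_spans", before="joiner")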
Though... I think this should maybe be the default behavior.
Put the example into our wiki. I think the less we define as default, the less we need to explain at the very beginning, and the more accessible replaCy is.
In most of our replaCy-powered apps, we filter spans by containment. For example, if there are 3 matches, but match 1 is a subset of match 2, which is a subset of match 3, then we only return match 3. The logic we use is:

Is this something we want to do in replaCy? If so, by default or opt-in?
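The containment logic itself isn't reproduced in this thread, so the following is only a hypothetical sketch of the rule described above (keep a match unless it sits entirely inside another match), over plain (start, end) token offsets. The function name filter_by_containment and the tuple representation are assumptions, not replaCy or spaCy APIs.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: (start, end) token offsets, end exclusive.
Span = Tuple[int, int]


def filter_by_containment(spans: Iterable[Span]) -> List[Span]:
    """Drop every span that lies entirely inside another, different span.

    Partially overlapping spans are both kept; exact duplicates collapse to one.
    """
    unique = sorted(set(spans))  # dedupe, then order by (start, end)
    return [
        s
        for s in unique
        # s is dropped if some other span o starts no later and ends no earlier
        if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in unique)
    ]
```

With nested matches like the 1 ⊂ 2 ⊂ 3 case, filter_by_containment([(2, 3), (1, 4), (0, 5)]) returns [(0, 5)]. The difference from filter_spans shows up with partial overlaps: filter_by_containment([(0, 2), (1, 4), (3, 5)]) keeps all three spans, whereas filter_spans' longest-span rule would keep only (1, 4). That difference is one thing to weigh when deciding between default and opt-in.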