sam-writer closed this 3 years ago.
Added a factory method, ReplaceMatcher.with_espan, so you don't have to import ESpan. Not sure which I like more (this or passing SpanClass=ESpan to the constructor):
from replacy import ReplaceMatcher
from replacy.db import load_json
import spacy
nlp = spacy.load("en_core_web_sm")
match_dict = load_json('./replacy/resources/match_dict.json')
ematcher = ReplaceMatcher.with_espan(nlp, match_dict=match_dict)
s = ematcher("She extracts revenge.")[0]
Thoughts?
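To compare the two options side by side, here is a minimal self-contained sketch of the factory-method pattern. Span, ESpan and ReplaceMatcher below are simplified hypothetical stand-ins, not replacy's real classes. Only the SpanClass keyword and the with_espan name come from the snippets in this thread; everything else is assumed for illustration.

```python
class Span:
    """Simplified stand-in for spaCy's Span (hypothetical)."""

    def __init__(self, text):
        self.text = text


class ESpan(Span):
    """Simplified stand-in for replacy's ESpan (hypothetical)."""


class ReplaceMatcher:
    """Toy matcher: only the SpanClass plumbing is modelled."""

    def __init__(self, nlp, match_dict=None, SpanClass=Span):
        self.nlp = nlp
        self.match_dict = match_dict if match_dict is not None else {}
        self.SpanClass = SpanClass

    @classmethod
    def with_espan(cls, nlp, match_dict=None):
        # Factory method: the constructor with SpanClass pre-filled, so
        # callers never import ESpan. Using `cls` keeps subclasses working.
        return cls(nlp, match_dict=match_dict, SpanClass=ESpan)

    def make_span(self, text):
        # Every span the matcher builds uses the configured class.
        return self.SpanClass(text)


explicit = ReplaceMatcher(None, match_dict={}, SpanClass=ESpan)
factory = ReplaceMatcher.with_espan(None, match_dict={})
print(explicit.SpanClass is factory.SpanClass is ESpan)  # True
```

Both calls configure the matcher identically; the trade-off is one extra public name on the class versus a keyword argument callers have to know about. Because with_espan calls cls(...) rather than ReplaceMatcher(...), a subclass's factory returns the subclass.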
The match dict in your test doesn't represent the overlap problem we are trying to solve; this one does. Edit: it STILL WORKS with this one.
{
  "make-1": {
    "patterns": [
      {
        "LEMMA": "make"
      }
    ],
    "suggestions": [
      [
        {
          "TEXT": "MAKE"
        }
      ]
    ],
    "subcategory": "MAKE_CAPS"
  },
  "make-2": {
    "patterns": [
      {
        "LEMMA": "make"
      }
    ],
    "suggestions": [
      [
        {
          "TEXT": "MaKe"
        }
      ]
    ],
    "subcategory": "MAKE_STYLE",
    "comment": "This is a bad match, it is here to demonstrate overlap behavior"
  }
}
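The overlap here comes from two entries sharing an identical pattern, so both always fire on the same token. As a rough sketch (not part of replacy; the function name overlapping_entries is hypothetical), such exact duplicates can be found by grouping entries on a canonical serialisation of their "patterns". This only catches identical patterns, not partial overlaps such as "make" versus "make due".

```python
import json
from collections import defaultdict


def overlapping_entries(match_dict):
    """Group match_dict entries whose "patterns" lists are identical.

    Identical patterns always match the same tokens, so every group with
    more than one name is guaranteed to produce overlapping spans.
    """
    groups = defaultdict(list)
    for name, entry in match_dict.items():
        # Serialise with sorted keys so equal patterns give equal,
        # hashable keys regardless of dict key order.
        key = json.dumps(entry["patterns"], sort_keys=True)
        groups[key].append(name)
    return [names for names in groups.values() if len(names) > 1]


match_dict = {
    "make-1": {"patterns": [{"LEMMA": "make"}],
               "suggestions": [[{"TEXT": "MAKE"}]],
               "subcategory": "MAKE_CAPS"},
    "make-2": {"patterns": [{"LEMMA": "make"}],
               "suggestions": [[{"TEXT": "MaKe"}]],
               "subcategory": "MAKE_STYLE"},
}
print(overlapping_entries(match_dict))  # [['make-1', 'make-2']]
```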
Tested this in the REPL:
>>> from replacy import ESpan
>>> from replacy import ReplaceMatcher
>>> from replacy.db import load_json
>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")
>>> match_dict = load_json('./replacy/resources/match_dict.json')
>>> ematcher = ReplaceMatcher(nlp, match_dict=match_dict, SpanClass=ESpan)
>>> s = ematcher("She extracts revenge.")[0]
>>> s
extracts
>>> s.suggestions
['exacts']
>>> s.match_name
'extract-revenge'
>>> doc = nlp("She extracts revenge.")
>>> e = ESpan(doc, 1, 2)
>>> e
extracts
>>> e.suggestions
[]
>>> e.match_name
''
>>> e.comment
''
>>> e.vector
array([ 3.7579389 , 390608 , ... , 0.37205344], dtype=float32)
>>> e.kb_id
0
>>> e.vector_norm
25.310467
>>> e._.comment = "yo metaprogramming"
>>> e.comment
'yo metaprogramming'
>>> e == e._
True
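The session shows e.comment reading through to e._.comment while ordinary span attributes (vector, kb_id) keep working. One plausible way to get that behaviour is a __getattr__ fallback, sketched below with toy classes. Whether ESpan is actually built this way is an assumption; Underscore and EXTENSION_DEFAULTS are hypothetical names, and the e == e._ equality is not modelled.

```python
import copy


class Underscore:
    """Hypothetical stand-in for spaCy's `._` extension namespace."""

    def __init__(self, values):
        self.__dict__.update(values)


class ESpan:
    """Toy span whose unknown attributes fall back to its `._` namespace."""

    # Hypothetical defaults, echoing the empty values seen in the REPL.
    EXTENSION_DEFAULTS = {"suggestions": [], "match_name": "", "comment": ""}

    def __init__(self, text):
        self.text = text
        # Deep-copy so spans never share one mutable default list.
        self._ = Underscore(copy.deepcopy(self.EXTENSION_DEFAULTS))

    def __getattr__(self, name):
        # Python only calls __getattr__ after normal lookup fails, so real
        # attributes such as `text` are never shadowed by extensions.
        underscore = self.__dict__.get("_")
        if underscore is None:
            raise AttributeError(name)
        return getattr(underscore, name)


e = ESpan("extracts")
print(repr(e.comment))  # ''
e._.comment = "yo metaprogramming"
print(e.comment)  # yo metaprogramming
```

Because the fallback only runs when normal lookup fails, real span attributes always win over same-named extensions.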
I added the following to the match_dict to check for overlap behavior:
{
  "make-due": {
    "patterns": [
      {"LEMMA": "make", "TEMPLATE_ID": 1},
      {"LOWER": "due"}
    ],
    "suggestions": [
      [
        {"TEXT": "make", "FROM_TEMPLATE_ID": 1},
        {"TEXT": "do"}
      ]
    ]
  },
  "dupe-test": {
    "patterns": [
      {"LEMMA": "make", "TEMPLATE_ID": 1}
    ],
    "suggestions": [
      [
        {"TEXT": "build", "FROM_TEMPLATE_ID": 1}
      ]
    ],
    "comment": "This is a bad match, it is here to demonstrate overlap behavior",
    "test": {
      "positive": ["I will make something"],
      "negative": []
    }
  }
}
This gives:
>>> spans = ematcher("I will make due")
>>> spans
[make, make due]
>>> spans[0].match_name
'dupe-test'
>>> spans[1].match_name
'make-due'
>>> spans[1].suggestions
['make do']
>>> spans[0].suggestions
['build']
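Both overlapping spans come back here ("make" and "make due"). If a caller later wanted a single span per token, one common policy is longest-match-wins, which is what spaCy's spacy.util.filter_spans does for Span objects. The pure-Python sketch below applies that policy to (start, end, name) tuples; Match and keep_longest are hypothetical names, and nothing here implies replacy filters this way.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    start: int       # token index where the match begins
    end: int         # token index one past the end of the match
    match_name: str


def keep_longest(matches: List[Match]) -> List[Match]:
    """Drop matches that overlap a longer (or earlier, equally long) match."""
    # Longest first; among equal lengths, the earlier start wins.
    ranked = sorted(matches, key=lambda m: (-(m.end - m.start), m.start))
    taken = set()
    kept = []
    for m in ranked:
        tokens = set(range(m.start, m.end))
        if tokens & taken:
            continue  # overlaps a match we already kept
        kept.append(m)
        taken |= tokens
    # Return in document order.
    return sorted(kept, key=lambda m: m.start)


# "I will make due": tokens 0=I 1=will 2=make 3=due
matches = [Match(2, 3, "dupe-test"), Match(2, 4, "make-due")]
print(keep_longest(matches))
# [Match(start=2, end=4, match_name='make-due')]
```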
@sam-qordoba All good, just one change: I added a has_extension check.
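Presumably the has_extension check guards extension registration: in recent spaCy versions, Span.set_extension raises a ValueError if the name is already registered and force=True is not passed, which bites as soon as registration runs twice in one process. The sketch below models that behaviour with a toy registry so it runs without spaCy; ExtensionRegistry and register_espan_extensions are hypothetical, and the extension names are taken from the REPL session.

```python
class ExtensionRegistry:
    """Toy model of spaCy's per-class extension registry (hypothetical).

    Mirrors the behaviour that matters here: registering the same name
    twice without force=True raises a ValueError.
    """

    def __init__(self):
        self._defaults = {}

    def has_extension(self, name):
        return name in self._defaults

    def set_extension(self, name, default=None, force=False):
        if name in self._defaults and not force:
            raise ValueError(f"Extension '{name}' already exists")
        self._defaults[name] = default


span_extensions = ExtensionRegistry()


def register_espan_extensions():
    # Guard each registration so this can run every time a matcher is
    # built (e.g. two ReplaceMatcher instances in one process).
    for name, default in [("suggestions", []), ("match_name", ""), ("comment", "")]:
        if not span_extensions.has_extension(name):
            span_extensions.set_extension(name, default=default)


register_espan_extensions()
register_espan_extensions()  # second call is a no-op instead of an error
```

With real spaCy the same guard reads: if not Span.has_extension("comment"), then call Span.set_extension("comment", default="").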