writer / replaCy

spaCy match and replace, maintaining conjugation
https://pypi.org/project/replacy/
MIT License

replaCy: match & replace with spaCy

We found that several of our projects duplicated the same code for using spaCy’s blazing-fast matcher to do one thing: Match-Replace-Grammaticalize. So we wrote replaCy!


Requirements

Installation

pip install replacy

Quick start

from replacy import ReplaceMatcher
from replacy.db import load_json
import spacy

match_dict = load_json('/path/to/your/match/dict.json')
# load nlp spacy model of your choice
nlp = spacy.load("en_core_web_sm")

rmatcher = ReplaceMatcher(nlp, match_dict=match_dict)

# get inflected suggestions
# look up the first suggestion
span = rmatcher("She extracts revenge.")[0]
span._.suggestions
# >>> ['exacts']

Input

ReplaceMatcher accepts either a text string or a spaCy Doc.

# text is ok
span = rmatcher("She extracts revenge.")[0]

# doc is ok too
doc = nlp("She extracts revenge.")
span = rmatcher(doc)[0]

match_dict.json format

Here is a minimal match_dict.json:

{
  "extract-revenge": {
    "patterns": [
      {
        "LEMMA": "extract",
        "TEMPLATE_ID": 1
      }
    ],
    "suggestions": [
      [
        {
          "TEXT": "exact",
          "FROM_TEMPLATE_ID": 1
        }
      ]
    ],
    "match_hook": [
      {
        "name": "succeeded_by_phrase",
        "args": "revenge",
        "match_if_predicate_is": true
      }
    ],
    "test": {
      "positive": [
        "And at the same time extract revenge on those he so despises?",
        "Watch as Tampa Bay extracts revenge against his former Los Angeles Rams team."
      ],
      "negative": ["Mother flavours her custards with lemon extract."]
    }
  }
}
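
Because each suggestion token points back at a pattern token through FROM_TEMPLATE_ID, a typo in either ID is easy to miss. The following is a minimal sketch of a sanity check you could run on a match dict before loading it. `validate_match_dict` and `REQUIRED_KEYS` are hypothetical names, not part of replaCy, and the sketch assumes that "patterns" is a flat list of token dicts and that "patterns" and "suggestions" are both mandatory, as in the example above.

```python
import json

# Hypothetical helper, not part of replaCy: sanity-check a match dict
# before handing it to ReplaceMatcher.
REQUIRED_KEYS = ("patterns", "suggestions")  # assumption: both are mandatory


def validate_match_dict(match_dict):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for rule_name, rule in match_dict.items():
        missing = [key for key in REQUIRED_KEYS if key not in rule]
        if missing:
            problems.append(f"{rule_name}: missing keys {missing}")
            continue

        # Template IDs declared by the pattern tokens.
        declared = {
            token["TEMPLATE_ID"]
            for token in rule["patterns"]
            if "TEMPLATE_ID" in token
        }

        # Each suggestion is a list of tokens; a token may point back at a
        # pattern token through FROM_TEMPLATE_ID.
        for index, suggestion in enumerate(rule["suggestions"]):
            for token in suggestion:
                ref = token.get("FROM_TEMPLATE_ID")
                if ref is not None and ref not in declared:
                    problems.append(
                        f"{rule_name}: suggestion {index} references "
                        f"undeclared TEMPLATE_ID {ref}"
                    )
    return problems


example = json.loads("""
{
  "extract-revenge": {
    "patterns": [{"LEMMA": "extract", "TEMPLATE_ID": 1}],
    "suggestions": [[{"TEXT": "exact", "FROM_TEMPLATE_ID": 1}]]
  }
}
""")

print(validate_match_dict(example))  # prints []
```

The check only covers the cross-reference between the two ID fields. It does not verify that the patterns are valid spaCy matcher patterns or that the match hooks exist.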

For more information on how to compose a match_dict, see our wiki.

Citing

If you use replaCy in your research, please cite it with the following BibTeX entry:


@misc{havens2019replacy,
    title  = {SpaCy match and replace, maintaining conjugation},
    author = {Sam Havens and Aneta Stal and Manhal Daaboul},
    url    = {https://github.com/Qordobacode/replaCy},
    year   = {2019}
}