Closed: Gommorach closed this issue 1 year ago
Hi, latency depends heavily on the types of recognizers and NLP models you apply, and there's a latency-accuracy tradeoff.
The fastest setup I can think of is to use the small spaCy model (`en_core_web_sm`) as the NER model and to remove the recognizers you don't need (the `PhoneRecognizer` is the slowest one, I believe). If you expect phone numbers from only certain countries, you can also configure the `PhoneRecognizer` to look only for patterns belonging to those countries.
From there you can trade latency for accuracy by going with heavier spaCy models (`en_core_web_lg`), all the way up to transformer-based models (`en_core_web_trf`, Hugging Face models, or Flair models).
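Swapping models is a configuration change. A sketch of the YAML form that `NlpEngineProvider(conf_file=...)` accepts, pointing at the transformer model instead (the file path is an assumption; `en_core_web_trf` must be downloaded separately):

```yaml
# conf/nlp.yaml -- heavier, more accurate spaCy pipeline
nlp_engine_name: spacy
models:
  - lang_code: en
    model_name: en_core_web_trf
```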
Our use case for Presidio is to detect whether any PII is present in the analyzed text, without needing to know which entity it is. We use the built-in entities plus custom matchers. We're hitting latency problems because we rely on live feedback, yet the CPU and memory allocated to Presidio are not being maxed out.
We suspect that the underlying spaCy pipeline is too heavy for us, since we don't rely on the `ner` step in our output. Does this analysis make sense? If so, would it be possible to make that pipeline configurable?