microsoft / presidio

Context aware, pluggable and customizable data protection and de-identification SDK for text and images
https://microsoft.github.io/presidio
MIT License

How is it different from NER? #203

Closed by ghost 5 years ago

ghost commented 5 years ago

This is not a bug but a question. The solution looks promising, but I am wondering how it differs from an NER system (or LUIS entity recognition). I can see that spaCy NER is being used. Is it an extension of an NER system exposed as a service, or is there some other intelligence added to the system?

omri374 commented 5 years ago

Presidio is a complete solution for PII anonymization, not just a model. You can deploy it to your production environment without additional operationalization work.

We do use spaCy NER, but only for certain PII entities. NER is not the right tool for some entities, such as credit card numbers, which require checksum validation. Presidio lets you implement your own recognizers using NER, regex, blacklists, and more, or use the predefined recognizers already implemented.
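To illustrate why a checksum-based entity suits pattern matching better than NER, here is a minimal standalone sketch that combines a regex with the Luhn checksum. It is not Presidio's actual recognizer and does not use the Presidio API; the names `CARD_PATTERN`, `luhn_valid`, and `find_credit_cards` are hypothetical, and the 13 to 19 digit pattern is an assumption chosen for the example.

```python
import re
from typing import List

# Assumed pattern for illustration: 13 to 19 digits, optionally separated
# by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_credit_cards(text: str) -> List[str]:
    """Return regex candidates in `text` that also pass the checksum."""
    return [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]


if __name__ == "__main__":
    sample = "Card: 4111 1111 1111 1111, other: 4111 1111 1111 1112."
    print(find_credit_cards(sample))
```

The regex alone would flag both numbers in the sample; the checksum step rejects the second one. A statistical NER model has no built-in way to perform that arithmetic check, which is why a rule-based recognizer is a better fit for this kind of entity.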

ghost commented 5 years ago

Thanks