microsoft / presidio

Context aware, pluggable and customizable data protection and de-identification SDK for text and images
https://microsoft.github.io/presidio
MIT License
3.83k stars, 574 forks

96c word is being incorrectly identified as PERSON #1352

Closed bhanu-pappala closed 7 months ago

bhanu-pappala commented 7 months ago

Describe the bug: The word "96c" is incorrectly identified as PERSON when the query is "what is letter 96c". It is not flagged if I remove the word "letter".

Expected behavior: The text "what is letter 96c" is returned unchanged, with no PERSON entity detected.

Screenshots: image

Additional context: The model being used is spaCy en_core_web_lg. Package used: presidio-analyzer. Tried versions 2.2.351 and 2.2.354.
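A minimal reproduction sketch for the report above, assuming presidio-analyzer and the spaCy en_core_web_lg model are installed. `AnalyzerEngine` and its `analyze` method are part of presidio-analyzer; the helper names `flagged_spans` and `reproduce` are hypothetical and exist only for illustration. No output is shown because results may differ between model and package versions.

```python
from typing import Iterable, List, Tuple


def flagged_spans(text: str, results: Iterable) -> List[Tuple[str, str, float]]:
    """Turn analyzer results into (entity_type, matched_text, score) tuples.

    Each result only needs entity_type, start, end and score attributes,
    which is what Presidio's RecognizerResult provides.
    """
    return [(r.entity_type, text[r.start:r.end], r.score) for r in results]


def reproduce(text: str = "what is letter 96c") -> List[Tuple[str, str, float]]:
    """Run the default analyzer on the query and list any PERSON hits."""
    # Imported lazily so the helper above can be used without Presidio installed.
    from presidio_analyzer import AnalyzerEngine

    # With no arguments, AnalyzerEngine loads the default spaCy model (en_core_web_lg).
    analyzer = AnalyzerEngine()
    results = analyzer.analyze(text=text, language="en", entities=["PERSON"])
    return flagged_spans(text, results)
```

Comparing `reproduce("what is letter 96c")` with `reproduce("what is 96c")` should show whether the word "letter" changes the model's prediction for "96c".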

omri374 commented 7 months ago

Any NER model can produce false positives. Consider trying other models, such as those available from Hugging Face or Flair. Our demo website lets you experiment with a few selected models, and the documentation explains how to integrate models other than spaCy's en_core_web_lg.
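As a rough sketch of the suggestion above, the analyzer can be pointed at a Hugging Face model through Presidio's `NlpEngineProvider`. The model name `dslim/bert-base-NER` and the helper names `transformers_nlp_configuration` and `build_analyzer` are assumptions chosen for illustration, not a recommendation from this thread. The sketch also assumes presidio-analyzer is installed with its transformers extra and that the small spaCy model en_core_web_sm is available for tokenization.

```python
from typing import Any, Dict


def transformers_nlp_configuration(
    hf_model: str = "dslim/bert-base-NER",  # assumed example model
    spacy_model: str = "en_core_web_sm",    # used for tokenization and lemmas
) -> Dict[str, Any]:
    """Build an NLP engine configuration that delegates NER to a Hugging Face model."""
    return {
        "nlp_engine_name": "transformers",
        "models": [
            {
                "lang_code": "en",
                "model_name": {"spacy": spacy_model, "transformers": hf_model},
            }
        ],
    }


def build_analyzer(configuration: Dict[str, Any]):
    """Create an AnalyzerEngine backed by the configured NLP engine."""
    # Imported lazily so the configuration helper works without Presidio installed.
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    nlp_engine = NlpEngineProvider(nlp_configuration=configuration).create_engine()
    return AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=["en"])
```

Calling `build_analyzer(transformers_nlp_configuration()).analyze(text="what is letter 96c", language="en")` would then show whether a different model still flags "96c"; another model may trade this false positive for others.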