Closed sgsmittal226 closed 4 years ago
Hi sgsmittal226, thanks for your input.
Many entities have very simple patterns (like 7 digits), which makes it difficult to tell true positives from false positives. This is why many recognizers return results with a very low score (0.01). Nearby context words (like "driver" and "license") increase the score. My suggestion is to set a threshold on the analyzer output in order to filter out false positives.
You can do this by adding a `resultsScoreThreshold` field to the analyzer template. See the Swagger definition here: https://github.com/microsoft/presidio/blob/431ac2cea27881878dbc16bdc112b80e827c75d2/presidio-api/cmd/presidio-api/docs/swagger.yaml#L64
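To illustrate why a threshold helps, here is a minimal Python sketch of the idea. It is not Presidio's implementation: the names `find_candidates` and `apply_threshold`, the boost value, and the context word list are all hypothetical, chosen only to mirror the behavior described above (low base score for a bare 7-digit match, higher score when context words are present, results below the threshold dropped).

```python
import re
from dataclasses import dataclass

# All values below are assumptions for illustration, not Presidio's real settings.
BASE_SCORE = 0.01                        # weak score for a bare pattern match
CONTEXT_BOOST = 0.6                      # hypothetical boost when context is found
CONTEXT_WORDS = ("driver", "license")    # hypothetical context word list
PATTERN = re.compile(r"\b\d{7}\b")       # "7 digits", as in the example above


@dataclass
class Finding:
    text: str
    score: float


def find_candidates(text: str) -> list[Finding]:
    """Return every 7-digit match, scored higher if a context word appears."""
    has_context = any(word in text.lower() for word in CONTEXT_WORDS)
    score = BASE_SCORE + (CONTEXT_BOOST if has_context else 0.0)
    return [Finding(m.group(), score) for m in PATTERN.finditer(text)]


def apply_threshold(findings: list[Finding], threshold: float) -> list[Finding]:
    """Keep only findings whose score is at least the threshold."""
    return [f for f in findings if f.score >= threshold]


# A bare number is matched but filtered out; the same number with context survives.
no_context = apply_threshold(find_candidates("order 1234567 shipped"), 0.5)
with_context = apply_threshold(find_candidates("driver license 1234567"), 0.5)
```

With these assumed values, `no_context` is empty and `with_context` holds the single match. The appropriate threshold for a real deployment depends on the scores your recognizers actually return.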
Closing for now. Feel free to reopen if you would like to ask additional questions or wish to add additional information.
US driving license numbers have a different format for each state, but the current recognizer also matches any random string as a driving license.