Adding new recognizers to the predefined list via YAML file in Docker image and using it via REST

jamie87u7 commented 1 year ago

Hello,

I'm using the latest Docker image and the REST service to detect PII. I'm trying to add a couple of new recognizers to the predefined list (the Titles recognizer and the Zip code Recognizer from the documentation) via a YAML file, per the documentation. I can mount the YAML file into the Docker container and run a Python script that adds the recognizers to the predefined list, per the documentation:

from presidio_analyzer.recognizer_registry import RecognizerRegistry

yaml_file = "/usr/bin/presidio-analyzer/custom/recognizers.yaml"  # path to YAML file
registry = RecognizerRegistry()
registry.load_predefined_recognizers()
registry.add_recognizers_from_yaml(yaml_file)
# all_fields is a boolean flag; pass True rather than the string 'true'
recognizers_list = registry.get_recognizers(language="en", all_fields=True)
names = [o.name for o in recognizers_list]
print(names)

'Titles recognizer' is now listed in the script's output. However, I cannot see it when calling the REST endpoint 'http://localhost:5002/recognizers?language=en'. Subsequent attempts to list the predefined recognizers do not show the new recognizer either. How can the new recognizers from the YAML file be loaded globally?

Any help/pointers would be much appreciated. Thanks
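
For readers following along, here is a rough sketch of what the mounted YAML file might contain. It is modeled on the two documentation examples named above (Zip code and Titles); the entity names, the deny list entries, and the helper `write_recognizers_file` are illustrative assumptions, not the exact file used in this issue.

```python
import tempfile
from pathlib import Path

# Hypothetical file content: field names follow the documentation examples
# as best recalled; the entity names and deny list are illustrative only.
RECOGNIZERS_YAML = r"""recognizers:
  - name: "Zip code Recognizer"
    supported_language: "en"
    supported_entity: "ZIP"
    patterns:
      - name: "zip code (weak)"
        regex: "(\\b\\d{5}(?:\\-\\d{4})?\\b)"
        score: 0.01
    context:
      - zip
      - code
  - name: "Titles recognizer"
    supported_language: "en"
    supported_entity: "TITLE"
    deny_list:
      - Mr.
      - Mrs.
      - Ms.
      - Dr.
"""


def write_recognizers_file(path: Path) -> Path:
    """Write the sample YAML to `path` and return the path."""
    path.write_text(RECOGNIZERS_YAML, encoding="utf-8")
    return path


if __name__ == "__main__":
    # Write to a temp directory; in the container this would be the mounted path.
    target = Path(tempfile.gettempdir()) / "recognizers.yaml"
    print(write_recognizers_file(target))
```

A file like this is what `add_recognizers_from_yaml` would be pointed at once it is mounted into the container.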

SharonHart commented 1 year ago

@jamie87u7 Adding recognizers from YAML works when Presidio is used as a Python package. The documentation link you've referenced ends with a call to the AnalyzerEngine().analyze method (which is package use), so it can't be used with the REST API without modifying the container. Try the ad-hoc recognizers REST API instead: https://microsoft.github.io/presidio/tutorial/09_ad_hoc/
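
As a rough illustration of the ad-hoc route, the sketch below builds an /analyze request that carries a recognizer definition in the request body. The `ad_hoc_recognizers` field and its keys follow the linked tutorial as best recalled; the host and port are taken from the question, and the helper names `build_payload` and `analyze` are made up for this example.

```python
import json
import urllib.request

# Assumed address: same host and port as the URL in the question.
ANALYZER_URL = "http://localhost:5002/analyze"


def build_payload(text: str) -> dict:
    """Build an /analyze request body that includes one ad-hoc recognizer."""
    return {
        "text": text,
        "language": "en",
        "ad_hoc_recognizers": [
            {
                "name": "Zip code Recognizer",
                "supported_language": "en",
                "supported_entity": "ZIP",
                "patterns": [
                    {
                        "name": "zip code (weak)",
                        "regex": r"(\b\d{5}(?:\-\d{4})?\b)",
                        "score": 0.01,
                    }
                ],
                "context": ["zip", "code"],
            }
        ],
    }


def analyze(text: str, url: str = ANALYZER_URL) -> list:
    """POST the payload to a running analyzer and return the parsed JSON."""
    body = json.dumps(build_payload(text)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `analyze("My zip code is 12345")` requires the container to be running; `build_payload` can be inspected without it. Note that the recognizer definition travels with every request.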

jamie87u7 commented 1 year ago

@SharonHart Thanks for the reply. Ad-hoc recognizers should work for some use cases, but they would be cumbersome when there are many recognizers that have to be added to every single REST call. I assume the overhead of parsing them on every call is not negligible?

I also failed to include this part of the documentation, which describes how to add to the list of predefined recognizers by modifying the source.

I understand this is a very specific use case, and in the meantime I might have to script the above steps to modify the source and run them as part of Docker Compose. It would be nice to have a no-code way to add to the predefined recognizers for those of us who are not well versed in Python. Of course, I will create a PR to add to the list of recognizers in the future.

omri374 commented 1 year ago

@jamie87u7 Have you updated the app.py file with the code you've added? This line: https://github.com/microsoft/presidio/blob/1d07b03866065fae01b38eabb8d2fd74e7bc4a57/presidio-analyzer/app.py#L40 should be replaced with code that adds the YAML recognizers to the registry and passes that registry to the analyzer engine. Also, make sure the YAML file is added to the Docker container.

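# Goes in app.py in place of the linked line; RecognizerRegistry must also be imported there.
yaml_file = "/usr/bin/presidio-analyzer/custom/recognizers.yaml"  # path from the question; adjust to where the file is mounted
registry = RecognizerRegistry()
registry.load_predefined_recognizers()
registry.add_recognizers_from_yaml(yaml_file)
self.engine = AnalyzerEngine(registry=registry)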
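
Once the container is rebuilt with that change, one way to confirm it took effect is to query the recognizers endpoint from the question and check for the new names. The sketch below assumes the endpoint returns a JSON array of recognizer names; the helper names `fetch_recognizer_names` and `missing_recognizers` are made up for this example.

```python
import json
import urllib.request


def fetch_recognizer_names(base_url: str = "http://localhost:5002",
                           language: str = "en") -> list:
    """GET /recognizers from a running analyzer (assumed to return a JSON list of names)."""
    url = f"{base_url}/recognizers?language={language}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def missing_recognizers(listed: list, expected: list) -> list:
    """Return the expected recognizer names that the service did not list."""
    listed_set = set(listed)
    return [name for name in expected if name not in listed_set]


if __name__ == "__main__":
    # Requires the analyzer container to be running on the assumed address.
    names = fetch_recognizer_names()
    print(missing_recognizers(names, ["Titles recognizer", "Zip code Recognizer"]))
```

An empty list means both YAML recognizers are visible through REST.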

omri374 commented 1 month ago

Closing, as YAML configuration is now supported.

microsoft/presidio, issue #1004