Status: Open. paulo-raca opened this issue 1 year ago.
Hi @paulo-raca, the issue is with how the AnalyzerEngine is defined when the Flask app is set up. Currently it is created with default parameters (i.e., English is the only supported language). If you update the app.py file to create the AnalyzerEngine differently, you will be able to get recognizers for other languages as well.
Instead of:
https://github.com/microsoft/presidio/blob/60911edf166d216e14cbed6ba6a0ac2d42796fb4/presidio-analyzer/app.py#L40
You could pass:
self.engine = AnalyzerEngine(supported_languages=["en", "es"])
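To illustrate why the default construction only serves English, here is a toy model. It is not Presidio's code: `Recognizer`, `CATALOG`, `load_recognizers` and `supported_entities` are hypothetical names. It assumes each recognizer is tagged with a single language and that the engine only loads predefined recognizers whose language appears in `supported_languages`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Recognizer:
    """Hypothetical stand-in for a recognizer tagged with one language."""
    entity: str
    supported_language: str


# Hypothetical catalogue of predefined recognizers, each tagged by language.
CATALOG = [
    Recognizer("EMAIL_ADDRESS", "en"),
    Recognizer("ES_NIF", "es"),
    Recognizer("IT_FISCAL_CODE", "it"),
]


def load_recognizers(supported_languages: List[str]) -> List[Recognizer]:
    """Load only the recognizers whose language the engine was configured for."""
    return [r for r in CATALOG if r.supported_language in supported_languages]


def supported_entities(loaded: List[Recognizer], language: str) -> List[str]:
    """Return entity names for `language`, failing like the reported error."""
    entities = [r.entity for r in loaded if r.supported_language == language]
    if not entities:
        raise ValueError("No matching recognizers were found to serve the request.")
    return entities


default_engine = load_recognizers(["en"])      # analogous to AnalyzerEngine()
configured = load_recognizers(["en", "es"])    # analogous to supported_languages=["en", "es"]

print(supported_entities(configured, "es"))  # ['ES_NIF']
```

In this model, asking `default_engine` for `"es"` raises the same "No matching recognizers" error the issue reports, because the Spanish recognizer was never loaded.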
Thank you for the feedback. We will look for ways to make this easier, and would be happy to consider community contributions as well.
Hello @omri374, thanks for pointing this out.
I think this should be configurable via CLI arguments (which would also be accessible through docker run commands).
If you agree, I can create a PR for this.
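As a rough sketch of what such a CLI option could look like, the snippet below uses Python's standard `argparse`. The flag `--supported-languages` and the environment variable `SUPPORTED_LANGUAGES` are hypothetical names, not existing Presidio options; the env-var fallback is an assumption meant to make the setting reachable from `docker run -e ...` as well.

```python
import argparse
import os
from typing import List, Optional


def parse_languages(argv: Optional[List[str]] = None) -> List[str]:
    """Parse a hypothetical --supported-languages flag, falling back to an env var."""
    parser = argparse.ArgumentParser(description="Analyzer service options (sketch)")
    parser.add_argument(
        "--supported-languages",
        # Hypothetical env var, so `docker run -e SUPPORTED_LANGUAGES=en,es` also works.
        default=os.environ.get("SUPPORTED_LANGUAGES", "en"),
        help="Comma-separated language codes, e.g. en,es,it",
    )
    args = parser.parse_args(argv)
    # Split on commas, trimming whitespace and dropping empty entries.
    return [code.strip() for code in args.supported_languages.split(",") if code.strip()]
```

The resulting list could then be passed as `AnalyzerEngine(supported_languages=parse_languages())` when the app creates its engine.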
A contribution would be awesome! Thanks, @paulo-raca.
There are other parameters we could take into account (in this PR or a future one), such as the NLP engine configuration.
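For the NLP engine part, Presidio's documentation describes passing a configuration dict (with `nlp_engine_name` and a `models` list of `lang_code`/`model_name` entries) to `NlpEngineProvider(nlp_configuration=...)`; verify the exact format against your version. The sketch below only builds such a dict from a list of languages. The language-to-model mapping `DEFAULT_SPACY_MODELS` and the helper `build_nlp_configuration` are assumptions for illustration.

```python
from typing import Dict, List

# Assumed mapping from language code to spaCy model; adjust to the models you install.
DEFAULT_SPACY_MODELS: Dict[str, str] = {
    "en": "en_core_web_lg",
    "es": "es_core_news_md",
    "it": "it_core_news_md",
}


def build_nlp_configuration(languages: List[str]) -> dict:
    """Build an nlp_configuration-style dict with one spaCy model per language."""
    missing = [lang for lang in languages if lang not in DEFAULT_SPACY_MODELS]
    if missing:
        raise ValueError(f"No spaCy model configured for: {missing}")
    return {
        "nlp_engine_name": "spacy",
        "models": [
            {"lang_code": lang, "model_name": DEFAULT_SPACY_MODELS[lang]}
            for lang in languages
        ],
    }
```

The dict would then go to `NlpEngineProvider(nlp_configuration=config).create_engine()`, and the resulting engine to `AnalyzerEngine(nlp_engine=..., supported_languages=...)`, so that both settings come from the same language list.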
Describe the bug
I was looking at the /supportedentities REST API and tried adding ?language=es and ?language=it to get the Spain- and Italy-specific entities I saw in the docs. It turns out this doesn't work: any language other than en returns {"error":"No matching recognizers were found to serve the request."} (HTTP 500).
To Reproduce
Expected behavior
["IT_FISCAL_CODE","IT_DRIVER_LICENSE","IT_VAT_CODE","IT_PASSPORT","IT_IDENTITY_CARD"]
["ES_NIF"]
[]
I'm not entirely sure whether the global entities (email, phone number, URL, etc.) should be returned too, since they have supported_language='en' in the code. But this is probably another issue :sweat_smile:
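If language-agnostic entities such as email addresses should also be reported for other languages, one option is to register an extra copy of those recognizers per language. Presidio's predefined pattern recognizers appear to accept a `supported_language` constructor argument, but check your version. To stay independent of Presidio, the sketch below uses a toy model: `Recognizer`, `LANGUAGE_AGNOSTIC` and `extend_to_languages` are hypothetical names, and which entities count as language-agnostic is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List


@dataclass(frozen=True)
class Recognizer:
    """Hypothetical recognizer tagged with a single language."""
    entity: str
    supported_language: str


# Assumed set of entities whose detection does not depend on the text's language.
LANGUAGE_AGNOSTIC = {"EMAIL_ADDRESS", "URL"}


def extend_to_languages(recognizers: Iterable[Recognizer], languages: List[str]) -> List[Recognizer]:
    """Copy each language-agnostic English recognizer into every other requested language."""
    result = list(recognizers)
    for rec in list(result):  # iterate over a snapshot while appending to `result`
        if rec.entity in LANGUAGE_AGNOSTIC and rec.supported_language == "en":
            for lang in languages:
                clone = replace(rec, supported_language=lang)
                if lang != "en" and clone not in result:
                    result.append(clone)
    return result


registry = extend_to_languages(
    [Recognizer("EMAIL_ADDRESS", "en"), Recognizer("ES_NIF", "es")],
    ["en", "es"],
)
es_entities = sorted(r.entity for r in registry if r.supported_language == "es")
```

With this approach, a Spanish request would list EMAIL_ADDRESS alongside ES_NIF, while country-specific recognizers stay tied to their own language.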