guardrails-ai / detect_pii

Guardrails AI: PII Filter - Validates that any text does not contain any PII
Apache License 2.0

Provide options to Configure NLP Engine #11

Open RohitPShetty opened 3 months ago

RohitPShetty commented 3 months ago

As I understand the current code, the default spaCy model (en_core_web_lg) is used when AnalyzerEngine() is instantiated.

It would be helpful if we could pass in parameters indicating which model the analyzer should use. People could then switch models based on their compute and accuracy requirements. This would also help with loading models for other languages.

Ref: https://github.com/microsoft/presidio/blob/main/docs/samples//python/customizing_presidio_analyzer.ipynb
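
For illustration, here is a minimal sketch of how the model could be made configurable, assuming the validator builds its analyzer through Presidio's `NlpEngineProvider`. The helper names `build_nlp_configuration` and `make_analyzer` are hypothetical and are not part of detect_pii or Presidio. The chosen spaCy models would need to be installed separately.

```python
from typing import Dict


def build_nlp_configuration(models: Dict[str, str], engine_name: str = "spacy") -> dict:
    """Build a Presidio NLP engine configuration dict.

    `models` maps a language code to a model name,
    e.g. {"en": "en_core_web_sm"}. (Hypothetical helper.)
    """
    return {
        "nlp_engine_name": engine_name,
        "models": [
            {"lang_code": lang, "model_name": name}
            for lang, name in models.items()
        ],
    }


def make_analyzer(models: Dict[str, str]):
    """Create an AnalyzerEngine that uses the given models. (Hypothetical helper.)"""
    # Imported lazily so the configuration helper works without Presidio installed.
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    provider = NlpEngineProvider(nlp_configuration=build_nlp_configuration(models))
    nlp_engine = provider.create_engine()
    return AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=list(models))
```

A caller who prefers lower latency over accuracy could then use something like `make_analyzer({"en": "en_core_web_sm"})`, while a multilingual setup could pass several language codes at once.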

zsimjee commented 2 months ago

Hi Rohit,

Thanks for bringing this to our attention. We have not benchmarked this particular validator for latency yet. We will pick up benchmarking as a task soon and let you know how it goes.