guardrails-ai / detect_pii

Guardrails AI: PII Filter - Validates that any text does not contain any PII
Apache License 2.0

Performance Comparison - Presidio (direct) vs Detect PII (Guardrails) #12

Open RohitPShetty opened 2 weeks ago

RohitPShetty commented 2 weeks ago

Hi,

I compared the performance of calling Presidio directly against using the Detect PII validator via Guardrails. In most cases, Presidio called directly was about a tenth of a second faster than Detect PII. Both runs used the default model (en_core_web_lg) and the same dataset. Is this difference due to the additional Guardrails wrappers, or am I missing something?

Example dataset: https://github.com/microsoft/presidio-research/blob/master/data/synth_dataset_v2.json

PII entities:

    "pii": [
        "EMAIL_ADDRESS",
        "PHONE_NUMBER",
        "IP_ADDRESS",
        "DATE_TIME",
        "LOCATION",
        "PERSON",
        "URL",
        "NRP",
        "CREDIT_CARD",
        "US_BANK_NUMBER",
        "US_DRIVER_LICENSE"
    ]
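
For reference, here is a minimal sketch of how such a comparison could be timed so that both paths are measured the same way. It is not the exact script behind the numbers above. The helper names (`time_per_call`, `summarize`, `compare`) and the two stand-in functions are hypothetical. In a real run the stand-ins would presumably be replaced by a call to Presidio's `AnalyzerEngine.analyze` and by a Guardrails guard configured with the Detect PII validator, as noted in the comments.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List

# Entity list from this issue.
PII_ENTITIES = [
    "EMAIL_ADDRESS", "PHONE_NUMBER", "IP_ADDRESS", "DATE_TIME",
    "LOCATION", "PERSON", "URL", "NRP", "CREDIT_CARD",
    "US_BANK_NUMBER", "US_DRIVER_LICENSE",
]


def time_per_call(fn: Callable[[str], object],
                  texts: Iterable[str],
                  warmup: int = 1) -> List[float]:
    """Return one wall-clock latency (in seconds) per text."""
    texts = list(texts)
    # Warm-up calls keep one-time costs (model load, caches) out of the timings.
    for text in texts[:warmup]:
        fn(text)
    latencies = []
    for text in texts:
        start = time.perf_counter()
        fn(text)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Summarize a non-empty list of latencies."""
    return {
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }


def compare(direct: Callable[[str], object],
            wrapped: Callable[[str], object],
            texts: Iterable[str]) -> Dict[str, object]:
    """Time both callables on the same texts and report the mean difference."""
    texts = list(texts)
    d = summarize(time_per_call(direct, texts))
    w = summarize(time_per_call(wrapped, texts))
    return {"direct": d, "wrapped": w,
            "mean_overhead_s": w["mean_s"] - d["mean_s"]}


if __name__ == "__main__":
    # Hypothetical stand-ins so the sketch runs without either library.
    # Assumed real replacements (not verified against this repo):
    #   direct:  lambda t: analyzer.analyze(text=t, entities=PII_ENTITIES, language="en")
    #   wrapped: lambda t: guard.validate(t)
    def fake_direct(text: str) -> int:
        return len(text)

    def fake_wrapped(text: str) -> int:
        time.sleep(0.001)  # simulated wrapper overhead
        return len(text)

    sample = ["My email is jane@example.com", "Call me at 555-0100"]
    print(compare(fake_direct, fake_wrapped, sample))
```

Timing each text separately after a warm-up call makes it easier to tell whether the gap is a fixed per-call cost, which would point to the wrapper, or whether it grows with text length.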

RohitPShetty commented 2 weeks ago

@nichwch @zsimjee Would love to get your thoughts on this.