guardrails-ai / detect_pii

Guardrails AI: PII Filter - Validates that text does not contain any PII
Apache License 2.0

Overview

| Developed by | Guardrails AI |
| Date of development | Feb 15, 2024 |
| Validator type | Privacy, Security |
| Blog | |
| License | Apache 2 |
| Input/Output | Input, Output |

Description

Intended Use

This validator checks that a given text does not contain personally identifiable information (PII). It uses Microsoft Presidio (https://github.com/microsoft/presidio) to detect the configured PII entity types in the text. If any PII is detected, validation fails and the result includes a programmatic fix: an anonymized copy of the text. Otherwise, validation passes.
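Presidio itself is not reproduced here. As a rough, dependency-free illustration of the detect-then-anonymize flow described above, the sketch below uses two hand-written regular expressions as stand-ins for Presidio's recognizers. The names `PATTERNS`, `Span`, `find_pii`, and `anonymize`, the regexes, and the `<ENTITY_TYPE>` placeholder format are all assumptions for illustration; Presidio's recognizers are far more thorough.

```python
import re
from typing import List, NamedTuple

# Hypothetical stand-ins for Presidio recognizers: simple regexes only.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


class Span(NamedTuple):
    """One detected entity: its type and character offsets in the text."""
    entity_type: str
    start: int
    end: int


def find_pii(text: str, entities: List[str]) -> List[Span]:
    """Return every match of the requested entity types, ordered by position.

    Raises KeyError for entity types this sketch has no pattern for.
    """
    spans = [
        Span(entity, m.start(), m.end())
        for entity in entities
        for m in PATTERNS[entity].finditer(text)
    ]
    return sorted(spans, key=lambda s: s.start)


def anonymize(text: str, spans: List[Span]) -> str:
    """Replace each span with <ENTITY_TYPE>.

    Works right to left so earlier offsets stay valid after each
    replacement; assumes the spans do not overlap.
    """
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[: span.start] + f"<{span.entity_type}>" + text[span.end :]
    return text


if __name__ == "__main__":
    msg = "If interested, apply at not_a_real_email@guardrailsai.com"
    spans = find_pii(msg, ["EMAIL_ADDRESS", "PHONE_NUMBER"])
    print(spans)
    print(anonymize(msg, spans))  # If interested, apply at <EMAIL_ADDRESS>
```

Replacing from the end of the string backwards is a common way to rewrite several spans in one pass without recomputing offsets.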

Requirements

Installation

$ guardrails hub install hub://guardrails/detect_pii

Usage Examples

Validating string output via Python

# Import Guard and Validator
from guardrails.hub import DetectPII
from guardrails import Guard

# Set up a Guard that raises an exception if an email address or phone number is found
guard = Guard().use(
    DetectPII(pii_entities=["EMAIL_ADDRESS", "PHONE_NUMBER"], on_fail="exception")
)

guard.validate("Good morning!")  # Validator passes
try:
    guard.validate(
        "If interested, apply at not_a_real_email@guardrailsai.com"
    )  # Validator fails
except Exception as e:
    print(e)

Output:

Validation failed for field with errors: The following text in your response contains PII:
If interested, apply at not_a_real_email@guardrailsai.com

Validating JSON output via Python

In this example, we apply the validator to a string field of a JSON output generated by an LLM.

# Import Guard and Validator
from pydantic import BaseModel, Field
from guardrails.hub import DetectPII
from guardrails import Guard

# Initialize Validator
val = DetectPII(pii_entities=["EMAIL_ADDRESS", "PHONE_NUMBER"], on_fail="exception")

# Create Pydantic BaseModel
class UserHistory(BaseModel):
    name: str
    last_msg: str = Field(description="Last message sent by user", validators=[val])

# Create a Guard to check for valid Pydantic output
guard = Guard.from_pydantic(output_class=UserHistory)

# Run JSON output generated by an LLM through the guard
try:
    guard.parse(
        """
    {
        "name": "John Smith",
        "last_msg": "My account isn't working. My username is not_a_real_email@guardrailsai.com"
    }
    """
    )
except Exception as e:
    print(e)

Output:

Validation failed for field with errors: The following text in your response contains PII:
My account isn't working. My username is not_a_real_email@guardrailsai.com
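In the Pydantic example above, the check is attached only to the `last_msg` field, so `name` is never inspected. The dependency-free sketch below mimics that field-level behavior on already-parsed JSON. The `CHECKED_FIELDS` set, the `check_fields` helper, and the single email regex are hypothetical illustrations, not the Guardrails implementation.

```python
import json
import re
from typing import Dict, List

# Hypothetical stand-in for the EMAIL_ADDRESS recognizer.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Fields that carry the PII check, mirroring validators=[val] on last_msg.
CHECKED_FIELDS = {"last_msg"}


def check_fields(llm_output: str) -> Dict[str, List[str]]:
    """Parse the JSON and return, per checked field, any emails found in it."""
    data = json.loads(llm_output)
    failures: Dict[str, List[str]] = {}
    for field, value in data.items():
        if field not in CHECKED_FIELDS or not isinstance(value, str):
            continue  # unvalidated or non-string fields are left alone
        found = EMAIL.findall(value)
        if found:
            failures[field] = found
    return failures
```

An email address placed in `name` would pass this sketch, just as it would pass a schema that attaches the validator only to `last_msg`.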

API Reference

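__init__(self, pii_entities, on_fail="noop")

pii_entities: the PII entity types to detect, given as Presidio entity names such as "EMAIL_ADDRESS" and "PHONE_NUMBER".
on_fail: the policy applied when validation fails, for example "exception" (raise an error, as in the examples above) or "fix" (replace the value with the anonymized text). Defaults to "noop".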
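To make the `on_fail` policies concrete, the plain-Python sketch below shows what a caller might receive under three of them: `"noop"` passes the original text through, `"exception"` raises, and `"fix"` substitutes the anonymized text. The `apply_on_fail` function and the `PIIValidationError` exception are hypothetical; Guardrails applies its policies internally and supports more options than shown here.

```python
import re

# Hypothetical stand-in for detecting and anonymizing email addresses.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class PIIValidationError(Exception):
    """Raised under the "exception" policy (hypothetical name)."""


def apply_on_fail(text: str, on_fail: str = "noop") -> str:
    """Return the value a caller would receive under the given policy."""
    fixed = EMAIL.sub("<EMAIL_ADDRESS>", text)
    if fixed == text:
        return text  # nothing detected: validation passes, value unchanged
    if on_fail == "noop":
        return text  # no corrective action: original value passed through
    if on_fail == "exception":
        raise PIIValidationError(
            "The following text in your response contains PII:\n" + text
        )
    if on_fail == "fix":
        return fixed  # programmatic fix: the anonymized text
    raise ValueError(f"policy not modeled in this sketch: {on_fail!r}")
```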


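validate(self, value, metadata={}) -> ValidationResult

Checks value for the configured PII entity types and returns a ValidationResult: a pass if none are found, otherwise a failure carrying the anonymized text as its fix. It is normally invoked by the guard (through guard.validate(...) or guard.parse(...)) rather than called directly.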
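Because `validate` returns a result object instead of raising, the guard can decide afterwards what to do with a failure. A minimal model of that contract is sketched below, assuming a pass result and a fail result that carries an error message and a fix value. The `PassResult` and `FailResult` dataclasses here are hypothetical stand-ins loosely modeled on the Guardrails result types, and the single email regex replaces Presidio.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union

# Hypothetical stand-in for the EMAIL_ADDRESS recognizer.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass
class PassResult:
    """Hypothetical stand-in: validation succeeded."""


@dataclass
class FailResult:
    """Hypothetical stand-in: validation failed and carries a suggested fix."""
    error_message: str
    fix_value: Optional[str] = None


ValidationResult = Union[PassResult, FailResult]


def validate(value: Any, metadata: Optional[Dict[str, Any]] = None) -> ValidationResult:
    """Check one value for email addresses.

    `metadata` is accepted only to mirror the documented signature;
    this sketch does not read it.
    """
    text = str(value)
    anonymized = EMAIL.sub("<EMAIL_ADDRESS>", text)
    if anonymized == text:
        return PassResult()
    return FailResult(
        error_message="The following text in your response contains PII:\n" + text,
        fix_value=anonymized,
    )
```

The sketch defaults `metadata` to `None` rather than the documented `{}` to sidestep Python's shared mutable default argument; the behavior is otherwise the same for callers.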