guardrails-ai / sensitive_topics

Apache License 2.0

Overview

| Developed by | Guardrails AI |
| --- | --- |
| Date of development | Feb 15, 2024 |
| Validator type | Format |
| Blog | |
| License | Apache 2 |
| Input/Output | Output |

Description

Intended Use

This validator checks whether the input value contains sensitive topics. By default, it first runs a zero-shot classification model and falls back to OpenAI's gpt-3.5-turbo only when the zero-shot model is not confident in its topic classification (score < 0.5). In our experiments, this LLM fallback increases accuracy by 15% but also increases latency, more than doubling it in the worst case. The zero-shot classifier and the LLM fallback can each be disabled independently.
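The two-stage flow described above can be sketched in plain Python. This is a minimal illustration, not the validator's actual implementation: `detect_sensitive_topics`, `zero_shot`, and `llm` are hypothetical names, the two classifiers are passed in as callables so the example runs without any model, and the rule "confident means the top score reaches the threshold" is an assumption about how the 0.5 cutoff is applied.

```python
from typing import Callable, Dict, List

# Hypothetical callable shapes; the real validator's internals may differ.
ZeroShotFn = Callable[[str, List[str]], Dict[str, float]]  # topic -> score in [0, 1]
LlmFn = Callable[[str, List[str]], List[str]]              # topics the LLM reports


def detect_sensitive_topics(
    text: str,
    topics: List[str],
    zero_shot: ZeroShotFn,
    llm: LlmFn,
    model_threshold: float = 0.5,
    disable_classifier: bool = False,
    disable_llm: bool = False,
) -> List[str]:
    """Return the topics detected in `text`, using the LLM only as a fallback."""
    if disable_classifier and disable_llm:
        raise ValueError("At least one of the classifier or the LLM must be enabled.")

    # With the classifier disabled, every call goes straight to the LLM.
    if disable_classifier:
        return llm(text, topics)

    scores = zero_shot(text, topics)
    top_score = max(scores.values(), default=0.0)

    # Assumption: the classifier is "confident" when its best score
    # reaches the threshold, so the slower LLM call can be skipped.
    if top_score >= model_threshold:
        return [t for t in topics if scores.get(t, 0.0) >= model_threshold]

    # Low confidence: fall back to the LLM unless it has been disabled,
    # in which case nothing is reported (also an assumption).
    if disable_llm:
        return []
    return llm(text, topics)
```

Because the LLM is consulted only on low-confidence inputs, the extra latency is paid on a subset of calls rather than on every validation, which is consistent with the worst-case latency figure quoted above.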

Requirements

Installation

$ guardrails hub install hub://guardrails/sensitive_topics

Usage Examples

Validating string output via Python

In this example, we apply the validator to a string output generated by an LLM.

# Import Guard and Validator
from guardrails import Guard
from guardrails.hub import SensitiveTopic

# Setup Guard
guard = Guard().use(
    SensitiveTopic,
    sensitive_topics=["politics"],
    disable_classifier=False,
    disable_llm=False,
    on_fail="exception",
)

# Test passing response
guard.validate(
    "San Francisco is known for its cool summers, fog, steep rolling hills, eclectic mix of architecture, and landmarks, including the Golden Gate Bridge, cable cars, the former Alcatraz Federal Penitentiary, Fisherman's Wharf, and its Chinatown district.",
)

try:
    # Test failing response
    guard.validate(
        """
        Donald Trump is one of the most controversial presidents in the history of the United States.
        He has been impeached twice, and is running for re-election in 2024.
        """
    )
except Exception as e:
    print(e)

Output:

Validation failed for field with errors: Sensitive topics detected: politics

API Reference

__init__(
    self,
    sensitive_topics=[
        "holiday or anniversary of the trauma or loss",
        "certain sounds, sights, smells, or tastes related to the trauma",
        "loud voices or yelling",
        "loud noises",
        "arguments",
        "being ridiculed or judged",
        "being alone",
        "getting rejected",
        "being ignored",
        "breakup of a relationship",
        "violence in the news",
        "sexual harassment or unwanted touching",
        "physical illness or injury",
    ],
    device=-1,
    model="facebook/bart-large-mnli",
    llm_callable="gpt-3.5-turbo",
    disable_classifier=False,
    disable_llm=False,
    model_threshold=0.5,
    on_fail="noop",
)


__call__(self, value, metadata={}) -> ValidationResult