raga-ai-hub / AgentNeo

Python SDK for Agent AI Observability, Monitoring and Evaluation Framework. Includes features like agent, llm and tools tracing, debugging multi-agentic system, self-hosted dashboard and advanced analytics with timeline and execution graph view
https://agentneo.raga.ai/
GNU General Public License v3.0
1.09k stars 76 forks source link

Jailbreak Detection for User Prompt #122

Open joelrobin18 opened 2 days ago

joelrobin18 commented 2 days ago

Description

This PR implements a comprehensive multi-layered jailbreak detection system for LLM security. The system combines pattern-based analysis with LLM-based deep content evaluation to identify and prevent potential security bypass attempts.

Key components include:

Type of Change

How Has This Been Tested?

Execute the agentneo/utils/security_utils.py file in Python after uncommenting the main function. This will invoke the IntegratedJailbreakDetector to analyze the user's prompt and determine if any jailbreak attempt has been made.

Checklist:

Additional Context

The implementation uses an advanced detector class IntegratedJailbreakDetector that combines multiple analysis approaches:

Impact on Roadmap

This PR is a critical component of security enhancement initiative. It establishes the foundation for advanced threat detection capabilities.