Python SDK for Agent AI Observability, Monitoring and Evaluation Framework. Includes features like agent, LLM and tool tracing, debugging of multi-agent systems, a self-hosted dashboard, and advanced analytics with timeline and execution graph views.
This PR implements a comprehensive multi-layered jailbreak detection system for LLM security. The system combines pattern-based analysis with LLM-based deep content evaluation to identify and prevent potential security bypass attempts.
Key components include:
Pattern-Based Analysis System
LLM-Based Deep Content Evaluation
Integrated Analysis Pipeline
Security Metrics & Logging
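As a rough illustration of how the layers listed above could fit together, here is a minimal sketch, not the code in this PR. The names `SUSPICIOUS`, `Verdict`, `pattern_stage`, and `analyze` are hypothetical, the two regexes are placeholder examples, and the LLM layer is represented by a caller-supplied `llm_judge` callable rather than a real model client.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical example patterns for illustration only; not taken from the PR.
SUSPICIOUS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"ignore (all )?(previous|prior) instructions",
        r"pretend (you are|to be)",
    )
]


@dataclass
class Verdict:
    flagged: bool
    stage: str  # which layer made the decision: "pattern", "llm", or "none"


def pattern_stage(prompt: str) -> bool:
    """Cheap first layer: True if any regex matches the prompt."""
    return any(p.search(prompt) for p in SUSPICIOUS)


def analyze(prompt: str, llm_judge: Optional[Callable[[str], bool]] = None) -> Verdict:
    """Run the pattern layer first, then the optional LLM layer.

    `llm_judge` is an assumed stand-in for an LLM-based evaluator: any
    callable that returns True when it considers the prompt a jailbreak.
    """
    if pattern_stage(prompt):
        return Verdict(True, "pattern")
    if llm_judge is not None and llm_judge(prompt):
        return Verdict(True, "llm")
    return Verdict(False, "none")
```

Running the inexpensive pattern check first means the slower LLM evaluation is only consulted for prompts the regexes do not already catch; whether the PR orders the stages this way is an assumption of this sketch.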
Type of Change
[x] New feature (non-breaking change which adds functionality)
[x] This change requires a documentation update
How Has This Been Tested?
Run agentneo/utils/security_utils.py with Python after uncommenting the main function. This invokes the IntegratedJailbreakDetector, which analyzes the user's prompt and reports whether it contains a jailbreak attempt.
Checklist:
[x] My code follows the style guidelines of this project
[x] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas
[x] I have made corresponding changes to the documentation
[x] My changes generate no new warnings
[x] New and existing unit tests pass locally with my changes
[x] Any dependent changes have been merged and published in downstream modules
Additional Context
The implementation centers on a detector class, IntegratedJailbreakDetector, that combines several analysis approaches:
Pattern-based detection using regex and keyword analysis
LLM-based deep content evaluation using the llama-3.1-70b-versatile model
Risk scoring and confidence metrics for accurate threat assessment
Comprehensive logging and monitoring capabilities
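To make the regex-plus-keyword scoring idea concrete, here is a small sketch under stated assumptions. It is not the logic of IntegratedJailbreakDetector: the rules, weights, the `score_prompt` function, and the simple confidence heuristic (number of matched signals, capped at three) are all hypothetical choices made for illustration.

```python
import logging
import re

logger = logging.getLogger("jailbreak_sketch")

# Hypothetical rules and weights; the PR's actual patterns are not shown here.
REGEX_RULES = {
    r"\bignore (all )?(previous|prior) instructions\b": 0.6,
    r"\bdeveloper mode\b": 0.4,
}
KEYWORDS = {"jailbreak": 0.3, "bypass": 0.2, "unfiltered": 0.2}


def score_prompt(prompt: str) -> dict:
    """Return a risk score in [0, 1], a crude confidence, and the matched signals."""
    text = prompt.lower()
    signals = []
    score = 0.0

    # Regex layer: each matching pattern adds its weight.
    for pattern, weight in REGEX_RULES.items():
        if re.search(pattern, text):
            signals.append(pattern)
            score += weight

    # Keyword layer: plain substring checks.
    for word, weight in KEYWORDS.items():
        if word in text:
            signals.append(word)
            score += weight

    risk = min(score, 1.0)
    # Assumed heuristic: more independent signals means higher confidence.
    confidence = min(len(signals) / 3, 1.0)

    logger.info("risk=%.2f confidence=%.2f signals=%d", risk, confidence, len(signals))
    return {"risk_score": risk, "confidence": confidence, "signals": signals}
```

Capping the summed weights at 1.0 keeps the score comparable across prompts, and logging each decision gives a basic audit trail; a real detector would likely tune the weights against labeled examples.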
Impact on Roadmap
This PR is a critical component of the security enhancement initiative. It establishes the foundation for advanced threat detection capabilities.