Jailbreak Detection for User Prompt

Description

This PR implements a comprehensive multi-layered jailbreak detection system for LLM security. The system combines pattern-based analysis with LLM-based deep content evaluation to identify and prevent potential security bypass attempts.

Key components include:

Pattern-Based Analysis System
LLM-Based Deep Content Evaluation
Integrated Analysis Pipeline
Security Metrics & Logging

Type of Change

[x] New feature (non-breaking change which adds functionality)
[x] This change requires a documentation update

How Has This Been Tested?

Execute the agentneo/utils/security_utils.py file in Python after uncommenting the main function. This will invoke the IntegratedJailbreakDetector to analyze the user's prompt and determine if any jailbreak attempt has been made.

Checklist:

[x] My code follows the style guidelines of this project
[x] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas
[x] I have made corresponding changes to the documentation
[x] My changes generate no new warnings
[x] New and existing unit tests pass locally with my changes
[x] Any dependent changes have been merged and published in downstream modules

Additional Context

The implementation uses an advanced detector class IntegratedJailbreakDetector that combines multiple analysis approaches:

Pattern-based detection using regex and keyword analysis
LLM-based deep content evaluation using llama-3.1-70b-versatile model
Risk scoring and confidence metrics for accurate threat assessment
Comprehensive logging and monitoring capabilities

Impact on Roadmap

This PR is a critical component of security enhancement initiative. It establishes the foundation for advanced threat detection capabilities.

raga-ai-hub / AgentNeo