raga-ai-hub / AgentNeo

Python SDK for an agent AI observability, monitoring, and evaluation framework. Features include agent, LLM, and tool tracing; debugging of multi-agent systems; a self-hosted dashboard; and advanced analytics with timeline and execution graph views.
https://agentneo.raga.ai/
GNU General Public License v3.0

Team Andrai -- 1.1 LLM as a Judge -- Solution Submission #121

Open sathyapriyaa-sketch opened 2 days ago

Pull Request Template

Description

This PR adds four metrics to an LLM-as-a-Judge framework: Feedback Score, Hallucination Score, Constructive Feedback, and Areas of Improvement. Together they pair quantitative scores with actionable qualitative feedback, so each LLM-generated response is evaluated from several angles and the results can guide iterative improvement of the model's outputs.

Related Issue

1.1 LLM as a Judge -- Hackathon Solution Submission

Modules Involved

✅ Package Installation & Module Import: Laying the foundation for seamless functionality.
✅ Data Preprocessing: Breaking down inputs into User Query, LLM Response, and Human Response.
✅ LLM Initialization: Configuring the model to ensure robust performance.
✅ Response Generation: Leveraging Prompt Engineering and Chain-of-Thought reasoning for dynamic outputs.
✅ Evaluation Metrics:
    - Total LLM Rating (out of 5)
    - Hallucination Score to track factual accuracy
    - Detailed feedback on LLM performance
    - Identifying areas for improvement
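
The modules listed above could fit together roughly as in the following sketch. It is illustrative only and is not taken from the submitted code: the names `JudgeVerdict`, `build_judge_prompt`, and `parse_verdict`, the JSON reply format, and the 0 to 1 hallucination scale are all assumptions. Only the three inputs (User Query, LLM Response, Human Response) and the 0 to 5 rating come from the description.

```python
import json
from dataclasses import dataclass, field


@dataclass
class JudgeVerdict:
    """Hypothetical container for the four metrics named in this PR."""
    rating: float                 # total LLM rating, 0 to 5 (from the PR text)
    hallucination_score: float    # assumed scale: 0 (grounded) to 1 (fabricated)
    feedback: str                 # detailed feedback on the LLM response
    areas_of_improvement: list = field(default_factory=list)


def build_judge_prompt(user_query: str, llm_response: str, human_response: str) -> str:
    """Assemble a judge prompt from the three preprocessed inputs."""
    return (
        "You are an impartial judge. Compare the LLM response with the human "
        "reference response.\n"
        f"User query: {user_query}\n"
        f"LLM response: {llm_response}\n"
        f"Human response: {human_response}\n"
        "Reply with a single JSON object with the keys: "
        '"reasoning" (your step-by-step reasoning), "rating" (0-5), '
        '"hallucination_score" (0-1), "feedback", "areas_of_improvement" (list).'
    )


def parse_verdict(raw: str) -> JudgeVerdict:
    """Parse and range-check the judge's JSON reply (assumed format)."""
    data = json.loads(raw)
    rating = float(data["rating"])
    hallucination = float(data["hallucination_score"])
    if not 0 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    if not 0 <= hallucination <= 1:
        raise ValueError(f"hallucination_score out of range: {hallucination}")
    return JudgeVerdict(
        rating=rating,
        hallucination_score=hallucination,
        feedback=str(data["feedback"]),
        areas_of_improvement=list(data.get("areas_of_improvement", [])),
    )
```

In this sketch the prompt string would be sent to whichever judge model is configured in the LLM Initialization step, and the model's reply would be passed to `parse_verdict`. Asking for the reasoning as a JSON field keeps the Chain-of-Thought text and the scores in one parseable object.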

How Has This Been Tested?

Tested by installing the dependencies with "pip install -r requirements.txt" and then running the unit tests with "python unit_test.py". The test output is saved as unit_test_output.png.
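
For readers who want a sense of what such a test file might check, here is a minimal sketch. It does not reproduce the contents of the submitted unit_test.py. The helper `is_valid_verdict`, the dictionary keys, and the 0 to 1 hallucination scale are hypothetical, and only the 0 to 5 rating range comes from the description.

```python
import unittest


def is_valid_verdict(verdict: dict) -> bool:
    """Hypothetical check: required keys present and scores within range."""
    required = {"rating", "hallucination_score", "feedback", "areas_of_improvement"}
    if not required.issubset(verdict):
        return False
    # Rating is out of 5 per the PR; the 0-1 hallucination scale is assumed.
    return 0 <= verdict["rating"] <= 5 and 0 <= verdict["hallucination_score"] <= 1


class VerdictValidationTest(unittest.TestCase):
    def test_accepts_well_formed_verdict(self):
        verdict = {
            "rating": 4,
            "hallucination_score": 0.2,
            "feedback": "Mostly accurate.",
            "areas_of_improvement": ["Cite sources"],
        }
        self.assertTrue(is_valid_verdict(verdict))

    def test_rejects_rating_above_five(self):
        verdict = {
            "rating": 7,
            "hallucination_score": 0.2,
            "feedback": "",
            "areas_of_improvement": [],
        }
        self.assertFalse(is_valid_verdict(verdict))

    def test_rejects_missing_keys(self):
        self.assertFalse(is_valid_verdict({"rating": 3}))


def run_tests() -> unittest.TestResult:
    """Run the suite programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(VerdictValidationTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_tests()
```

Saved as a standalone file, a script like this runs with a plain "python" invocation, in the same way the description runs unit_test.py.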

Checklist:

✅ My code follows the style guidelines of this project
✅ I have performed a self-review of my own code
✅ I have commented my code, particularly in hard-to-understand areas
✅ I have made corresponding changes to the documentation
✅ My changes generate no new warnings
✅ I have added tests that prove my fix is effective or that my feature works
✅ New and existing unit tests pass locally with my changes
✅ Any dependent changes have been merged and published in downstream modules
