i-dot-ai / caddy-chatbot

Caddy is an AI powered co-pilot for customer service functions everywhere.
https://ai.gov.uk/projects/caddy/
MIT License

Create CI/CD eval pipeline with minimum thresholds #145

Open mhgov opened 1 month ago

mhgov commented 1 month ago

We want a CI/CD approach to evaluation, i.e. every push gets checked against a minimum set of eval measures.

We also want to save the results of each run.

We need minimum thresholds that a model/prompt must meet before it can be made 'public'.
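
To make the request concrete, here is a minimal sketch of what the gating step might look like. Everything in it is an assumption for illustration: the metric names, the threshold values, the `eval_results.json` path and the function names are hypothetical, and nothing here reflects existing Caddy code. It assumes some earlier step has already produced a dict of eval scores for the model/prompt under test.

```python
import json
from pathlib import Path

# Hypothetical metrics and minimum values; the issue does not specify any.
MIN_THRESHOLDS = {"answer_relevance": 0.80, "faithfulness": 0.75}


def check_thresholds(scores, thresholds):
    """Return {metric: reason} for every metric that is missing or below its minimum."""
    failures = {}
    for metric, minimum in thresholds.items():
        score = scores.get(metric)
        if score is None:
            failures[metric] = "missing"
        elif score < minimum:
            failures[metric] = f"{score:.2f} < {minimum:.2f}"
    return failures


def save_results(scores, failures, path):
    """Write the scores and the pass/fail outcome to a JSON file and return the record."""
    record = {"scores": scores, "failures": failures, "passed": not failures}
    Path(path).write_text(json.dumps(record, indent=2))
    return record


def run_gate(scores, thresholds=MIN_THRESHOLDS, path="eval_results.json"):
    """Check scores, save the results, and return a process exit code (0 = pass, 1 = fail)."""
    failures = check_thresholds(scores, thresholds)
    save_results(scores, failures, path)
    return 1 if failures else 0
```

A CI job could call `sys.exit(run_gate(scores))` on every push, so that a run below any threshold fails the build, and could upload the saved JSON file as a build artifact to keep a history of results.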