Be able to version control and monitor LLM answers
Be able to see the quality of answers, and refine and improve prompts over time
As an expert user, play with different prompts and see if I can improve the answers
As a developer, run a unit test/regression test to see if the LLM can be deployed
Track scores over time to check for regressions in performance
Short term maintenance approach:
Sort out the Caddy messages vs. Caddy responses DynamoDB table to include the various additional desired tags (i.e. routing, eval scores, received timestamps, etc.)
Add evaluation metrics (as in the KM portal) into the Caddy responses table (item sketch after this list)
Run the evaluation metrics on the current set of 20 Caddy questions and generated answers
Bring the evaluation into CI/CD as a basic unit test (test sketch after this list)
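A minimal sketch of what a Caddy responses item could carry once the extra tags are stored alongside the answer. The table name, key and attribute names (message_id, route, eval_scores, received_at) are assumptions for illustration, not the current schema.

```python
from datetime import datetime, timezone
from decimal import Decimal

import boto3

# Assumed table and attribute names -- the real Caddy schema may differ.
dynamodb = boto3.resource("dynamodb")
responses_table = dynamodb.Table("caddy_responses")


def put_response(message_id: str, response_text: str, route: str, eval_scores: dict) -> None:
    """Store a Caddy response together with the proposed extra tags."""
    responses_table.put_item(
        Item={
            "message_id": message_id,          # partition key (assumed)
            "response_text": response_text,
            "route": route,                    # routing tag
            # DynamoDB needs Decimal rather than float for numeric attributes
            "eval_scores": {k: Decimal(str(v)) for k, v in eval_scores.items()},
            "received_at": datetime.now(timezone.utc).isoformat(),
        }
    )
```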
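And a minimal sketch of the CI/CD check, assuming the 20 questions and their reference answers live in a fixture file and that an eval module exposes generate_answer/score_answer; the module, fixture path and 0.7 threshold are all placeholders, not the real KM-portal metrics.

```python
import json

import pytest

# Hypothetical helpers -- stand-ins for whatever the KM-portal metrics become.
from caddy_eval import generate_answer, score_answer  # assumed module

THRESHOLD = 0.7  # assumed minimum acceptable eval score

# Assumed fixture: list of {"question": ..., "model_answer": ...} for the 20 cases.
with open("tests/fixtures/caddy_questions.json") as f:
    CASES = json.load(f)


@pytest.mark.parametrize("case", CASES)
def test_answer_quality(case):
    answer = generate_answer(case["question"])
    score = score_answer(answer, case["model_answer"])
    assert score >= THRESHOLD, f"Eval score {score:.2f} below {THRESHOLD}"
```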
Long term maintenance approach:
Take all topics / a sample of queries
Use Caddy to generate an answer for each question
Crowdsource to allow advisors/supervisors across LCAs to refine and create 'model' answers
Over time, measure incoming queries against the model queries and look for drift (sketch at the end of this section)
Separate platform for Caddy?
Separate project on expert/crowdsourced management of LLM answers in the public sector
We need to review how we do this broadly.
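For the drift measurement above, a minimal sketch assuming query embeddings are already computed (by whatever embedding model Caddy ends up using). It reports the mean cosine distance from each incoming query to its nearest model query, so a rising score suggests incoming traffic is moving away from the sampled topics. The alert threshold in the usage comment is a placeholder.

```python
import numpy as np


def drift_score(incoming: np.ndarray, model_queries: np.ndarray) -> float:
    """Mean cosine distance from each incoming query embedding to its nearest
    model-query embedding. Rows are queries, columns are embedding dimensions;
    embeddings are assumed to be precomputed."""
    a = incoming / np.linalg.norm(incoming, axis=1, keepdims=True)
    b = model_queries / np.linalg.norm(model_queries, axis=1, keepdims=True)
    similarities = a @ b.T                 # cosine similarity matrix
    nearest = similarities.max(axis=1)     # best match per incoming query
    return float(np.mean(1.0 - nearest))   # 0 = identical traffic, higher = drift

# Example usage (threshold and variable names are assumptions):
# if drift_score(week_embeddings, model_embeddings) > 0.35:
#     flag_for_review()
```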