Open mhgov opened 4 months ago
Want a CI/CD approach to evaluation, i.e. every push gets checked against a minimum set of eval measures.
Want to save the results of each run.
Need minimum thresholds that a model/prompt must meet before it is made 'public'.
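
A rough sketch of what the gate step could look like, to make the three points above concrete. Everything named here is an assumption for illustration, not something decided in this issue: the metric names, the threshold values, the `eval_results` directory, and the `failing_metrics` / `save_results` / `gate` helpers are all hypothetical. It assumes the eval scores for a push have already been computed and are passed in as a dict.

```python
import json
from pathlib import Path

# Hypothetical metrics and minimum scores; the real set is still to be agreed.
MIN_THRESHOLDS = {"faithfulness": 0.80, "answer_relevance": 0.75}


def failing_metrics(scores, thresholds):
    """Return {metric: (score, minimum)} for every metric that is missing or below its minimum."""
    failures = {}
    for name, minimum in thresholds.items():
        score = scores.get(name)
        if score is None or score < minimum:
            failures[name] = (score, minimum)
    return failures


def save_results(scores, failures, commit, out_dir="eval_results"):
    """Write one JSON record per evaluated commit so results can be compared over time."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "commit": commit,
        "scores": scores,
        "passed": not failures,
        "failed_metrics": sorted(failures),
    }
    path = out_dir / f"{commit}.json"
    path.write_text(json.dumps(record, indent=2))
    return path


def gate(scores, commit, out_dir="eval_results", thresholds=MIN_THRESHOLDS):
    """Save the results, report failures, and return a process exit code (0 = pass, 1 = fail)."""
    failures = failing_metrics(scores, thresholds)
    save_results(scores, failures, commit, out_dir)
    for name, (score, minimum) in failures.items():
        print(f"FAIL {name}: got {score}, need at least {minimum}")
    return 1 if failures else 0
```

A CI job could call `gate(...)` on each push and hand the return value to `sys.exit`, so a non-zero code fails the check. The same `passed` flag in the saved record could then be what decides whether a model/prompt is allowed to go 'public'.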