Open gyliu513 opened 6 months ago
@lmolkova can you help add this to the project backlog? Thanks
@gyliu513 could you elaborate on the kinds of things you'd like to see here? I can see this being a single floating-point value, but what would generally lead to computing that? Are there other values in that process (e.g., running evals) that should be considered as well?
I learned some things from https://docs.arize.com/phoenix/evaluation/llm-evals, and it seems we need to evaluate from different points of view, like hallucination, relevance, precision, etc. But for those evaluations, we may need some helper functions so customers can run them. Thoughts? Thanks
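To make that concrete, here is a minimal, dependency-free sketch of aggregating per-dimension evaluation pass rates into scores a gauge-style metric could record. The dimension names and the idea of attaching the dimension as a metric attribute are assumptions for discussion, not an existing semantic convention.

```python
# Illustrative sketch only: the evaluation dimensions and the attribute-based
# metric shape described below are assumptions, not an agreed convention.
from dataclasses import dataclass

@dataclass
class EvalResult:
    dimension: str   # e.g. "accuracy", "hallucination", "relevance"
    passed: bool     # did this sample pass the evaluator?

def summarize(results):
    """Aggregate pass rates per evaluation dimension.

    Returns a dict mapping dimension -> score in [0.0, 1.0], the kind of
    value a gauge could record with the dimension name as an attribute
    rather than minting one metric per dimension.
    """
    totals, passes = {}, {}
    for r in results:
        totals[r.dimension] = totals.get(r.dimension, 0) + 1
        passes[r.dimension] = passes.get(r.dimension, 0) + int(r.passed)
    return {d: passes[d] / totals[d] for d in totals}

results = [
    EvalResult("accuracy", True),
    EvalResult("accuracy", False),
    EvalResult("hallucination", True),
    EvalResult("relevance", True),
]
print(summarize(results))  # {'accuracy': 0.5, 'hallucination': 1.0, 'relevance': 1.0}
```

One design question this raises for the convention: a single metric with an evaluation-name attribute scales to new dimensions without new metric names, whereas a dedicated `model.accuracy` metric is simpler but narrower.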
Area(s)
area:gen-ai
Is your change request related to a problem? Please describe.
This is for the LLM semantic conventions. I suggest adding a metric for model accuracy to clarify the quality of the model; this is important for trusted AI.
Describe the solution you'd like
Add a new metric for model accuracy.
Describe alternatives you've considered
No response
Additional context
No response