open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

LLM: Adding metrics for model.accuracy to clarify the quality of the model #847

Open gyliu513 opened 6 months ago

gyliu513 commented 6 months ago

Area(s)

area:gen-ai

Is your change request related to a problem? Please describe.

This is for the LLM semantic conventions. I suggest adding a metric for model.accuracy to clarify the quality of the model; this is important for trusted AI.

Describe the solution you'd like

Add a new metric for model accuracy.
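For illustration, here is a minimal, dependency-free sketch of what producing such a value could look like. The metric name `gen_ai.model.accuracy`, the accuracy definition (fraction of predictions matching reference labels), and the sample data are all assumptions for this example; the actual name and instrument type would be decided by the working group.

```python
# Hypothetical sketch: compute a model accuracy value that could be
# recorded under an assumed metric name "gen_ai.model.accuracy".
# The name, the definition of accuracy, and the data below are
# illustrative assumptions, not part of the current conventions.

def compute_accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    if not labels:
        return 0.0
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

# Fabricated eval run, for illustration only:
predictions = ["yes", "no", "yes", "yes"]
labels      = ["yes", "no", "no",  "yes"]

accuracy = compute_accuracy(predictions, labels)
print(f"gen_ai.model.accuracy = {accuracy}")  # 0.75
```

In an instrumented application, a value like this would presumably be recorded through an OpenTelemetry meter (e.g. a gauge or histogram instrument), but which instrument fits best is exactly the kind of question this issue raises.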

Describe alternatives you've considered

No response

Additional context

No response

gyliu513 commented 6 months ago

@lmolkova can you help add this to the project backlog? Thanks

cartermp commented 4 months ago

@gyliu513 could you elaborate on the kinds of things you'd like to see here? I can see this being a single floating-point value, but what would generally lead to computing that? Are there other values in that process (e.g., running evals) that should be considered as well?

gyliu513 commented 4 months ago

[Screenshot attached: 2024-05-09]

I learned something from https://docs.arize.com/phoenix/evaluation/llm-evals, and it seems we need to evaluate from different points of view, like hallucination, relevance, precision, etc. But for those evaluations, we may need some functions to help customers run them. Thoughts? Thanks
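The multi-dimensional evaluation idea above could be sketched as follows. This assumes a Phoenix-style LLM-as-judge setup where each response gets a pass/fail label per dimension; the dimension names and the aggregation (fraction passed) are illustrative assumptions, not a defined convention.

```python
from collections import defaultdict

# Hypothetical sketch: aggregate per-dimension eval labels (e.g. from an
# LLM-as-judge evaluator) into one score per dimension. Dimension names
# and the pass/fail labeling scheme are assumptions for illustration.

def aggregate_eval_scores(eval_records):
    """eval_records: iterable of (dimension, passed) pairs, where `passed`
    is True if the response met the criterion for that dimension.
    Returns a dict mapping each dimension to its fraction of passes."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for dimension, ok in eval_records:
        totals[dimension] += 1
        if ok:
            passes[dimension] += 1
    return {d: passes[d] / totals[d] for d in totals}

# Fabricated eval results, for illustration only:
records = [
    ("hallucination", True),   # response judged factual
    ("hallucination", False),  # response judged hallucinated
    ("relevance", True),
    ("relevance", True),
]
print(aggregate_eval_scores(records))
# {'hallucination': 0.5, 'relevance': 1.0}
```

Each per-dimension score could then be reported as a separate metric data point with the dimension as an attribute, rather than collapsing everything into a single accuracy number.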