This list illustrates the significant scope of monitoring Gen AI applications! Here's my feedback:
One challenge is the number of topics that each require focused discussion to make incremental progress. For instance, the general attributes needed for broad applicability could be discussed separately. Some attributes may be vendor-specific, which is why they haven't been included yet but are planned for a future PR. Other discussions, such as those around model versions, were explored earlier and could be revisited if broken out into their own issue or PR.
Then, consider the rest of the categories in two ways. First, examine whether some of these attributes can be generalized so that each security category doesn't need its own distinct set (it seems you may have already done this analysis). Second, break each of these categories into separate issues or PRs as well, particularly the general evaluation attributes, which may already have an existing issue.
Each of these smaller issues or PRs can then focus on detailed discussion, prototyping, and validation, and follow the lifecycle for semantic conventions. Smaller increments will move along faster and are less likely to have one category get stuck on a debate in another.
How would you break it down? How would you prioritize the subtopics so the most important ones land in the near term?
Thank you for putting the list together, and for all the source material, Susan @susan-shu-c!!
Do all of the attributes that end in "score" have a single, consistent, agreed-upon definition of how such scores are computed?
@drewby thank you for the detailed response! I'm going through some open PRs and do spot some fields that would be introduced, for example this PR with `.duration`.
Great suggestion to split the issue into smaller categories; we've been discussing priorities, hence the slower response, and will update this issue accordingly and create new, smaller ones.
@piotrm0 Good question: if users are creating detection rules on top of the other fields, they'd be responsible for determining the risk score and populating the `.scores` fields. Otherwise, if they use prebuilt rules from vendors, the scores will have been prepopulated according to the vendor's recommendations. Here's an example of a detection rule @mikaayenson built with a score: link
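For readers unfamiliar with the shape of such rules, here's a rough sketch in Python. The field names, query, and values are illustrative assumptions loosely modeled on Elastic prebuilt rules, not the contents of the linked rule:

```python
# Rough sketch of a detection rule definition that carries a risk score.
# Loosely modeled on Elastic prebuilt rules; field names, the query, and
# all values here are illustrative assumptions.
rule = {
    "name": "Potential LLM prompt injection attempt",
    "type": "query",
    "risk_score": 73,      # assigned by the rule author or vendor (0-100)
    "severity": "high",
    # Hypothetical field name of the kind proposed in this issue:
    "query": "gen_ai.security.prompt_injection.score >= 0.7",
}
```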
We should defer to the vendors; Azure / GCP / AWS will have their own definitions. An example is AWS Bedrock Guardrails, which defines `None | Low | Medium | High` behind the scenes. Users can then map those vendor definitions to a numerical definition for their own case.
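For example, a minimal sketch of such a mapping (the `None | Low | Medium | High` label set comes from AWS Bedrock Guardrails as noted above; the numeric values are illustrative assumptions, not vendor recommendations):

```python
# Minimal sketch: map vendor severity labels to a numeric risk score.
# The numeric values below are illustrative assumptions a user might
# choose for their own case, not vendor recommendations.
SEVERITY_TO_SCORE = {
    "NONE": 0.0,
    "LOW": 0.25,
    "MEDIUM": 0.5,
    "HIGH": 0.9,
}

def severity_to_score(label: str) -> float:
    """Translate a vendor severity label into a normalized 0-1 score."""
    return SEVERITY_TO_SCORE.get(label.upper(), 0.0)

# A guardrail verdict of "High" becomes 0.9 in a hypothetical
# .scores-style field.
print(severity_to_score("High"))  # 0.9
```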
Hi all, I've created a much narrowed-down list of fields based only on detection rules that we've created. Many of the fields used in those detection rules already exist in the SemConv, so I've only included the ones that aren't already in the SemConv or in this PR: https://github.com/open-telemetry/semantic-conventions/pull/955
Please let us know what you think!
Closing: Superseded by https://github.com/open-telemetry/semantic-conventions/issues/1034
Area(s)
area:gen-ai, llm
Is your change request related to a problem? Please describe.
To help prevent threats to LLM systems, such as misuse, and to log content-filter activity, we propose standardized fields for secure and safe LLM usage, based on frameworks such as OWASP's LLM Top 10 and MITRE ATLAS.
For example, a user may be using various LLM vendors or their own deployments, and wish to log all of them in a standardized manner. Our team has published a blog proposing standardized fields for LLM Security, led by @Mikaayenson.
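As a rough sketch of what that could look like in practice (the attribute names below are hypothetical placeholders, not accepted conventions), an application could attach such fields to a span via the standard OpenTelemetry Python API:

```python
# Sketch: attach hypothetical LLM-security attributes to a span using
# the standard OpenTelemetry Python API. The attribute names are
# illustrative placeholders, not accepted semantic conventions.
from opentelemetry import trace

tracer = trace.get_tracer("llm-security-example")

with tracer.start_as_current_span("gen_ai.chat_completion") as span:
    # Vendor-agnostic request metadata (value is a placeholder):
    span.set_attribute("gen_ai.system", "example-vendor")
    # Hypothetical security fields of the kind proposed here:
    span.set_attribute("gen_ai.security.content_filter.triggered", True)
    span.set_attribute("gen_ai.security.risk.score", 0.9)
```

The same span would carry the usual request attributes, so security signals stay correlated with the LLM call that produced them, regardless of vendor.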
Initially, we wanted to add these fields to ECS (Elastic Common Schema), but since the convergence/donation of ECS to OpenTelemetry we're following the guidelines to propose changes to OTel.
An additional example of our work in LLM Security that leverages fields like the ones proposed: a blog on implementing LLM Security via proxy.
Describe the solution you'd like
Below are the fields we used in our work on standardized fields for LLM Security across vendors, deployments, etc.
The same list is also available as a gist
Describe alternatives you've considered
An alternative is to submit these fields only to ECS, but since the donation of ECS to OpenTelemetry, the standard practice is to discuss and propose changes in OTel.
Additional context
We'd like to open up a discussion; we're happy to discuss the fields and welcome any thoughts and suggestions!