open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

LLM: Standardized fields for LLM Security and protection [Discussion] #1007

Open · susan-shu-c opened this issue 1 month ago

susan-shu-c commented 1 month ago

Area(s)

area:gen-ai, llm

Is your change request related to a problem? Please describe.

NOTE: narrowed down the list of fields in https://github.com/open-telemetry/semantic-conventions/issues/1034

To help prevent threats to LLM systems, such as misuse, and to log content-filter activity, we propose standardized fields for secure and safe LLM usage, based on frameworks such as OWASP's LLM Top 10 and MITRE ATLAS.

For example, a user may be using various LLM vendors or their own deployments and wish to log all of them in a standardized manner. Our team has published a blog proposing standardized fields for LLM Security, led by @Mikaayenson.

Initially, we wanted to add these fields to ECS (Elastic Common Schema), but since the convergence/donation of ECS into OpenTelemetry, we're following the guidelines and proposing the changes to OTel.

An additional example of our work in LLM Security that leverages fields like the ones proposed: a blog on implementing LLM Security via a proxy.

Describe the solution you'd like

Below are the fields that we used in our work on standardized fields for LLM Security across vendors, deployments, etc.

The same list is also available as a gist.

| Category | Field | Type | Description | Existing OTel SemConv (as of May 6, 2024) |
|---|---|---|---|---|
| General LLM Interaction Fields | gen_ai.prompt | text | The full text of the user's request to the LLM. | |
| | gen_ai.usage.prompt_tokens | integer | Number of tokens in the user's request. | gen_ai.usage.prompt_tokens |
| | gen_ai.completion | text | The full text of the LLM's response. | |
| | gen_ai.usage.completion_tokens | integer | Number of tokens in the LLM's response. | gen_ai.usage.completion_tokens |
| | gen_ai.system | keyword | Name of the LLM foundation model vendor. | gen_ai.system |
| | gen_ai.user.id | keyword | Unique identifier for the user. | |
| | gen_ai.request.id | keyword | Unique identifier for the LLM request. | |
| | gen_ai.response.id | keyword | Unique identifier for the LLM response. | gen_ai.response.id |
| | gen_ai.response.model | | | |
| | gen_ai.response.error_code | keyword | Error code returned in the LLM response. | |
| | gen_ai.response.finish_reasons | keyword array | Reason the LLM response stopped. | gen_ai.response.finish_reasons |
| | gen_ai.request.timestamp | date | Timestamp when the request was made. | |
| | gen_ai.response.timestamp | date | Timestamp when the response was received. | |
| | gen_ai.request.model.id | keyword | ID of the LLM model a request is being made to. | gen_ai.request.model |
| | gen_ai.request.max_tokens | integer | Maximum number of tokens the LLM generates for a request. | gen_ai.request.max_tokens |
| | gen_ai.request.temperature | float | Temperature setting for the LLM request. | gen_ai.request.temperature |
| | gen_ai.request.top_k | float | The top_k sampling setting for the LLM request. | |
| | gen_ai.request.top_p | float | The top_p sampling setting for the LLM request. | gen_ai.request.top_p |
| | gen_ai.request.model_version | keyword | Version of the LLM model used to generate the response. | |
| | gen_ai.request.model.role | keyword | Role of the LLM model in the interaction. | |
| | gen_ai.request.model.type | keyword | Type of LLM model. | |
| | gen_ai.request.model.description | keyword | Description of the LLM model. | |
| | gen_ai.request.model.instructions | text | Custom instructions for the LLM model. | |
| Text Quality and Relevance Metric Fields | gen_ai.text.readability_score | float | Measures the readability level of the text. | |
| | gen_ai.text.complexity_score | float | Evaluates the complexity of the text. | |
| | gen_ai.text.similarity_score | float | Measures the similarity between the prompt and response. | |
| Security Metric Fields | gen_ai.security.regex_pattern_count | integer | Counts occurrences of strings matching user-defined regex patterns. | |
| | gen_ai.security.jailbreak_score | float | Measures similarity to known jailbreak attempts. | |
| | gen_ai.security.prompt_injection_score | float | Measures similarity to known prompt injection attacks. | |
| | gen_ai.security.hallucination_consistency | float | Consistency check between multiple responses. | |
| | gen_ai.security.refusal_score | float | Measures similarity to known LLM refusal responses. | |
| Policy Enforcement Fields | gen_ai.policy.name | keyword | Name of the specific policy that was triggered. | |
| | gen_ai.policy.violation | boolean | Specifies if a security policy was violated. | |
| | gen_ai.policy.action | keyword | Action taken due to a policy violation, such as blocking, alerting, or modifying the content. | |
| | gen_ai.policy.match_detail | nested | Details about what specifically triggered the policy, including matched words, phrases, or patterns. | |
| | gen_ai.policy.confidence | float | Confidence level in the policy match that triggered the action, quantifying how closely the identified content matched the policy criteria. | |
| Threat Analysis Fields | gen_ai.threat.risk_score | float | Numerical score indicating the potential risk associated with the response. | |
| | gen_ai.threat.type | keyword | Type of threat detected in the LLM interaction. | |
| | gen_ai.threat.detected | boolean | Whether a security threat was detected. | |
| | gen_ai.threat.category | keyword | Category of the detected security threat. | |
| | gen_ai.threat.description | text | Description of the detected security threat. | |
| | gen_ai.threat.action | keyword | Recommended action to mitigate the detected security threat. | |
| | gen_ai.threat.source | keyword | Source of the detected security threat. | |
| | gen_ai.threat.signature | keyword | Signature of the detected security threat. | |
| | gen_ai.threat.yara_matches | nested | Stores results from YARA scans, including rule matches and categories. | |
| Compliance Fields | gen_ai.compliance.violation_detected | boolean | Indicates if any compliance violation was detected during the interaction. | |
| | gen_ai.compliance.violation_code | keyword | Code identifying the specific compliance rule that was violated. | |
| | gen_ai.compliance.response_triggered | keyword array | Lists compliance-related filters that were triggered during the processing of the response, such as data privacy filters or regulatory compliance checks. | |
| | gen_ai.compliance.request_triggered | keyword array | Lists compliance-related filters that were triggered during the processing of the request, such as data privacy filters or regulatory compliance checks. | |
| OWASP Top Ten Specific Fields | gen_ai.owasp.id | keyword | Identifier for the OWASP risk addressed. | |
| | gen_ai.owasp.description | text | Description of the OWASP risk triggered. | |
| Security Tools Analysis Fields | gen_ai.analysis.tool_names | keyword array | Names of the security or analysis tools used. | |
| | gen_ai.analysis.function | keyword | Name of the security or analysis function used. | |
| | gen_ai.analysis.findings | nested | Detailed findings from security tools. | |
| | gen_ai.analysis.action_recommended | keyword | Recommended actions based on the analysis. | |
| Sentiment and Toxicity Analysis Fields | gen_ai.sentiment.score | float | Sentiment analysis score. | |
| | gen_ai.sentiment.toxicity_score | float | Toxicity analysis score. | |
| | gen_ai.sentiment.content_inappropriate | boolean | Whether the content was flagged as inappropriate or sensitive. | |
| | gen_ai.sentiment.content_categories | keyword array | Categories of content identified as sensitive or requiring moderation. | |
| Performance Metric Fields | gen_ai.performance.response_time | long | Time taken by the LLM to generate a response, in milliseconds. | |
| | gen_ai.performance.request_size | long | Size of the request payload in bytes. | |
| | gen_ai.performance.start_response_time | long | Time taken by the LLM to send the first response byte, in milliseconds. | |
| | gen_ai.performance.response_size | long | Size of the response payload in bytes. | |
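
To make the shape of the proposal concrete, here is a minimal sketch using the OpenTelemetry Python API of how an instrumentation layer might attach a few of these fields to a span. All values are invented placeholders, and the `gen_ai.security.*` / `gen_ai.policy.*` attributes are the proposal above, not part of the current conventions:

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-security-demo")

# Hypothetical chat-completion span; values are placeholders.
with tracer.start_as_current_span("gen_ai.chat_completion") as span:
    # Existing gen_ai.* conventions (as of May 2024)
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.usage.prompt_tokens", 42)
    span.set_attribute("gen_ai.usage.completion_tokens", 180)

    # Proposed security and policy fields from the table above
    span.set_attribute("gen_ai.security.prompt_injection_score", 0.87)
    span.set_attribute("gen_ai.policy.name", "block-prompt-injection")
    span.set_attribute("gen_ai.policy.violation", True)
    span.set_attribute("gen_ai.policy.action", "blocked")
```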

Describe alternatives you've considered

The alternative is to submit these fields only to ECS, but since the donation of ECS, the standard practice is to discuss and propose changes to OTel.

Additional context

We'd like to open up a discussion; we're happy to discuss the fields and welcome any thoughts and suggestions!

drewby commented 1 month ago

This list illustrates the significant scope of monitoring Gen AI applications! Here's my feedback:

One challenge is the number of topics that require focused discussions for incremental progress. For instance, general attributes necessary for broad application could be discussed separately. Some attributes may be vendor-specific, which is why they have not been included yet, but are planned for a future PR. Other discussions, such as those surrounding model versions, were explored earlier and could be revisited if broken into their own issue or PR.

Then, consider the rest of the categories in two ways. First, examine whether there's a way to generalize some of these attributes to avoid having a distinct set for each security category (it seems you may already have done this analysis); one possible shape is sketched below. Second, break each of these categories into separate issues or PRs as well, particularly the general evaluation attributes, which may already have an existing issue.
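
As a purely hypothetical illustration of that kind of generalization (the `gen_ai.evaluation.*` names are invented here, not an existing or proposed convention), one generic event shape could be reused for every check instead of a dedicated `gen_ai.security.*` attribute per category:

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-security-demo")

# Hypothetical generic shape: one "gen_ai.evaluation" event per check,
# reused across categories; names and scores are placeholders.
with tracer.start_as_current_span("gen_ai.chat_completion") as span:
    for check, score in [
        ("jailbreak_similarity", 0.12),
        ("prompt_injection_similarity", 0.87),
        ("refusal_similarity", 0.05),
    ]:
        span.add_event(
            "gen_ai.evaluation",
            attributes={
                "gen_ai.evaluation.name": check,
                "gen_ai.evaluation.score": score,
            },
        )
```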

Each of these smaller issues or PRs will focus on detailed discussion, prototyping, and validation, and will follow the lifecycle for semantic conventions. The smaller increments will move along faster and will be less likely to have one category get stuck behind a debate in another.

How would you break it down? How would you prioritize the subtopics so the most important ones land in the near term?

Thank you for putting the list together and for all the source material, Susan @susan-shu-c!!

piotrm0 commented 1 month ago

Do all of the attributes that end in "score" have a single, consistent, agreed-upon definition of how such scores are computed?

susan-shu-c commented 1 month ago

@drewby thank you for the detailed response! I'm going through some open PRs and do see some of these fields being introduced, for example this PR with .duration.

Great suggestion to split the issue up into smaller categories; we've been discussing priorities (hence the slower response) and will update this issue accordingly and create new, smaller ones.


@piotrm0 Good question. If users are creating detection rules on top of the other fields, then they'd be responsible for determining the risk score and populating the .score fields. Otherwise, if they use prebuilt rules from vendors, the scores will have been prepopulated by the vendor according to the vendor's recommendations. Here's an example of a detection rule @mikaayenson built with a score: link

In general, we should defer to the vendors; Azure / GCP / AWS will have their own definitions. One example is AWS Bedrock Guardrails, which defines None | Low | Medium | High behind the scenes. The user can then map those vendor definitions onto a numerical score for their own case, as in the sketch below.
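
For instance, a minimal sketch of such a mapping (the label set mirrors Bedrock Guardrails; the numeric values are arbitrary and would be chosen per use case):

```python
# Arbitrary numeric mapping for vendor-reported severity labels
# (AWS Bedrock Guardrails, for example, reports NONE | LOW | MEDIUM | HIGH).
SEVERITY_TO_SCORE = {"NONE": 0.0, "LOW": 0.25, "MEDIUM": 0.5, "HIGH": 1.0}

def risk_score(vendor_label: str) -> float:
    """Map a vendor severity label to a value for gen_ai.threat.risk_score."""
    return SEVERITY_TO_SCORE[vendor_label.upper()]

# e.g. risk_score("High") -> 1.0, ready to populate gen_ai.threat.risk_score
```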

susan-shu-c commented 1 month ago

Hi all, I've created a much narrower list of fields, based only on detection rules that we've created. Many of the fields used in those detection rules already exist in the SemConv, so I've only included the ones that aren't already there or in this PR: https://github.com/open-telemetry/semantic-conventions/pull/955

Please let us know what you think!