open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

LLM: Standardized fields for LLM Security and protection [Discussion] #1007

Open · susan-shu-c opened this issue 1 month ago

susan-shu-c commented 1 month ago

Area(s)

area:gen-ai, llm

Is your change request related to a problem? Please describe.

NOTE: narrowed down the list of fields in https://github.com/open-telemetry/semantic-conventions/issues/1034

To help prevent threats to LLM systems, such as misuse, and to log content-filter activity, we propose standardized fields for secure and safe LLM usage, based on frameworks such as OWASP's LLM Top 10 and MITRE ATLAS.

For example, a user may be using various LLM vendors or their own deployments and wish to log all of them in a standardized manner. Our team has published a blog proposing standardized fields for LLM Security, led by @Mikaayenson.

Initially, we wanted to add these fields to ECS (Elastic Common Schema), but since the convergence/donation of ECS into OpenTelemetry, we're following the guidelines and proposing the changes to OTel.

An additional example of our work in LLM Security that leverages fields like the ones proposed: a blog on implementing LLM Security via a proxy.

Describe the solution you'd like

Below are the fields that we used in our work on standardized fields for LLM Security across vendors, deployments, etc.

The same list is also available as a gist.

| Category | Field | Type | Description | Existing OTel SemConv (as of May 6, 2024) |
|---|---|---|---|---|
| General LLM Interaction Fields | gen_ai.prompt | text | The full text of the user's request to the LLM. | |
| | gen_ai.usage.prompt_tokens | integer | Number of tokens in the user's request. | gen_ai.usage.prompt_tokens |
| | gen_ai.completion | text | The full text of the LLM's response. | |
| | gen_ai.usage.completion_tokens | integer | Number of tokens in the LLM's response. | gen_ai.usage.completion_tokens |
| | gen_ai.system | keyword | Name of the LLM foundation model vendor. | gen_ai.system |
| | gen_ai.user.id | keyword | Unique identifier for the user. | |
| | gen_ai.request.id | keyword | Unique identifier for the LLM request. | |
| | gen_ai.response.id | keyword | Unique identifier for the LLM response. | gen_ai.response.id |
| | gen_ai.response.model | | | |
| | gen_ai.response.error_code | keyword | Error code returned in the LLM response. | |
| | gen_ai.response.finish_reasons | keyword array | Reason the LLM response stopped. | gen_ai.response.finish_reasons |
| | gen_ai.request.timestamp | date | Timestamp when the request was made. | |
| | gen_ai.response.timestamp | date | Timestamp when the response was received. | |
| | gen_ai.request.model.id | keyword | ID of the LLM model a request is being made to. | gen_ai.request.model |
| | gen_ai.request.max_tokens | integer | Maximum number of tokens the LLM generates for a request. | gen_ai.request.max_tokens |
| | gen_ai.request.temperature | float | Temperature setting for the LLM request. | gen_ai.request.temperature |
| | gen_ai.request.top_k | float | The top_k sampling setting for the LLM request. | |
| | gen_ai.request.top_p | float | The top_p sampling setting for the LLM request. | gen_ai.request.top_p |
| | gen_ai.request.model_version | keyword | Version of the LLM model used to generate the response. | |
| | gen_ai.request.model.role | keyword | Role of the LLM model in the interaction. | |
| | gen_ai.request.model.type | keyword | Type of LLM model. | |
| | gen_ai.request.model.description | keyword | Description of the LLM model. | |
| | gen_ai.request.model.instructions | text | Custom instructions for the LLM model. | |
| Text Quality and Relevance Metric Fields | gen_ai.text.readability_score | float | Measures the readability level of the text. | |
| | gen_ai.text.complexity_score | float | Evaluates the complexity of the text. | |
| | gen_ai.text.similarity_score | float | Measures the similarity between the prompt and response. | |
| Security Metric Fields | gen_ai.security.regex_pattern_count | integer | Counts occurrences of strings matching user-defined regex patterns. | |
| | gen_ai.security.jailbreak_score | float | Measures similarity to known jailbreak attempts. | |
| | gen_ai.security.prompt_injection_score | float | Measures similarity to known prompt injection attacks. | |
| | gen_ai.security.hallucination_consistency | float | Consistency check between multiple responses. | |
| | gen_ai.security.refusal_score | float | Measures similarity to known LLM refusal responses. | |
| Policy Enforcement Fields | gen_ai.policy.name | keyword | Name of the specific policy that was triggered. | |
| | gen_ai.policy.violation | boolean | Specifies if a security policy was violated. | |
| | gen_ai.policy.action | keyword | Action taken due to a policy violation, such as blocking, alerting, or modifying the content. | |
| | gen_ai.policy.match_detail | nested | Details about what specifically triggered the policy, including matched words, phrases, or patterns. | |
| | gen_ai.policy.confidence | float | Confidence level in the policy match that triggered the action, quantifying how closely the identified content matched the policy criteria. | |
| Threat Analysis Fields | gen_ai.threat.risk_score | float | Numerical score indicating the potential risk associated with the response. | |
| | gen_ai.threat.type | keyword | Type of threat detected in the LLM interaction. | |
| | gen_ai.threat.detected | boolean | Whether a security threat was detected. | |
| | gen_ai.threat.category | keyword | Category of the detected security threat. | |
| | gen_ai.threat.description | text | Description of the detected security threat. | |
| | gen_ai.threat.action | keyword | Recommended action to mitigate the detected security threat. | |
| | gen_ai.threat.source | keyword | Source of the detected security threat. | |
| | gen_ai.threat.signature | keyword | Signature of the detected security threat. | |
| | gen_ai.threat.yara_matches | nested | Stores results from YARA scans, including rule matches and categories. | |
| Compliance Fields | gen_ai.compliance.violation_detected | boolean | Indicates if any compliance violation was detected during the interaction. | |
| | gen_ai.compliance.violation_code | keyword | Code identifying the specific compliance rule that was violated. | |
| | gen_ai.compliance.response_triggered | keyword array | Lists compliance-related filters that were triggered during the processing of the response, such as data privacy filters or regulatory compliance checks. | |
| | gen_ai.compliance.request_triggered | keyword array | Lists compliance-related filters that were triggered during the processing of the request, such as data privacy filters or regulatory compliance checks. | |
| OWASP Top Ten Specific Fields | gen_ai.owasp.id | keyword | Identifier for the OWASP risk addressed. | |
| | gen_ai.owasp.description | text | Description of the OWASP risk triggered. | |
| Security Tools Analysis Fields | gen_ai.analysis.tool_names | keyword array | Names of the security or analysis tools used. | |
| | gen_ai.analysis.function | keyword | Name of the security or analysis function used. | |
| | gen_ai.analysis.findings | nested | Detailed findings from security tools. | |
| | gen_ai.analysis.action_recommended | keyword | Recommended actions based on the analysis. | |
| Sentiment and Toxicity Analysis Fields | gen_ai.sentiment.score | float | Sentiment analysis score. | |
| | gen_ai.sentiment.toxicity_score | float | Toxicity analysis score. | |
| | gen_ai.sentiment.content_inappropriate | boolean | Whether the content was flagged as inappropriate or sensitive. | |
| | gen_ai.sentiment.content_categories | keyword array | Categories of content identified as sensitive or requiring moderation. | |
| Performance Metric Fields | gen_ai.performance.response_time | long | Time taken by the LLM to generate a response, in milliseconds. | |
| | gen_ai.performance.request_size | long | Size of the request payload in bytes. | |
| | gen_ai.performance.start_response_time | long | Time taken by the LLM to send the first response byte, in milliseconds. | |
| | gen_ai.performance.response_size | long | Size of the response payload in bytes. | |
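
To make the shape of the proposal concrete, here is a minimal sketch using the OpenTelemetry Python API of how an instrumentation layer might attach a few of these fields to a span. All values are invented placeholders, and the `gen_ai.security.*` / `gen_ai.policy.*` attributes are the proposal above, not part of the current conventions:

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-security-demo")

# Hypothetical chat-completion span; values are placeholders.
with tracer.start_as_current_span("gen_ai.chat_completion") as span:
    # Existing gen_ai.* conventions (as of May 2024)
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.usage.prompt_tokens", 42)
    span.set_attribute("gen_ai.usage.completion_tokens", 180)

    # Proposed security and policy fields from the table above
    span.set_attribute("gen_ai.security.prompt_injection_score", 0.87)
    span.set_attribute("gen_ai.policy.name", "block-prompt-injection")
    span.set_attribute("gen_ai.policy.violation", True)
    span.set_attribute("gen_ai.policy.action", "blocked")
```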

Describe alternatives you've considered

The alternative is to submit these fields only to ECS, but since the donation of ECS, the standard practice is to discuss and propose changes to OTel.

Additional context

We'd like to open up a discussion; we're happy to discuss the fields and welcome any thoughts and suggestions!

drewby commented 1 month ago

This list illustrates the significant scope of monitoring Gen AI applications! Here's my feedback:

One challenge is the number of topics that require focused discussions for incremental progress. For instance, general attributes necessary for broad application could be discussed separately. Some attributes may be vendor-specific, which is why they have not been included yet, but are planned for a future PR. Other discussions, such as those surrounding model versions, were explored earlier and could be revisited if broken into their own issue or PR.

Then, consider the rest of the categories in two ways. First, examine whether there's a way to generalize some of these attributes to avoid having a distinct set for each security category (it seems you may already have done this analysis); one possible shape is sketched below. Second, break each of these categories into separate issues or PRs as well, particularly the general evaluation attributes, which may already have an existing issue.
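
As a purely hypothetical illustration of that kind of generalization (the `gen_ai.evaluation.*` names are invented here, not an existing or proposed convention), one generic event shape could be reused for every check instead of a dedicated `gen_ai.security.*` attribute per category:

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-security-demo")

# Hypothetical generic shape: one "gen_ai.evaluation" event per check,
# reused across categories; names and scores are placeholders.
with tracer.start_as_current_span("gen_ai.chat_completion") as span:
    for check, score in [
        ("jailbreak_similarity", 0.12),
        ("prompt_injection_similarity", 0.87),
        ("refusal_similarity", 0.05),
    ]:
        span.add_event(
            "gen_ai.evaluation",
            attributes={
                "gen_ai.evaluation.name": check,
                "gen_ai.evaluation.score": score,
            },
        )
```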

Each of these smaller issues or PRs will focus on detailed discussion, prototyping, and validation, and will follow the lifecycle for semantic conventions. The smaller increments will move along faster and will be less likely to have one category get stuck behind a debate in another.

How would you break it down? How would you prioritize the subtopics so the most important ones land in the near term?

Thank you for putting the list together and for all the source material, Susan @susan-shu-c!!

piotrm0 commented 1 month ago

Do all of the attributes that end in "score" have a single, consistent, agreed-upon definition of how such scores are computed?

susan-shu-c commented 1 month ago

@drewby thank you for the detailed response! I'm going through some open PRs and do see some of these fields being introduced, for example this PR with .duration.

Great suggestion to split the issue up into smaller categories; we've been discussing priorities (hence the slower response) and will update this issue accordingly and create new, smaller ones.


@piotrm0 Good question. If users are creating detection rules on top of the other fields, then they'd be responsible for determining the risk score and populating the .score fields. Otherwise, if they use prebuilt rules from vendors, the scores will have been prepopulated by the vendor according to the vendor's recommendations. Here's an example of a detection rule @mikaayenson built with a score: link

In general, we should defer to the vendors; Azure / GCP / AWS will have their own definitions. One example is AWS Bedrock Guardrails, which defines None | Low | Medium | High behind the scenes. The user can then map those vendor definitions onto a numerical score for their own case, as in the sketch below.
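
For instance, a minimal sketch of such a mapping (the label set mirrors Bedrock Guardrails; the numeric values are arbitrary and would be chosen per use case):

```python
# Arbitrary numeric mapping for vendor-reported severity labels
# (AWS Bedrock Guardrails, for example, reports NONE | LOW | MEDIUM | HIGH).
SEVERITY_TO_SCORE = {"NONE": 0.0, "LOW": 0.25, "MEDIUM": 0.5, "HIGH": 1.0}

def risk_score(vendor_label: str) -> float:
    """Map a vendor severity label to a value for gen_ai.threat.risk_score."""
    return SEVERITY_TO_SCORE[vendor_label.upper()]

# e.g. risk_score("High") -> 1.0, ready to populate gen_ai.threat.risk_score
```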

susan-shu-c commented 1 month ago

Hi all, I've created a much narrower list of fields, based only on detection rules that we've created. Many of the fields used in those detection rules already exist in the SemConv, so I've only included the ones that aren't already there or in this PR: https://github.com/open-telemetry/semantic-conventions/pull/955

Please let us know what you think!