traceloop / openllmetry

Open-source observability for your LLM application, based on OpenTelemetry
https://www.traceloop.com/openllmetry
Apache License 2.0

🚀 Feature: log content filter results in proper attributes (Azure OpenAI) #855

Closed: nirga closed this 1 month ago

nirga commented 4 months ago

Which component is this feature for?

OpenAI Instrumentation

🔖 Feature description

Azure OpenAI has an option to filter inappropriate content, which returns a slightly different response. While #854 handles this partially, we should explicitly log all the reasons in separate attributes.

🎤 Why is this feature needed ?

-

✌️ How do you aim to achieve this?

Create one or more attributes to log the reasons content was redacted.
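
One way this could look, as a minimal sketch only: flatten the per-category filter results into flat key/value pairs that can be set as span attributes. It assumes the `content_filter_results` shape shown in the Azure response quoted later in this thread (`filtered` and `severity` per category). The function name and the `llm.content_filter` prefix are hypothetical and are not existing OpenLLMetry attribute names.

```python
from typing import Any, Dict, Optional


def content_filter_attributes(
    content_filter_results: Optional[Dict[str, Any]],
    prefix: str = "llm.content_filter",  # hypothetical attribute prefix
) -> Dict[str, Any]:
    """Flatten Azure-style content_filter_results into flat attributes.

    Produces one '<prefix>.<category>.filtered' and one
    '<prefix>.<category>.severity' entry per category that reports them.
    """
    attributes: Dict[str, Any] = {}
    for category, result in (content_filter_results or {}).items():
        if not isinstance(result, dict):
            continue  # skip anything that is not a per-category object
        if "filtered" in result:
            attributes[f"{prefix}.{category}.filtered"] = result["filtered"]
        if "severity" in result:
            attributes[f"{prefix}.{category}.severity"] = result["severity"]
    return attributes


# Example input copied from the response shape quoted in this thread.
example = {
    "hate": {"filtered": False, "severity": "safe"},
    "violence": {"filtered": False, "severity": "safe"},
}
flat = content_filter_attributes(example)
```

Each resulting pair could then be passed to an OpenTelemetry span with `span.set_attribute(key, value)`, since booleans and strings are valid attribute values.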

🔄️ Additional Information

No response

👀 Have you spent some time to check if this feature request has been raised before?

Are you willing to submit PR?

None

dtee1 commented 4 months ago

Interested

nirga commented 4 months ago

@dtee1 yes please :) let me know if you have any questions - you're welcome to join our Slack as well.

sukidesuka commented 4 months ago

This change definitely has a bug. I updated to the latest version of traceloop-sdk (0.16.6) today. GPT-4's responses print to the console normally, with no filtering, yet Traceloop still shows them as FILTERED.

sukidesuka commented 4 months ago

According to the official Azure documentation (https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/content-filter?tabs=warning%2Cpython-new), a streamed response looks like this:

data: {"id":"","object":"","created":0,"model":"","prompt_annotations":[{"prompt_index":0,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}}}],"choices":[],"usage":null}

data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant"}}],"usage":null}

data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":"Color"}}],"usage":null}

data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":" is"}}],"usage":null}

data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":" a"}}],"usage":null}

...

data: {"id":"","object":"","created":0,"model":"","choices":[{"index":0,"finish_reason":null,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"content_filter_offsets":{"check_offset":44,"start_offset":44,"end_offset":198}}],"usage":null}

...

data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":"stop","delta":{}}],"usage":null}

data: {"id":"","object":"","created":0,"model":"","choices":[{"index":0,"finish_reason":null,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"content_filter_offsets":{"check_offset":506,"start_offset":44,"end_offset":571}}],"usage":null}

data: [DONE]

I believe you need to check whether the finish_reason is 'stop' or 'content_filter', rather than checking if the content_filter_results is null.
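
A rough sketch of that check, assuming the chunks have already been parsed from the `data:` lines into dictionaries. The function name is hypothetical, and this is not the fix that was actually merged.

```python
from typing import Any, Dict, Iterable


def is_content_filtered(chunks: Iterable[Dict[str, Any]]) -> bool:
    """Return True only if some choice ended with finish_reason 'content_filter'.

    The mere presence of content_filter_results is ignored, because Azure
    attaches those annotations to unfiltered responses as well.
    """
    for chunk in chunks:
        for choice in chunk.get("choices") or []:
            if choice.get("finish_reason") == "content_filter":
                return True
    return False


# Trimmed versions of the chunks quoted above: annotations present, nothing filtered.
unfiltered_stream = [
    {"choices": [], "prompt_annotations": [{"prompt_index": 0}]},
    {"choices": [{"index": 0, "finish_reason": None, "delta": {"content": "Color"}}]},
    {"choices": [{"index": 0, "finish_reason": None,
                  "content_filter_results": {"hate": {"filtered": False, "severity": "safe"}}}]},
    {"choices": [{"index": 0, "finish_reason": "stop", "delta": {}}]},
]

# Hypothetical stream that ends because the filter was triggered.
filtered_stream = [
    {"choices": [{"index": 0, "finish_reason": "content_filter", "delta": {}}]},
]
```

With this check, the stream quoted above is not reported as filtered even though every annotation chunk carries `content_filter_results`.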

nirga commented 4 months ago

@sukidesuka thanks for flagging! I was lazy and didn't test it properly for Azure. I've now added proper testing specifically for Azure and fixed the issue so this won't happen. See https://github.com/traceloop/openllmetry/pull/886

lauradowell commented 3 months ago

I can help!

nirga commented 3 months ago

Sounds good @lauradowell! @dtee1, are you still on this?

jonhsma commented 1 month ago

I'm interested! Is it still open?

nirga commented 1 month ago

Yes @jonhsma!

jonhsma commented 1 month ago

On it! Will ping you on Slack if I have any questions!