microsoft / azure-container-apps

Roadmap and issues for Azure Container Apps

Open Telemetry support #768

Open torosent opened 1 year ago

torosent commented 1 year ago

(4/8/2024 Update) A managed OpenTelemetry agent is now in public preview. Docs: https://learn.microsoft.com/en-us/azure/container-apps/opentelemetry-agents?tabs=arm

mlouage commented 1 year ago

Just wondering, we have Dapr with the observability block and we can add the Otel libraries to our own code. What would be the use case for automatic OpenTelemetry support in ACA?

passarela commented 1 year ago

I would also like more details on how integration with OpenTelemetry would be!

eklipse2k8 commented 11 months ago

I would love it if my container could just feed spans and logs into something.

maskati commented 6 months ago

@torosent I assume this is the roadmap item for the OpenTelemetry agent that is currently in preview? If so, a feature suggestion: configuration and routing are currently static at the environment level. This prevents routing telemetry to per-app destinations, for example per-app Application Insights resources. Is there any possibility of extending the current capabilities with per-app routing?

SophCarp commented 6 months ago

@maskati Yes, this is the roadmap item for the new OpenTelemetry agent that's in preview. Thanks for the feedback! We'll look into per-app routing. Are there any specific scenarios you're thinking about where this would help?

maskati commented 6 months ago

Are there any specific scenarios you're thinking about where this would help?

@SophCarp The typical scenario is an environment hosting a set of Otel instrumented services and a combination of:

So for example the telemetry of containerapp1 would go to appinsights1 and datadog and the telemetry of containerapp2 would go to appinsights2 and datadog.

luddskunk commented 6 months ago

Hi! Happy to see this feature coming along.

Do I understand correctly from the discussion here that I can only set one OpenTelemetry configuration for my whole ACA environment?

If I follow this ACA Landing Zone architecture proposal, where I want to have 1 ACA environment : N ACA containers, does it mean I must send all data to the same store, i.e. appinsights1?

EDIT: If that's the case, I would also like to +1 on the above - per app would be really helpful!

SophCarp commented 6 months ago

Hi @maskati and @luddskunk thanks for the helpful feedback!

Currently you can split destinations by type of data, i.e., if I wanted to send metrics to Datadog and traces to App Insights, I could do that. However, you are correct that you wouldn't be able to split by app. All the metrics from app1 and app2 would go to Datadog, for example.
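As a rough illustration of splitting by data type, the sketch below sends metrics to Datadog and traces to Application Insights at the environment level. The `data-dog set` and `app-insights set` subcommands and their flags are assumptions based on the preview CLI and should be checked against `az containerapp env telemetry --help`. The `DD_SITE`, `DD_API_KEY`, and `APPINSIGHTS_CONNECTION_STRING` variables and the `run` and `route_by_signal` helpers are hypothetical names. By default it only prints the commands instead of executing them.

```shell
# Dry-run by default: print each command instead of executing it.
# Note that dry-run output includes the key and connection string.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: route metrics to Datadog and traces to App Insights
# for one Container Apps environment ($1) in one resource group ($2).
route_by_signal() (
  env_name="$1"
  resource_group="$2"

  # Assumed preview subcommand and flags; verify with --help.
  run az containerapp env telemetry data-dog set \
    --name "$env_name" --resource-group "$resource_group" \
    --site "$DD_SITE" --key "$DD_API_KEY" \
    --enable-open-telemetry-metrics true \
    --enable-open-telemetry-traces false

  # Assumed preview subcommand and flags; verify with --help.
  run az containerapp env telemetry app-insights set \
    --name "$env_name" --resource-group "$resource_group" \
    --connection-string "$APPINSIGHTS_CONNECTION_STRING" \
    --enable-open-telemetry-traces true \
    --enable-open-telemetry-logs false
)

# Example call (names taken from the commands in this thread):
# route_by_signal "my-aca-env" "rg-azurecontainerapps"
```

Both commands apply to the whole environment, so every app's metrics still go to the same Datadog site; this does not provide per-app routing.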

We want to build this feature to provide a simplified experience for users to easily take advantage of Open Telemetry. For scenarios our managed agent doesn't support (yet), users can also host and manage their own OTel collector. Martin Thwaites wrote an article about setting up a container apps OTel collector: https://www.honeycomb.io/blog/opentelemetry-collector-azure-container-apps

quality-leftovers commented 6 months ago

Great to see this moving forward. Can you share any information on when metrics support for Application Insights will arrive ("The Application Insights endpoint doesn't accept metrics.")?

SophCarp commented 6 months ago

@quality-leftovers we are working to include this capability, but no ETA as of now. Do you have a specific scenario that requires sending metrics through OTel to App insights?

luddskunk commented 6 months ago

Hi @SophCarp,

Can I read more about the details of what agent is provided with the ACA Environment?

quality-leftovers commented 6 months ago

@SophCarp

@quality-leftovers we are working to include this capability, but no ETA as of now. Do you have a specific scenario that requires sending metrics through OTel to App insights?

Basically, we are hosting a 3rd-party application that offers an OTel metrics endpoint, and we'd like to have the metrics in Azure Monitor. My understanding was that metrics support in the Application Insights endpoint would be a prerequisite.

luddskunk commented 5 months ago

Hello,

I am trying to use the ACA environment agent and apply an OTEL endpoint like so:

az containerapp env telemetry otlp add \
--name "my-aca-env" \
--otlp-name "otlp-endpoint" \
--resource-group "rg-azurecontainerapps" \
--endpoint "https://my-otlp-url/v1/traces" \
--insecure false \
--headers "Authorization: Basic mysecrettoken" \
--enable-open-telemetry-traces true \
--enable-open-telemetry-metrics false \
--enable-open-telemetry-logs false

This brings me the output

[
  {
    "enableOpenTelemetryLogs": false,
    "enableOpenTelemetryMetrics": false,
    "enableOpenTelemetryTraces": true,
    "endpoint": "https://my-otlp-url/v1/traces",
    "headers": [
      {
        "key": "Authorization",
        "value": null
      }
    ],
    "insecure": false,
    "name": "otlp-endpoint"
  }
]

Problem:

It does not work: nothing is sent to Elasticsearch/Grafana. My best guess is that this is because, according to the documentation:

[Image: screenshot from the documentation]

Does this mean I must somehow escape the space in Authorization: Basic mysecrettoken? Getting the feedback "value": null also gives me an indication something is wrong.

I reached out to my local Microsoft team and they pointed me here. Any ideas @SophCarp?

A final, more minor point: the Azure CLI code snippets in the docs are incorrect.

SophCarp commented 5 months ago

Hi @luddskunk thanks for reaching out and trying our new preview feature!

"value": null is what our API returns when the value is a sensitive token that needs to be protected. It doesn't necessarily mean there is an issue, but just in case, we can try the format that has worked with another customer.

  1. Instead of "headers": "Authorization: Basic mysecrettoken", try "headers": "Authorization=Basic mysecrettoken".
  2. Is "mysecrettoken" the <base64 instanceID:token> token? I'm referring to this Grafana documentation: https://grafana.com/docs/grafana-cloud/send-data/otlp/send-data-otlp/#push-directly-from-applications-using-the-opentelemetry-sdks
  3. Instead of "endpoint": "https://my-otlp-url/v1/traces" try "endpoint": "https://my-otlp-url". Since the endpoint theoretically could be used for metrics, traces, or logs, the "v1/traces" is added to the base endpoint by the agent when it's specifically sending traces.

So, overall, could you try:

az containerapp env telemetry otlp add \
--name "my-aca-env" \
--otlp-name "otlp-endpoint" \
--resource-group "rg-azurecontainerapps" \
--endpoint "https://my-otlp-url" \
--insecure false \
--headers "Authorization=Basic mysecrettoken" \
--enable-open-telemetry-traces true \
--enable-open-telemetry-metrics false \
--enable-open-telemetry-logs false
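
The header value and base endpoint used in the command above can be derived with a couple of small helpers. This is a minimal sketch, assuming the token is the base64 encoding of `instanceID:token` as in the Grafana documentation referenced above; `basic_auth_header` and `base_endpoint` are hypothetical names, not part of the Azure CLI.

```shell
# Build the key=value pair expected by --headers:
#   Authorization=Basic <base64(instanceID:token)>
# $1 = instance ID, $2 = token (both placeholders supplied by the caller).
basic_auth_header() (
  # tr strips the line wrapping some base64 implementations add.
  encoded="$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
  printf 'Authorization=Basic %s\n' "$encoded"
)

# Drop a trailing per-signal path so only the base endpoint is configured;
# per the comment above, the agent appends the signal path itself.
base_endpoint() (
  url="${1%/}"
  for suffix in /v1/traces /v1/metrics /v1/logs; do
    url="${url%"$suffix"}"
  done
  printf '%s\n' "$url"
)

# Example:
# base_endpoint "https://my-otlp-url/v1/traces"   # prints https://my-otlp-url
```

The results can then be passed as `--headers "$(basic_auth_header "$ID" "$TOKEN")"` and `--endpoint "$(base_endpoint "$URL")"`, where `ID`, `TOKEN`, and `URL` are illustrative variable names.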

Let me know if that works!

And thanks for letting me know about the code snippets- could you share which ones specifically you found incorrect?

SophCarp commented 5 months ago

Hi @SophCarp,

Can I read more about the details of what agent is provided with the ACA Environment?

Hi @luddskunk I must have missed this question, thanks for your patience! The managed OpenTelemetry agent in ACA is currently based on the open source OpenTelemetry Collector: https://opentelemetry.io/docs/collector/. We manage it for the customer so they can take advantage of some key use cases without having to maintain and run it on their own.

Are there any specific details you're interested in?

luddskunk commented 5 months ago

Hi!

Thanks for the thorough response.

  1. Yes, I tried using the equals sign and got the expected null value.
  2. Yes, I can successfully use the token from grafana and also elastic works from inside of my aca container to my destination.
  3. That made sense based on the boolean options available!

I tried what you recommended, but it still does not seem to work for me. Any ideas on how to move forward? Do you have a working example of an ACA environment sending to an OTLP endpoint?

ProgrammerAL commented 4 months ago

Is there a way we can view what the OTel Agent is processing? Maybe if the agent is erroring while sending data to my OTel endpoint? My OTel data is not making it to my configured endpoint (Honeycomb) and I can't figure out why. After triple checking, I think I have everything set correctly.

For a while I thought maybe the header wasn't set, but found the above message saying that's just how the API returns the value. By the way, something to tell the end user that a value is set but it can't be shown would be really helpful there. I lost a few hours thinking the value wasn't being set.

SophCarp commented 4 months ago

@luddskunk

I tried what you recommended, but it still does not seem to work for me. Any ideas on how to move forward? Do you have a working example of an ACA environment sending to an OTLP endpoint?

Please submit a support request: https://ms.portal.azure.com/#view/Microsoft_Azure_Support/HelpAndSupportBlade/~/overview

@ProgrammerAL

My OTel data is not making it to my configured endpoint (Honeycomb) and I can't figure out why. After triple checking, I think I have everything set correctly.

When setting your API key within Honeycomb, did you set the API key to x-honeycomb-team? https://docs.honeycomb.io/send-data/opentelemetry/#using-the-honeycomb-opentelemetry-endpoint

This is the process I took to create a honeycomb endpoint:

  1. Make a Honeycomb account: https://www.honeycomb.io/

  2. Set up a key in Honeycomb: make an Ingestion API key named "x-honeycomb-team", then copy the API Key ID.

  3. Set up the variables:

    OTLP_1="honeycomb"
    OTLP_ENDPOINT_1="api.honeycomb.io:443"
    HONEY_API_KEY_ID="<YOUR_HONEYCOMB_KEY_ID>"
    OTLP_HEADERS_1="x-honeycomb-team=$HONEY_API_KEY_ID"

  4. Add Honeycomb as an OTel destination. This adds a Honeycomb OTLP endpoint and pipes metrics, traces, and logs to it:

    az containerapp env telemetry otlp add \
    --name "$ENVIRONMENT" \
    --resource-group "$RESOURCE_GROUP" \
    --endpoint "$OTLP_ENDPOINT_1" \
    --otlp-name "$OTLP_1" \
    --insecure false \
    --headers "$OTLP_HEADERS_1" \
    --enable-open-telemetry-metrics true \
    --enable-open-telemetry-traces true \
    --enable-open-telemetry-logs true

  5. Sanity check: show the details of one specific OTLP endpoint:

    az containerapp env telemetry otlp show \
    --name "$ENVIRONMENT" \
    --resource-group "$RESOURCE_GROUP" \
    --otlp-name "$OTLP_1"

    Expected payload:

    {
      "otlpConfiguration": {
        "enable-open-telemetry-logs": true,
        "enable-open-telemetry-metrics": true,
        "enable-open-telemetry-traces": true,
        "endpoint": "api.honeycomb.io:443",
        "headers": [
          {
            "key": "x-honeycomb-team",
            "value": null
          }
        ],
        "insecure": false,
        "name": "honeycomb"
      }
    }

ProgrammerAL commented 4 months ago

@SophCarp - Thanks for the help! Turns out I set the wrong endpoint. I had it set to https://api.honeycomb.io, and when I changed it to api.honeycomb.io:443 like in your post above, the telemetry started showing up in Honeycomb.
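
The endpoint change described above (from a URL with a scheme to `host:port`) can be sketched as a small helper. This is only an illustration of the format that reportedly worked in this thread, not documented agent behavior; `to_host_port` is a hypothetical name, and the default port of 443 is an assumption. It does not handle IPv6 literals.

```shell
# Hypothetical helper: convert a URL-style endpoint into host:port form.
# $1 = endpoint (with or without scheme/path), $2 = default port (assumed 443).
to_host_port() (
  default_port="${2:-443}"
  rest="${1#*://}"   # drop the scheme, if any
  rest="${rest%%/*}" # drop any path
  case "$rest" in
    *:*) printf '%s\n' "$rest" ;;                    # port already present
    *)   printf '%s:%s\n' "$rest" "$default_port" ;; # append default port
  esac
)

# Example:
# to_host_port "https://api.honeycomb.io"   # prints api.honeycomb.io:443
```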

Not sure if this is on the roadmap or not, but we need a way to debug this. Maybe some way to get error messages, or basic output, of the underlying OTel agent. And/or the data being pushed to the OTel endpoint.

Unrelated to the above, setting the flag insecure to false in order to enable a feature feels backwards. I kept getting that double-negative backwards in my head and thinking I set the property wrong.

SophCarp commented 4 months ago

@ProgrammerAL I'm glad that worked!

Thanks for your feedback, I'll follow up on seeing what we might be able to surface to help with debugging.

As for the insecure flag, I believe it was based off of OpenTelemetry's language:

Insecure: Whether to enable client transport security for the exporter’s gRPC connection. This option only applies to OTLP/gRPC when an endpoint is provided without the http or https scheme - OTLP/HTTP always uses the scheme provided for the endpoint. Implementations MAY choose to not implement the insecure option if it is not required or supported by the underlying gRPC client implementation.

Default: false. Env vars: OTEL_EXPORTER_OTLP_INSECURE, OTEL_EXPORTER_OTLP_TRACES_INSECURE, OTEL_EXPORTER_OTLP_METRICS_INSECURE, OTEL_EXPORTER_OTLP_LOGS_INSECURE.

Source: https://opentelemetry.io/docs/specs/otel/protocol/exporter/#:~:text=Insecure%3A%20Whether%20to,OTEL_EXPORTER_OTLP_LOGS_INSECURE%20%5B2%5D

We were trying to choose between a more straightforward flag or a flag that's in line with current OpenTelemetry language. It's not a perfect decision, but I promise it wasn't done for no reason! 😁

KarstenWintermann commented 3 months ago

Hi, I am currently trying to get the OpenTelemetry preview to work in our environment, but the OpenTelemetry collector doesn't forward my data to the specified endpoint. In my effort to narrow it down as much as possible, I configured an endpoint that I know would get logged by the firewall and I sent a trace message using the grpcurl tool as described here.

This is the setup that I have configured:

az containerapp env telemetry otlp list --name <name> --resource-group <resourcegroup>
Command group 'containerapp env telemetry otlp' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
[
  {
    "enableOpenTelemetryLogs": true,
    "enableOpenTelemetryMetrics": true,
    "enableOpenTelemetryTraces": true,
    "endpoint": "www.google.de:443",
    "headers": null,
    "insecure": true,
    "name": "AKS"
  }
]

The response that I get from the grpcurl tool looks good to me:

<user>@<container>:/$ grpcurl -plaintext -v -d @ -proto opentelemetry/proto/collector/trace/v1/trace_service.proto -import-path . k8se-otel.k8se-apps.svc:4317 opentelemetry.proto.collector.trace.v1.TraceService/Export < test.json          

Resolved method descriptor:
// For performance reasons, it is recommended to keep this RPC
// alive for the entire life of the application.
rpc Export ( .opentelemetry.proto.collector.trace.v1.ExportTraceServiceRequest ) returns ( .opentelemetry.proto.collector.trace.v1.ExportTraceServiceResponse );

Request metadata to send:
(empty)

Response headers received:
content-type: application/grpc

Response contents:
{
  "partialSuccess": {}
}

Response trailers received:
(empty)
Sent 1 request and received 1 response

The problem is that nothing seems to get sent out by the OpenTelemetry collector.

Is there a way to see the log output from the collector, or maybe the configuration file?

kiwiinlondon commented 3 months ago

It would be great to get log support for Datadog. We got tracing working really easily, but alas we need logs as well.

rbange commented 2 months ago

The command az containerapp env telemetry app-insights show always returns null for the connection string. It doesn't matter whether I set it via the CLI or via the portal.

$ az containerapp env telemetry app-insights show -n xxx -g xxx
Command group 'containerapp env telemetry app-insights' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
{
  "connectionString": null,
  "enableOpenTelemetryLogs": true,
  "enableOpenTelemetryTraces": true
}

rbange commented 2 months ago

Anyone got the app-insights otel integration working? I have a local setup with jaeger which works absolutely fine, but in the deployed version, nothing reaches app insights. It is nearly impossible to debug this.

KarstenWintermann commented 2 months ago

@rbange maybe you need to update your Azure CLI? Mine is version 2.61.0 with telemetry support 1.1.0. There it is called "endpoint", not "connectionString". I got it working by specifying the endpoint as "dns-name:4317" (only gRPC works, not HTTP). Also, the DNS name has to be from a public DNS zone, private doesn't seem to work either.

rbange commented 2 months ago

@KarstenWintermann thanks for the tip, but I am actually on a higher version.

$ az version
{
  "azure-cli": "2.63.0",
  "azure-cli-core": "2.63.0",
  "azure-cli-telemetry": "1.1.0",
  "extensions": {
    "account": "0.2.5",
    "alertsmanagement": "0.2.3",
    "azure-devops": "1.0.1",
    "bastion": "1.1.0",
    "containerapp": "0.3.54",
    "ml": "2.29.0",
    "scheduled-query": "1.0.0b1",
    "serviceconnector-passwordless": "2.0.7",
    "ssh": "2.0.5",
    "storage-preview": "1.0.0b2"
  }
}

I think you are referring to the actual custom Otel implementation right? There it should have the endpoint field.

I want to use app insights, but via the Otel collector of container apps and not via the dedicated library: https://pypi.org/project/azure-monitor-opentelemetry/. This would allow me to use other collectors and frameworks in other environments (like locally) without app insights.

devnev commented 2 months ago

I got it working, but with some issues:

I initially set it up using the Azure web portal, going to the "OTel endpoints" tab and clicking through the UI to create an Application Insights instance backed by an existing Logs Workspace. After a restart, the containers have the following environment variables:

CONTAINERAPP_OTEL_LOGGING_GRPC_ENDPOINT=http://k8se-otel.k8se-apps.svc:4317/v1/logs
CONTAINERAPP_OTEL_METRIC_GRPC_ENDPOINT=http://k8se-otel.k8se-apps.svc:4317/v1/metrics
CONTAINERAPP_OTEL_TRACING_GRPC_ENDPOINT=http://k8se-otel.k8se-apps.svc:4317/v1/traces
OTEL_EXPORTER_OTLP_ENDPOINT=http://k8se-otel.k8se-apps.svc:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_RESOURCE_ATTRIBUTES=[redacted]
OTEL_SERVICE_NAME=[redacted]

While the OTEL-prefixed environment variables seem to be valid, the CONTAINERAPP-prefixed environment variables seem to be invalid: during testing, these URLs reported "unknown method" gRPC errors. With this initial setup, traces appeared in the "performance" tab of Application Insights.

I then tried to recreate this using deployment automation - in my case, pulumi, using the azure-native package, which uses the Azure REST API directly. With this I was able to fully recreate an equivalent Application Insights resource, in that the output of az resource show shows equivalent properties for both the UI-created application insights, and the REST-API-created application insights instance.

However, if I update the container app environment to use the new application insights object, either via the Azure web portal or via pulumi / REST (using the v20240202preview version of the app API), traces do not appear in either application insights instance. I can switch back and forth, and when using the initially-created insights instance, traces appear, but on the second instance, they don't. I've now created another instance manually through the web portal, and switching to that one also doesn't work, in that traces don't appear. This leads me to believe that the managed collector is not updating properly somehow.

@rbange regarding the null connection string in the output, I read somewhere that the azure cli redacts values potentially containing secrets by replacing them with null. I get a null there regardless of whether it is working or not.

devnev commented 2 months ago

Additionally, the web portal UI for selecting an Application Insights instance as the OTel endpoint only lets you select Application Insights instances in the same resource group as the Container App Environment. The documentation doesn't mention such a restriction, and the REST APIs let you set a connection string for an Application Insights instance in a different resource group, but I haven't been able to confirm whether cross-resource-group connections actually work, due to the issue mentioned in my previous comment.