open-telemetry / opentelemetry-dotnet-contrib

This repository contains set of components extending functionality of the OpenTelemetry .NET SDK. Instrumentation libraries, exporters, and other components can find their home here.
https://opentelemetry.io
Apache License 2.0
475 stars 283 forks source link

[Http Instrumentation] Make it easier to scrub sensitive URI details from output #1791

Open stephenjust opened 3 years ago

stephenjust commented 3 years ago

Feature Request

Before opening a feature request against this repo, consider whether the feature should/could be implemented in the other OpenTelemetry client libraries. If so, please open an issue on opentelemetry-specification first.

Is your feature request related to a problem?

If so, provide a concise description of the problem.

The HTTP instrumentation package currently writes out the full request path for most requests, the exception being requests that contain Uri.UserInfo. This is not sufficient for many resource types. For example, the Azure Storage SDK's requests show up in logs with a SAS token appended.

For example: https://mystorage.blob.core.windows.net/container/myfile.txt?sv=2019-07-07&sr=c&sig=placeholder&se=2021-09-08T22%3A03%3A23Z&sp=r&api-version=2019-07-07&

I can also imagine other scenarios where the URI might otherwise contain sensitive information that cannot be logged.

Describe the solution you'd like:

What do you want to happen instead? What is the expected behavior?

It would be helpful if OpenTelemetry had a configurable high-performance URL scrubber for anywhere URLs could be logged, to make it as easy as possible for developers to sanitize their log output. This would support (minimally) blocking out any specified query string parameters.

Describe alternatives you've considered.

Which alternative solutions or features have you considered?

In our project today, we configure an enricher to run a Regex replace on all URLs we log, but it feels like there is room for improvement on the default behavior.

Additional Context

Add any other context about the feature request here.

This is definitely in the "nice to have" category. There are ways to solve the problem with the current shipped libraries.

reyang commented 3 years ago

@cijothomas I think this is not an instrumentation feature (e.g. it probably won't make sense for individual instrumentation libraries to do similar things). Is there a way to move this issue to the contrib repo?

dpk83 commented 3 years ago

@reyang @cijothomas This is an ask that we surely need to be able to use the instrumentation libraries. Here are a few options that would be good to consider (let's take an example of AspNetCore instrumentation library

reyang commented 3 years ago

Couple things to consider:

  1. scrubbing can be done at different places - instrumentation library, BaseProcessor<T> (for Activity and LogRecord), BaseExporter<T>, agent/collector, ingestion.
    • doing this on agent/collector/ingestion avoids the need to implement the same logic across all programming languages, and is probably better for enforcement, however it doesn't prevent credential / privacy data from leaving the process / local box.
    • doing this in SDK exporter makes it possible to use async processing, which won't block the hot-path (e.g. doing scrubbing in an instrumentation library or processor will slow down every Log/Activity call).
  2. scrubbing might introduce data problems if not done carefully.
    • if the data processed by a scrubber is used for metrics pre-agg, it might cause different points being aggregated to the wrong timeseries (unless the scrubber is doing an exact 1:1 mapping).
    • if the data size increased after processed by a scrubber, it might cause additional data loss due to truncation / limit.
pellared commented 3 years ago

Maybe it would be good to discuss under https://github.com/open-telemetry/opentelemetry-specification?

For security reasons the db.statement was made conditional. Maybe we could use a similar approach for http sem conv?

In general it should be possible to disable (or sanitize if possible) all attributes that can come from "user input" as they may contain sensitive data.

Kielek commented 6 months ago

Closing. Latest released brings OTEL_DOTNET_EXPERIMENTAL_HTTPCLIENT_DISABLE_URL_QUERY_REDACTION to redact all data. It can be disabled by mentioned environmental variable.

swythan commented 6 months ago

Closing. Latest released brings OTEL_DOTNET_EXPERIMENTAL_HTTPCLIENT_DISABLE_URL_QUERY_REDACTION to redact all data. It can be disabled by mentioned environmental variable.

That isn't equivalent.

  1. It defaults the other way. That setting prevents the reaction of all query parameters that was just defaulted to being enabled. (I'll leave the heated discussions to the other threads on that one.)
  2. What I want is to be able to redact only the sensitive query parameters. The original request says configurable so I don't think I'm alone in that.
Kielek commented 6 months ago

@swythan, reopening per your request.