open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Tracking issue for `confmap.strictlyTypedInput` feature gate #10552

Closed mx-psi closed 2 weeks ago

mx-psi commented 2 months ago

This issue describes the confmap.strictlyTypedInput feature gate, intended to address #8565 and #9532.

What is changing?

  1. Configurations relying on the implicit type casting behaviors listed in #9532 will start to fail (see the sketch below).
  2. Configurations using URI expansion (i.e. `field: ${env:ENV}`) for string-typed fields will use the value passed in `ENV` verbatim, without intermediate type casting.
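
For illustration, a config along these lines (a made-up sketch; the component name and field are hypothetical and chosen only to show the pattern, compare the unit discussion further down this thread) used to start fine and will now be rejected, because the bare scalar is parsed as an int rather than being implicitly cast to a string:

processors:
  example/processor:   # hypothetical component, for illustration only
    unit: 1            # string-typed field: previously cast to "1", now fails with
                       # "expected type 'string', got unconvertible type 'int'"
    # unit: "1"        # quoting the value makes YAML produce a string and keeps it working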

What does it mean to me?

Starting with v0.105.0, you may have to adapt your configuration to use the correct type for some fields. This should be straightforward. You can skip the error and update your configuration at your own pace by disabling the `confmap.strictlyTypedInput` feature gate.
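
For example, to temporarily opt out on the command line (the binary name and config path below are placeholders; the flag itself is the one shown in the error hint and in later comments in this thread):

otelcol --config=/etc/otelcol/config.yaml --feature-gates=-confmap.strictlyTypedInput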

Please reply below if:

Changelog

| Collector version | Change | PR |
| --- | --- | --- |
| v0.103.0 | Introduce feature gate as alpha | #10400 |
| v0.105.0 | Promote feature gate to beta (enabled by default) | #10554 |
| v0.108.0 | Promote feature gate to stable (can't be disabled) | #10793 |
| v0.110.0 | Remove feature gate (errors out when referenced) | |
galecore commented 1 month ago

I observe a breaking change with this newly introduced behavior.

After the feature gate was turned on, Grafana Cloud integrations using the basicauth extension with confmap/provider/envprovider now report errors like:

'client_auth.username' expected type 'string', got unconvertible type 'int', value: '12345'

Since Grafana Cloud usernames are often integers, users running this setup with Docker without hardcoding credentials are likely to hit unexpected errors.

One workaround I can see is using bearertokenauth and providing the token as ${env:GRAFANA_CLOUD_USERNAME}:${env:GRAFANA_CLOUD_PASSWORD}. Are there any other workarounds? Is there a correct way of providing integer-based values as strings via environment?

Here is a minimal example which allows reproducing the error:

docker-compose.yaml

version: '3.8'
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    env_file:
      - .secrets

.secrets

GRAFANA_CLOUD_USERNAME="12345"
GRAFANA_CLOUD_PASSWORD="secret"
GRAFANA_CLOUD_OTLP_ENDPOINT="https://otlp-gateway-prod-eu-west-0.grafana.net/otlp"

otel-collector-config.yaml

extensions:
  basicauth/grafana_cloud:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/basicauthextension
    client_auth:
      username: "${env:GRAFANA_CLOUD_USERNAME}"
      password: "${env:GRAFANA_CLOUD_PASSWORD}"

receivers:
  otlp:
    # https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
  hostmetrics:
    # Optional. Host Metrics Receiver added as an example of Infra Monitoring capabilities of the OpenTelemetry Collector
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver
    scrapers:
      load:
      memory:

processors:
  batch:
  # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor
  resourcedetection:
    # Enriches telemetry data with resource information from the host
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor
    detectors: ["env", "system"]
    override: false
  transform/add_resource_attributes_as_metric_attributes:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["deployment.environment"], resource.attributes["deployment.environment"])
          - set(attributes["service.version"], resource.attributes["service.version"])

exporters:
  otlphttp/grafana_cloud:
    # https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlpexporter
    endpoint: "${env:GRAFANA_CLOUD_OTLP_ENDPOINT}"
    auth:
      authenticator: basicauth/grafana_cloud

service:
  extensions: [basicauth/grafana_cloud]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [otlphttp/grafana_cloud]
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [
        resourcedetection,
        transform/add_resource_attributes_as_metric_attributes,
        batch,
      ]
      exporters: [otlphttp/grafana_cloud]
    logs:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [otlphttp/grafana_cloud]
TylerHelmuth commented 1 month ago

@galecore you're likely experiencing the same issue as https://github.com/open-telemetry/opentelemetry-collector/issues/10659. I haven't been able to reproduce your use-case yet either.

galecore commented 1 month ago

Not sure if it's relevant, but I'd note that my setup for this is:

Operating System: Ubuntu 22.04.4 
Kernel: Linux 5.15.0-83-generic
Architecture: x86-64
Docker version: 24.0.7, build afdd53b
Docker Compose version: v2.20.2-desktop.1

An actual example that I'm running can be found here (just remove --feature-gates=-confmap.strictlyTypedInput): https://github.com/galecore/telemetry-example/blob/master/examples/otel-collector/docker-compose.yml#L33

mx-psi commented 1 month ago

Is there a correct way of providing integer-based values as strings via environment?

#10618 will fix this; in the meantime you can disable the feature gate.

Kuckkuck commented 1 month ago

Btw. I get the same error twice within this section:

    transform/containerd:
        log_statements:
          context: log
          statements:
            - merge_maps(cache,ExtractPatterns(body,"^(?P<time>[^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^\\s]*) ?(?P<log>.*)$"), "upsert") where body != nil
            - merge_maps(cache,ExtractPatterns(body,"^(?P<time>\\d+/\\d+/\\d+\\s+\\d+:\\d+\\d+) (?P<log>.*)$"), "upsert") where attributes["log_name"]!= "MeshAccessLog" and cache["log"]!= nil and not IsMap(cache["log"])
            - set(body,cache["log"]) where cache["log"] != nil
            - merge_maps(cache,ParseJSON(body), "upsert") where IsMap(body)
            - set(body,cache["message"]) where cache["message"] != nil
            - set(body,cache["msg"]) where cache["msg"] != nil
            - set(severity_text,cache["level"]) where cache["level"] != nil
            - set(severity_text,cache["severity"]) where cache["severity"] != nil
            - set(severity_number,SEVERITY_NUMBER_INFO) where cache["level"] == "INFO"
            - set(severity_number,SEVERITY_NUMBER_INFO) where cache["severity"] == "info"
            - set(attributes["loggerName"],cache["loggerName"]) where cache["loggerName"] != nil
Error: failed to get config: cannot unmarshal the configuration: 1 error(s) decoding:

* error decoding 'processors': error reading configuration for "transform/containerd": 1 error(s) decoding:

* 'log_statements': source data must be an array or slice, got map

Hint: Temporarily restore the previous behavior by disabling
      the `confmap.strictlyTypedInput` feature gate. More details at:
      https://github.com/open-telemetry/opentelemetry-collector/issues/10552
2024/07/25 14:02:24 collector server run finished with error: failed to get config: cannot unmarshal the configuration: 1 error(s) decoding:

* error decoding 'processors': error reading configuration for "transform/containerd": 1 error(s) decoding:

* 'log_statements': source data must be an array or slice, got map

Hint: Temporarily restore the previous behavior by disabling
      the `confmap.strictlyTypedInput` feature gate. More details at:
      https://github.com/open-telemetry/opentelemetry-collector/issues/10552
TylerHelmuth commented 1 month ago

@Kuckkuck your configuration is wrong.

transform/containerd:
  log_statements:
    - context: log
      statements:
        - ...
mx-psi commented 1 month ago

Btw. I get the same error twice within this section:

This is not specific to this error; it happens for all errors: you get the error once in the logs and once more on stdout.

cforce commented 1 month ago

Same issue for https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/metricstransformprocessor

* 'transforms': source data must be an array or slice, got map
Hint: Temporarily restore the previous behavior by disabling 
      the `confmap.strictlyTypedInput` feature gate. More details at:
      https://github.com/open-telemetry/opentelemetry-collector/issues/10552
2024/08/02 11:36:40 collector server run finished with error: failed to get config: cannot unmarshal the configuration: 1 error(s) decoding:
* error decoding 'processors': error reading configuration for "metricstransform": 1 error(s) decoding:
* 'transforms': source data must be an array or slice, got map
Hint: Temporarily restore the previous behavior by disabling 
      the `confmap.strictlyTypedInput` feature gate. More details at:
      https://github.com/open-telemetry/opentelemetry-collector/issues/10552
mx-psi commented 1 month ago

@cforce transforms needs to be a list, not a single element, similar to the change Tyler suggested here: https://github.com/open-telemetry/opentelemetry-collector/issues/10552#issuecomment-2250522898
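
For reference, a minimal shape (the metric name and action are illustrative; field names follow the metricstransform README linked above):

processors:
  metricstransform:
    transforms:                       # must be a sequence: each entry starts with "-"
      - include: system.cpu.usage     # illustrative metric name
        action: update
        new_name: system.cpu.usage_time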

roobre commented 1 month ago

Just came across what I believe to be a side effect of this issue. With a config like this:

extensions:
  zpages: {}
  basicauth/otlp:
    client_auth:
      username: "${OTLP_USER}"
      password: "${GRAFANA_ACCESS_TOKEN}"

The collector will fail to start if OTLP_USER is numeric, e.g. OTLP_USER=12345, with the following error:

Error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):

error decoding 'extensions': error reading configuration for "basicauth/otlp": decoding failed due to the following error(s):

'client_auth.username' expected type 'string', got unconvertible type 'int', value: '12345'

Hint: Temporarily restore the previous behavior by disabling 
      the `confmap.strictlyTypedInput` feature gate. More details at:
      https://github.com/open-telemetry/opentelemetry-collector/issues/10552
2024/08/03 12:29:46 collector server run finished with error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):

error decoding 'extensions': error reading configuration for "basicauth/otlp": decoding failed due to the following error(s):

'client_auth.username' expected type 'string', got unconvertible type 'int', value: '12345'

Hint: Temporarily restore the previous behavior by disabling 
      the `confmap.strictlyTypedInput` feature gate. More details at:
      https://github.com/open-telemetry/opentelemetry-collector/issues/10552
Stream closed EOF for monit/otelcol-5f57b9d79c-kbpc9 (otelcol)

Forcing this to a string in YAML as @TylerHelmuth suggested here ~does work~:

extensions:
  zpages: {}
  basicauth/otlp:
    client_auth:
      username: str!! "${OTLP_USER}" # Force YAML decoder to read it as a string.
      password: "${GRAFANA_ACCESS_TOKEN}"

EDIT: I don't think the above is working. I think it is taking str!! 1234 as a literal value, which will then fail to authenticate. str!! is not a YAML tag; !!str is.

Using username: !!str "${env:OTLP_USER}" does not seem to fix the problem, I get the same error:

'client_auth.username' expected type 'string', got unconvertible type 'int', value: '12345'

@TylerHelmuth Is there something I'm doing wrong here?

mx-psi commented 1 month ago

@roobre Thanks for the report. What version are you using? Would you mind trying with the latest version (v0.106.1) if you haven't done so already?

roobre commented 1 month ago

@mx-psi Thanks for taking a look! I'm on image: otel/opentelemetry-collector-contrib:0.106.1. The OTLP_USER env var is defined in a k8s secret and then mounted into the container running the image above.

TylerHelmuth commented 1 month ago

@roobre do you have the confmap.unifyEnvVarExpansion feature gate enabled or disabled? In 0.106.1 it is enabled by default, so you'd have to be explicitly disabling it for it to be off.
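
(For reference, explicitly disabling it would look like --feature-gates=-confmap.unifyEnvVarExpansion, the same flag syntax used for confmap.strictlyTypedInput earlier in this thread, so if you are not passing anything like that, the gate is on.)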

roobre commented 1 month ago

@TylerHelmuth I don't think I'm setting that to any particular value, so I presume it should be set to whatever the default is. Am I expected to have to modify it for the snippet above to work as it used to?

TylerHelmuth commented 1 month ago

@roobre, @mx-psi found what the issue is.

the expandconverter is taking our struct type (that handles using the string representation of the int when the value is used in a string field), getting the original int value from it, and using that in the map instead.

The fix is to get rid of the expandconverter, which has been our plan all along. Keep an eye out for the v0.107.0 release next week, which will hopefully include https://github.com/open-telemetry/opentelemetry-collector/pull/10508 and https://github.com/open-telemetry/opentelemetry-collector/pull/10510.

roobre commented 1 month ago

@TylerHelmuth That's awesome, thank you!

aheusser commented 3 weeks ago

I observe an expected type 'string', got unconvertible type 'float64' error with this newly-introduced feature gate. I haven't been able to figure out how to correct my configuration in order to make the error go away (other than using --feature-gates=-confmap.strictlyTypedInput to disable it).

I am declaring an environment variable in .env, and then passing it into various exporters for the tls.min_version.

I am getting the following error unless I:

Is there some other construction I should use to be able to supply the min_version via an env var without having to disable confmap.strictlyTypedInput?

Here is a minimal example which allows reproducing the error:

docker compose up --build output

Error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):
otel-collector-1  |
otel-collector-1  | error decoding 'exporters': error reading configuration for "otlphttp/monitoring": decoding failed due to the following error(s):
otel-collector-1  |
otel-collector-1  | 'tls.min_version' expected type 'string', got unconvertible type 'float64', value: '1.2'
otel-collector-1  |
otel-collector-1  | Hint: Temporarily restore the previous behavior by disabling
otel-collector-1  |       the `confmap.strictlyTypedInput` feature gate. More details at:
otel-collector-1  |       https://github.com/open-telemetry/opentelemetry-collector/issues/10552

.env

TLS_MIN_VERSION=1.2

docker-compose.yaml

version: "2"
services:

  # Collector
  otel-collector:
    image: otel/opentelemetry-collector:0.106.1
    restart: always
    environment:
      - TLS_MIN_VERSION=${TLS_MIN_VERSION:-1.2}

    command: ["--config=/etc/otel-collector-config.yaml"]

    # must use "--feature-gates=-confmap.strictlyTypedInput" unless tls.min_version is a literal in otel-collector-config.yaml
    #command: ["--config=/etc/otel-collector-config.yaml", "--feature-gates=-confmap.strictlyTypedInput"]

    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml

otel-collector-config.yaml

receivers:
  otlp/tracing:
    protocols:
      grpc:
        endpoint: 0.0.0.0:8518

exporters:
  otlphttp/monitoring:
    endpoint: https://localhost:443
    tls:
      #min_version: ${TLS_MIN_VERSION}             # complains unless using "--feature-gates=-confmap.strictlyTypedInput"
      #min_version: ${env:TLS_MIN_VERSION}         # complains unless using "--feature-gates=-confmap.strictlyTypedInput"
      min_version: "${TLS_MIN_VERSION}"            # complains unless using "--feature-gates=-confmap.strictlyTypedInput"
      #min_version: "${env:TLS_MIN_VERSION}"       # complains unless using "--feature-gates=-confmap.strictlyTypedInput"  
      #min_version: '${TLS_MIN_VERSION}'           # complains unless using "--feature-gates=-confmap.strictlyTypedInput"
      #min_version: '${env:TLS_MIN_VERSION}'       # complains unless using "--feature-gates=-confmap.strictlyTypedInput"
      #min_version: "1.2"                          # works without complaint

service:
  pipelines:
    traces/kol:
      receivers: [otlp/tracing]
      processors: []
      exporters: [otlphttp/monitoring]
mx-psi commented 3 weeks ago

@aheusser Thanks for your comment! We are in the process of releasing v0.107.0, I will let you know once this is released (I think it should solve your issue)

aheusser commented 3 weeks ago

@aheusser Thanks for your comment! We are in the process of releasing v0.107.0, I will let you know once this is released (I think it should solve your issue)

Thank you @mx-psi, I will happily pull and try v0.107.0 and confirm ... when ready.

mx-psi commented 3 weeks ago

@aheusser v0.107.0 is out!

aheusser commented 3 weeks ago

@aheusser v0.107.0 is out!

@mx-psi I think my initial validation test a few hours ago (which seemed to suggest that v0.107.0 resolved my previously-reported problem) was an invalid test in some way.

After starting fresh again just now, I am getting the same problem (got unconvertible type errors in the log), except now the container is also perpetually restarting (again, unless I use --feature-gates=-confmap.strictlyTypedInput). Exact same steps as I outlined above (except now using image: otel/opentelemetry-collector:0.107.0, of course).

JaneCui commented 3 weeks ago

I have a similar issue with my processor below.

experimental_metricsgeneration/ucp_internal_cpu_percent:
  rules:

mx-psi commented 3 weeks ago

After starting fresh again just now, I am getting the same problem (got unconvertible type errors in the log), except now the container is also perpetually restarting (again, unless I use --feature-gates=-confmap.strictlyTypedInput). Exact same steps as I outlined above (except now using image: otel/opentelemetry-collector:0.107.0, of course).

@aheusser Thanks for reaching out. I think I understand why this is happening. Our current test suite did not catch this, and we will likely need end to end tests. I filed #10897 to fix this and add one initial test that catches this.

I got the error message "'rules[0].unit' expected type 'string', got unconvertible type 'int', value: '1'". For percentages, the recommended and default unit is 1. Should I change the unit, or is this also a bug in confmap.strictlyTypedInput?

@JaneCui This is not a bug, you should put unit: "1" instead of unit: 1
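
For example (a sketch using the processor name from your earlier comment; the rest of the rule stays as you have it configured):

processors:
  experimental_metricsgeneration/ucp_internal_cpu_percent:
    rules:
      - name: ucp_internal_cpu_percent   # illustrative rule name
        unit: "1"                        # quoted so YAML parses it as a string, not an int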

aheusser commented 3 weeks ago

Thank you for your diligence, @mx-psi. I will test again once it's available.

roobre commented 3 weeks ago

@mx-psi I think I might be hitting this again in otel/opentelemetry-collector-contrib:0.107.0:

Error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):

error decoding 'extensions': error reading configuration for "basicauth/otlp": decoding failed due to the following error(s):

'client_auth.username' expected type 'string', got unconvertible type 'int', value: '12345'

Hint: Temporarily restore the previous behavior by disabling 
      the `confmap.strictlyTypedInput` feature gate. More details at:
      https://github.com/open-telemetry/opentelemetry-collector/issues/10552
2024/08/16 18:39:23 collector server run finished with error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):

error decoding 'extensions': error reading configuration for "basicauth/otlp": decoding failed due to the following error(s):

'client_auth.username' expected type 'string', got unconvertible type 'int', value: '12345'

Hint: Temporarily restore the previous behavior by disabling 
      the `confmap.strictlyTypedInput` feature gate. More details at:
      https://github.com/open-telemetry/opentelemetry-collector/issues/10552
Stream closed EOF for monit/otelcol-94d47bdd5-sbjck (otelcol)

I have tried with both

    extensions:
      zpages: {}
      basicauth/otlp:
        client_auth:
          username: !!str "${OTLP_USER}"
          password: "${GRAFANA_ACCESS_TOKEN}"

And

    extensions:
      zpages: {}
      basicauth/otlp:
        client_auth:
          username: "${OTLP_USER}"
          password: "${GRAFANA_ACCESS_TOKEN}"

And also with ${env:OTLP_USER} instead of ${OTLP_USER}.

Where, again, the OTLP_USER env var contains 12345.

dmitryax commented 3 weeks ago

@roobre you're right. It wasn't fully fixed. We got https://github.com/open-telemetry/opentelemetry-collector/pull/10897 merged, which should fix it for real :) It'll be released in 0.108.0

JaneCui commented 2 weeks ago

Thanks @mx-psi! The problem got solved after I changed to unit: "1".

yanamg7 commented 5 days ago

I have a question about these changes. I'm upgrading from 0.97.0 to 0.108.0, and I changed the env var values:

"failed to build pipelines: failed to create "resource/md" processor, in pipeline "logs/stream/ltm": error creating AttrProc. Either field "value", "from_attribute" or "from_context" setting must be specified for 0-th action collector server run finished with error: failed to build pipelines: failed to create "resource/md" processor, in pipeline "logs/stream/ltm": error creating AttrProc. Either field "value", "from_attribute" or "from_context" setting must be specified for 0-th action"

Do you know how I can fix it?

mx-psi commented 9 hours ago

@yanamg7 What version of the Collector are you using and how? What is your full config? (please remove any sensitive details before sharing!)
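
For reference, the error means that none of value, from_attribute or from_context ended up set on the first action of the resource/md processor. A minimal sketch of a valid action is below (the attribute key, env var name and action are made up for illustration); if the value comes from an env var, it's worth checking that the variable still resolves the way it did on 0.97.0:

processors:
  resource/md:
    attributes:
      - key: deployment.environment    # illustrative attribute key
        value: ${env:DEPLOYMENT_ENV}   # must resolve to a non-empty value
        action: upsert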