influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

need escape character in the message for output.loki #15274

Status: Closed (elixedus closed this 5 months ago)

elixedus commented 5 months ago

https://github.com/influxdata/telegraf/blob/10c15ce41695f6195fa483943fc3abd5b4138afd/plugins/outputs/loki/loki.go#L133

Not sure if the problem happens exactly here, but when I bridge inputs.docker_log to outputs.loki, it seems the loki plugin does not escape the '.' character, which causes this error:

Error writing to outputs.loki: when writing to [http://localhost:3100/loki/api/v1/push] received status code, 400: 1:189: parse error: unexpected character inside braces: '.'

powersj commented 5 months ago

Thanks - do you have a link to a regex of characters that are valid or invalid? I'm sure there is more than just a period.

elixedus commented 5 months ago

Using the file plugin, and given the message "received status code, 400: 1:189: parse error: unexpected character inside braces: '.'", I get these last few lines from head -n 191:

docker_log,container_image=docker.io/selenium/standalone-chrome,container_name=selenium-chrome-server,container_version=latest,host=testmachine,org.opencontainers.image.ref.name=ubuntu,org.opencontainers.image.version=22.04,stream=stdout container_id="d9c63ceb084c3677616fbbf7e30f521ca7030d00156f9d5e860198b9da85ae90",message="00:34:08.407 INFO [LocalDistributor.newSession] - Session created by the Distributor. Id: 9935649423d2f6ffcd3fde573509f9f0" 1714523648000000000
docker_log,container_image=docker.io/selenium/standalone-chrome,container_name=selenium-chrome-server,container_version=latest,host=testmachine,org.opencontainers.image.ref.name=ubuntu,org.opencontainers.image.version=22.04,stream=stdout container_id="d9c63ceb084c3677616fbbf7e30f521ca7030d00156f9d5e860198b9da85ae90",message=" Caps: Capabilities {acceptInsecureCerts: false, browserName: chrome-headless-shell, browserVersion: 122.0.6261.69, chrome: {chromedriverVersion: 122.0.6261.69 (81bc525b6a36..., userDataDir: /tmp/.org.chromium.Chromium...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:38817}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: ws://10.98.32.12:4444/sessi..., se:cdpVersion: 122.0.6261.69, se:vnc: ws://10.98.32.12:4444/sessi..., se:vncEnabled: true, se:vncLocalAddress: ws://10.98.32.12:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}" 1714523648000000000
docker_log,container_image=docker.io/selenium/standalone-chrome,container_name=selenium-chrome-server,container_version=latest,host=testmachine,org.opencontainers.image.ref.name=ubuntu,org.opencontainers.image.version=22.04,stream=stdout container_id="d9c63ceb084c3677616fbbf7e30f521ca7030d00156f9d5e860198b9da85ae90",message="00:34:08.793 WARN [SeleniumSpanExporter$1.lambda$export$3] - {\"traceId\": \"648b1c57c4d4905dd1586f08abc3017b\",\"eventTime\": 1714523648792981307,\"eventName\": \"HTTP request execution complete\",\"attributes\": {\"http.flavor\": 1,\"http.handler_class\": \"org.openqa.selenium.remote.http.Route$PredicatedRoute\",\"http.host\": \"127.0.0.1:4444\",\"http.method\": \"POST\",\"http.request_content_length\": \"50\",\"http.scheme\": \"HTTP\",\"http.status_code\": 500,\"http.target\": \"\\u002fsession\\u002f9935649423d2f6ffcd3fde573509f9f0\\u002furl\",\"http.user_agent\": \"selenium\\u002f4.7.2 (python linux)\"}}" 1714523648000000000
docker_log,container_image=docker.io/selenium/standalone-chrome,container_name=selenium-chrome-server,container_version=latest,host=testmachine,org.opencontainers.image.ref.name=ubuntu,org.opencontainers.image.version=22.04,stream=stdout container_id="d9c63ceb084c3677616fbbf7e30f521ca7030d00156f9d5e860198b9da85ae90",message="" 1714523648000000000
docker_log,container_image=docker.io/selenium/standalone-chrome,container_name=selenium-chrome-server,container_version=latest,host=testmachine,org.opencontainers.image.ref.name=ubuntu,org.opencontainers.image.version=22.04,stream=stdout message="00:34:08.987 INFO [LocalDistributor.newSession] - Session request received by the Distributor:",container_id="d9c63ceb084c3677616fbbf7e30f521ca7030d00156f9d5e860198b9da85ae90" 1714523648000000000
docker_log,container_image=docker.io/selenium/standalone-chrome,container_name=selenium-chrome-server,container_version=latest,host=testmachine,org.opencontainers.image.ref.name=ubuntu,org.opencontainers.image.version=22.04,stream=stdout container_id="d9c63ceb084c3677616fbbf7e30f521ca7030d00156f9d5e860198b9da85ae90",message=" [Capabilities {browserName: chrome, goog:chromeOptions: {args: [--headless, --disable-extensions, --disable-logging, --no-sandbox, --disable-dev-shm-usage, --disable-infobars, --start-maximized, --lang=en, --disable-blink-features=Au...], excludeSwitches: [enable-automation], extensions: [], useAutomationExtension: false}, pageLoadStrategy: normal}]" 1714523648000000000

But I'm not sure whether these are exactly the messages that produce the problem.

elixedus commented 5 months ago

I think the log content doesn't matter.

It is reproducible through

podman run -it --rm --name=test ubuntu bash

then press Enter to trigger a log entry in podman, with the following in telegraf.conf:

[[inputs.docker_log]]
    endpoint = "unix:///mnt/host/run/user/1020/podman/podman.sock"
    container_name_include = ["test"]

I'm running podman version 3.4.4.

Notably, a similar configuration for [[inputs.docker]] works with podman.sock.

powersj commented 5 months ago

About 30 minutes after this message, artifacts will be available in a comment on https://github.com/influxdata/telegraf/pull/15277. Please download and try one of those artifacts, and be sure to add sanitize_label_names = true to your loki output config.

> I think the log content doesn't matter.

What I am after is understanding two things:

1) Exactly what message is causing the issue. Ideally, what I could get from you is a line protocol message, like the ones you previously showed, that demonstrates the issue.
2) Assuming this is due to an invalid character, what other characters cause the issue.

Looking at the messages you did provide, it appears to be due to these tag keys:

org.opencontainers.image.ref.name=ubuntu,org.opencontainers.image.version=22.04

If I update those tag keys to replace the periods with underscores, all the messages are sent successfully:

org_opencontainers_image_ref_name=ubuntu,org_opencontainers_image_version=22.04

2024-05-02T12:49:41Z D! [outputs.file] Wrote batch of 6 metrics in 55.95µs
2024-05-02T12:49:41Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics

Looking at the docs around label naming there does appear to be a regex:

[a-zA-Z_:][a-zA-Z0-9_:]*

Namely we can start with a letter, underscore, or colon. Then after that we can have numbers as well.

I've put up https://github.com/influxdata/telegraf/pull/15277 with a new option to do this sanitization for you, but you need to enable it with sanitize_label_names = true
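
For illustration only, sanitization along these lines could look like the Go sketch below. This is not the code from the pull request: `sanitizeLabelName` is a hypothetical helper, and replacing each disallowed character with an underscore is an assumption based on the manual rename shown earlier in this comment.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeLabelName is a hypothetical helper, not Telegraf's implementation.
// It replaces every character the label-name rule disallows with '_'.
// Assumed rule: the first character is a letter, '_' or ':'; later
// characters may also be digits.
func sanitizeLabelName(name string) string {
	var b strings.Builder
	for i, r := range name {
		isLetter := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
		isDigit := r >= '0' && r <= '9'
		switch {
		case isLetter || r == '_' || r == ':':
			b.WriteRune(r)
		case isDigit && i > 0:
			// Digits are allowed anywhere except the first position.
			b.WriteRune(r)
		default:
			b.WriteRune('_')
		}
	}
	return b.String()
}

func main() {
	for _, k := range []string{
		"org.opencontainers.image.ref.name",
		"org.opencontainers.image.version",
		"host",
	} {
		fmt.Printf("%s -> %s\n", k, sanitizeLabelName(k))
	}
}
```

One design caveat with any replacement scheme like this: two distinct tag keys (for example `a.b` and `a_b`) map to the same label name, so an opt-in setting is a reasonable default.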

elixedus commented 5 months ago

It works like a charm, thanks.