influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Outputs.influxdb authorization failed messages after upgrade to v1.27 #13455

Closed jdmaloney closed 1 year ago

jdmaloney commented 1 year ago

Relevant telegraf.conf

[[outputs.influxdb]]
database = "asd"
insecure_skip_verify = false
password = "xxxxxxxxx"
skip_database_creation = true
urls = ["https://hostname.redacted:8086"]
username = "xxxxxxxxx"

Logs from Telegraf

2023-06-15T06:31:10.885805-05:00 hostname.redacted telegraf[2653546]: 2023-06-15T11:31:10Z E! [outputs.influxdb] E! [outputs.influxdb] Failed to write metric (will be dropped: 401 Unauthorized): authorization failed

System info

Telegraf v1.27.0 on RHEL 8.6 EUS

Docker

No response

Steps to reproduce

  1. Have a working [[outputs.influxdb]] config with Telegraf v1.26.3
  2. Upgrade to Telegraf v1.27.0 and restart service
  3. Observe the Telegraf logs (and also notice that metrics have stopped being sent to InfluxDB)

Expected behavior

Upgrading to Telegraf v1.27.0 won't break the ability to ship metrics to InfluxDB

Actual behavior

When performing the above steps we get 401: Authorization Failed messages and metrics are not sent to the database. Reverting back to v1.26.3 allows things to work again.

Additional info

No response

neelayu commented 1 year ago

By any chance, does your password contain $?

billglick commented 1 year ago

Yes, our password does include the $ character.

neelayu commented 1 year ago

Unfortunately, there was an oversight in a recent change. You need to escape $ with $$ to make it work. Ref: #13432. Another fix is coming, though. Sorry for the regression.

powersj commented 1 year ago

Closing as a dup of #13432

billglick commented 1 year ago

I've confirmed this is indeed our issue. Escaping $ in the password entry with $$ allows it to work with telegraf 1.27.0.
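
A minimal sketch of that workaround, assuming `$` is the only character that needs escaping in 1.27.0. The helper name `escape_dollars_for_1_27_0` is hypothetical and not part of Telegraf; it only doubles each `$` before the value is pasted into telegraf.conf:

```python
def escape_dollars_for_1_27_0(secret: str) -> str:
    """Double every '$' so the value survives the $$ escape rule.

    Hypothetical helper for illustration; it is not a Telegraf API.
    """
    return secret.replace("$", "$$")


# Example with a made-up password, not a real credential.
print(escape_dollars_for_1_27_0("pa$sword"))  # pa$$sword
```

This covers only the `$$` escape that was reported to work with 1.27.0 in this thread.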

powersj commented 1 year ago

@billglick thank you! Would you be willing to check whether the artifacts in #13451 also resolve the issue without the need to change your password?

billglick commented 1 year ago

@powersj I'm not sure if I'm testing this correctly, but it continues to be broken with a single $ when I upgrade to https://output.circle-artifacts.com/output/job/975829d1-9438-4c24-b717-310cbe55f3aa/artifacts/0/build/dist/telegraf-1.28.0~276472e8-0.x86_64.rpm

powersj commented 1 year ago

@billglick - Are you certain you are seeing the right version in the log output?

I started up InfluxDB 1.8. Because this is on the shell, I escaped the $:

docker run --tty --interactive --rm \
    --net host \
    --env INFLUXDB_HTTP_AUTH_ENABLED="true" \
    --env INFLUXDB_HTTP_FLUX_ENABLED="true" \
    --env INFLUXDB_DB="testing" \
    --env INFLUXDB_ADMIN_USER="admin" \
    --env INFLUXDB_ADMIN_PASSWORD="1.8my\$3cret" \
    --name influxdb-1.8 \
    influxdb:1.8

With the following config:

[[inputs.exec]]
  commands = [
    "echo metric,tag=host,location=0000 value=42 1686178228619000000",
  ]
  data_format = "influx_upstream"

[[outputs.influxdb]]
  database = "my-database"
  username = "admin"
  password = "1.8my$s3cret"
  urls = ["http://localhost:8086"]

❯ ./telegraf --config config.toml --once
2023-06-15T19:29:27Z I! Loading config: config.toml
2023-06-15T19:29:27Z I! Starting Telegraf 1.28.0-276472e8
2023-06-15T19:29:27Z I! Available plugins: 237 inputs, 9 aggregators, 28 processors, 23 parsers, 59 outputs, 4 secret-stores
2023-06-15T19:29:27Z I! Loaded inputs: exec
2023-06-15T19:29:27Z I! Loaded aggregators: 
2023-06-15T19:29:27Z I! Loaded processors: 
2023-06-15T19:29:27Z I! Loaded secretstores: 
2023-06-15T19:29:27Z I! Loaded outputs: influxdb
2023-06-15T19:29:27Z I! Tags enabled: host=ryzen
2023-06-15T19:29:27Z D! [agent] Initializing plugins
2023-06-15T19:29:27Z D! [agent] Connecting outputs
2023-06-15T19:29:27Z D! [agent] Attempting connection to [outputs.influxdb]
2023-06-15T19:29:27Z D! [agent] Successfully connected to outputs.influxdb
2023-06-15T19:29:27Z D! [agent] Starting service inputs
2023-06-15T19:29:27Z D! [agent] Stopping service inputs
2023-06-15T19:29:27Z D! [agent] Input channel closed
2023-06-15T19:29:27Z I! [agent] Hang on, flushing any cached metrics before shutdown
2023-06-15T19:29:27Z D! [outputs.influxdb] Wrote batch of 2 metrics in 5.369543ms
2023-06-15T19:29:27Z D! [outputs.influxdb] Buffer fullness: 0 / 10000 metrics
2023-06-15T19:29:27Z I! [agent] Stopping running outputs
2023-06-15T19:29:27Z D! [agent] Stopped Successfully
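
The running build can be confirmed from the `Starting Telegraf` line in the output above. A small sketch that pulls the version string out of such a line; the function name and the regular expression are illustrative assumptions, not part of Telegraf:

```python
import re
from typing import Optional

# Matches the startup line format seen in the log above.
_START_RE = re.compile(r"I! Starting Telegraf (\S+)")


def telegraf_version_from_log(line: str) -> Optional[str]:
    """Return the version from a 'Starting Telegraf' log line, else None."""
    match = _START_RE.search(line)
    return match.group(1) if match else None


line = "2023-06-15T19:29:27Z I! Starting Telegraf 1.28.0-276472e8"
print(telegraf_version_from_log(line))  # 1.28.0-276472e8
```

If no line in the captured log matches, the version line was probably filtered out or scrolled past before capture started.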

billglick commented 1 year ago

Are you certain you are seeing the right version in the log output?

Yes, I am sure I'm looking at the correct log output. I just tested this again...

# NO ERRORS AHEAD OF TIME WITH telegraf-1.26.3-1 
[root@testhost ~]# tail -f /var/log/messages | grep telegraf
^C

[root@testhost ~]# yum upgrade -y https://output.circle-artifacts.com/output/job/975829d1-9438-4c24-b717-310cbe55f3aa/artifacts/0/build/dist/telegraf-1.28.0~276472e8-0.x86_64.rpm --disableexcludes=all
Updating Subscription Management repositories.
Last metadata expiration check: 0:53:40 ago on Thu 15 Jun 2023 01:40:30 PM CDT.
telegraf-1.28.0~276472e8-0.x86_64.rpm                                                                                                 39 MB/s |  49 MB     00:01    
Dependencies resolved.
=====================================================================================================================================================================
 Package                                Architecture                         Version                                Repository                                  Size
=====================================================================================================================================================================
Upgrading:
 telegraf                               x86_64                               1.28.0-0                               @commandline                                49 M

Transaction Summary
=====================================================================================================================================================================
Upgrade  1 Package
...
  Cleanup          : telegraf-1.26.3-1.x86_64                                                                                                                    2/2 
  Running scriptlet: telegraf-1.26.3-1.x86_64                                                                                                                    2/2 
  Running scriptlet: telegraf-1.28.0-0.x86_64                                                                                                                    2/2 
  Running scriptlet: telegraf-1.26.3-1.x86_64                                                                                                                    2/2 
  Verifying        : telegraf-1.28.0-0.x86_64                                                                                                                    1/2 
  Verifying        : telegraf-1.26.3-1.x86_64                                                                                                                    2/2 
Installed products updated.

Upgraded:
  telegraf-1.28.0-0.x86_64                                                                                                                                           

Complete!

# AUTH ERRORS AFTERWARDS WITH telegraf-1.28.0-0 ARTIFACTS BUILD
[root@testhost ~]# tail -f /var/log/messages | grep telegraf
2023-06-15T14:34:21.290816-05:00 testhost.local telegraf[2677191]: 2023-06-15T19:34:21Z I! Loaded aggregators:
2023-06-15T14:34:21.290816-05:00 testhost.local telegraf[2677191]: 2023-06-15T19:34:21Z I! Loaded processors:
2023-06-15T14:34:21.290816-05:00 testhost.local telegraf[2677191]: 2023-06-15T19:34:21Z I! Loaded secretstores:
2023-06-15T14:34:21.290816-05:00 testhost.local telegraf[2677191]: 2023-06-15T19:34:21Z I! Loaded outputs: influxdb (2x)
2023-06-15T14:34:21.290816-05:00 testhost.local telegraf[2677191]: 2023-06-15T19:34:21Z I! Tags enabled: host=testhost.local
2023-06-15T14:34:21.290816-05:00 testhost.local telegraf[2677191]: 2023-06-15T19:34:21Z W! Deprecated inputs: 1 and 0 options
2023-06-15T14:34:21.290816-05:00 testhost.local telegraf[2677191]: 2023-06-15T19:34:21Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"testhost.local", Flush Interval:10s
2023-06-15T14:35:01.266940-05:00 testhost.local telegraf[2677191]: 2023-06-15T19:35:01Z E! [outputs.influxdb] E! [outputs.influxdb] Failed to write metric (will be dropped: 401 Unauthorized): authorization failed
2023-06-15T14:35:08.966393-05:00 testhost.local telegraf[2677191]: 2023-06-15T19:35:08Z E! [outputs.influxdb] E! [outputs.influxdb] Failed to write metric (will be dropped: 401 Unauthorized): authorization failed
^C

[root@testhost ~]# systemctl status telegraf
● telegraf.service - Telegraf
   Loaded: loaded (/usr/lib/systemd/system/telegraf.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2023-06-15 14:34:21 CDT; 1min 10s ago
     Docs: https://github.com/influxdata/telegraf
 Main PID: 2677191 (telegraf)
    Tasks: 12 (limit: 100619)
   Memory: 46.0M
   CGroup: /system.slice/telegraf.service
           └─2677191 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d

Jun 15 14:34:21 testhost.local telegraf[2677191]: 2023-06-15T19:34:21Z I! Loaded aggregators:
Jun 15 14:34:21 testhost.local telegraf[2677191]: 2023-06-15T19:34:21Z I! Loaded processors:
Jun 15 14:34:21 testhost.local telegraf[2677191]: 2023-06-15T19:34:21Z I! Loaded secretstores:
Jun 15 14:34:21 testhost.local telegraf[2677191]: 2023-06-15T19:34:21Z I! Loaded outputs: influxdb (2x)
Jun 15 14:34:21 testhost.local telegraf[2677191]: 2023-06-15T19:34:21Z I! Tags enabled: host=testhost.local
Jun 15 14:34:21 testhost.local telegraf[2677191]: 2023-06-15T19:34:21Z W! Deprecated inputs: 1 and 0 options
Jun 15 14:34:21 testhost.local telegraf[2677191]: 2023-06-15T19:34:21Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"testhost.local>
Jun 15 14:34:21 testhost.local systemd[1]: Started Telegraf.
Jun 15 14:35:01 testhost.local telegraf[2677191]: 2023-06-15T19:35:01Z E! [outputs.influxdb] E! [outputs.influxdb] Failed to write metric (will be dropp>
Jun 15 14:35:08 testhost.local telegraf[2677191]: 2023-06-15T19:35:08Z E! [outputs.influxdb] E! [outputs.influxdb] Failed to write metric (will be dropp>

powersj commented 1 year ago

Did you remove the double $$ you used earlier?

billglick commented 1 year ago

Did you remove the double $$ you used earlier?

Yes. It is now using the original single $.

powersj commented 1 year ago

Can you adapt the above example I provided to help me reproduce this? Is it something with the password escaping still? if so can you give a fake example that reproduces it using the steps above?

Thanks

billglick commented 1 year ago

I cannot quickly or easily adapt @powersj's example above.

But I can confirm that not escaping the $ in the Influx password has the following results:

Escaping the $ in the Influx password as $$ has the following results:

powersj commented 1 year ago

@billglick,

I am ready to release v1.27.1, but I also want to ensure we did not miss a scenario given your auth failures. However, to do so we are going to need a way to reproduce your issue or understand how your config is different given the example above worked as expected.

billglick commented 1 year ago

@powersj There are 3 other things that seem significantly different from your Telegraf config example, but I have no idea if any of them are related:

  1. Our InfluxDB password also includes a ^ character. I doubt that is related, but mentioning it just in case.

  2. And then as @jdmaloney mentioned initially, we have the following additional [[outputs.influxdb]] options enabled:

    insecure_skip_verify = false
    skip_database_creation = true

  3. We have 2 different [[outputs.influxdb]] sections configured, which allows us to send the same data to 2 different InfluxDB servers. They are configured identically, except for the urls.

neelayu commented 1 year ago

@powersj I tried your example in the newer build, but I am also getting an auth failed error.

powersj commented 1 year ago

@neelayu - I need logs :) I re-ran this this morning, and I too see a failure now :( Did you see a warning printed as well?

WARN[0000] The "s3cret" variable is not set. Defaulting to a blank string.

@srebhan another test case for you
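
The warning suggests that `$s3cret` in the password is being read as a variable reference. A rough way to spot values that would be affected is to list everything that looks like `$NAME` or `${NAME}`. The name rule used here (a letter or underscore, then letters, digits, or underscores) is an assumption borrowed from shell conventions, and Telegraf's actual grammar may differ:

```python
import re

# Assumed shell-style reference syntax; not taken from Telegraf's source.
_REF = re.compile(
    r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))"
)


def variable_references(value: str) -> list:
    """Return the names that look like $NAME or ${NAME} inside value."""
    return [braced or plain for braced, plain in _REF.findall(value)]


print(variable_references("1.8my$s3cret"))  # ['s3cret']
```

An empty list means the value contains nothing that this simplified rule would treat as a variable.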

srebhan commented 1 year ago

Please try escaping the $ with \$ to make it work.

billglick commented 1 year ago

@srebhan - Here are my results with escaping the $ in the [[outputs.influxdb]] password with \$:

neelayu commented 1 year ago

@billglick this is the intended behaviour. To elaborate: if we want to support shell expansion syntax, then $ has to be escaped. Until 1.26, this feature was very limited: if the env var was not found, the variable was used as is. In 1.27, the escape syntax was supposed to be $$, but it caused some issues. In 1.28, you'll need to escape using a backslash (\$), just like in any other shell script.
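
The three behaviours described above can be modelled with a short sketch. This is a simplified illustration, not Telegraf's implementation: the `expand` function, its `mode` labels, and the variable-name pattern are all assumptions, and treating an unset variable as blank in the "1.27" mode is inferred from the warning quoted earlier in the thread rather than confirmed.

```python
import re

# Assumed shell-style reference: $NAME or ${NAME}.
_VAR = (
    r"\$(?:\{(?P<braced>[A-Za-z_][A-Za-z0-9_]*)\}"
    r"|(?P<plain>[A-Za-z_][A-Za-z0-9_]*))"
)

# Regex for the escape sequence in each mode (None means no escape syntax).
_ESCAPES = {"1.26": None, "1.27": r"\$\$", "1.28": r"\\\$"}


def expand(value: str, env: dict, mode: str) -> str:
    """Model variable expansion for the three described behaviours.

    "1.26": an unset variable is left in the text unchanged.
    "1.27": a doubled dollar is a literal dollar; unset variables go blank.
    "1.28": backslash-dollar is a literal dollar; unset variables go blank.
    """
    escape = _ESCAPES[mode]
    pattern = _VAR if escape is None else f"(?P<esc>{escape})|{_VAR}"

    def replace(match):
        if escape is not None and match.group("esc"):
            return "$"  # escaped dollar becomes a literal one
        name = match.group("braced") or match.group("plain")
        if name in env:
            return env[name]
        # Unset variable: keep as-is in 1.26 mode, blank otherwise.
        return match.group(0) if mode == "1.26" else ""

    return re.sub(pattern, replace, value)


for mode, text in [("1.26", "1.8my$s3cret"),
                   ("1.27", "1.8my$$s3cret"),
                   ("1.28", r"1.8my\$s3cret"),
                   ("1.28", "1.8my$s3cret")]:
    print(mode, repr(text), "->", repr(expand(text, {}, mode)))
```

In this model the first three inputs all yield `1.8my$s3cret`, while the unescaped value in "1.28" mode collapses to `1.8my`, which would be sent as the wrong password.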

srebhan commented 1 year ago

@billglick thank you for testing! This is really appreciated to prevent further inconvenience... :-(