influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

[inputs.ipmi_sensor] Sudo with custom PAM #10056

Open SeanPMiller opened 2 years ago

SeanPMiller commented 2 years ago

Feature Request

We use a nonhuman user named telegraf to run telegraf. To authenticate nonhuman users, we use a custom PAM for sudo. We configure the ipmi_sensor input plugin with use_sudo=true, and we configure sudoers carefully, so that user telegraf can execute the ipmi command.

However, the use_sudo configuration option causes the -n option to be passed to sudo as shown below:

https://github.com/influxdata/telegraf/blob/38aefd99b55450a6338c3e843487712110c2f3d2/plugins/inputs/ipmi_sensor/ipmi.go#L164-L168

When sudo is given this option and its last-run timestamp is old, it exits without calling into the PAM module:

https://github.com/sudo-project/sudo/blob/bb5843055ef0c3b84254769926f07cc9d668288b/plugins/sudoers/check.c#L128-L133

Consequently, the input plugin cannot work.

Proposal:

Add a new configuration field to the input plugin that skips adding -n for our use case. Presumably, the plugin would need to change how it deals with the sudo command outcome.

Current behavior:

Plugin does not work: sudo -n exits with "a password is required" whenever the sudo timestamp is old.

Desired behavior:

Plugin works.

Use case:

See above. Sorry, not used to writing issues. Let me know if you need any more information.

powersj commented 2 years ago

Hi,

My understanding of -n/--non-interactive is that it avoids creating a prompt and, if one is necessary, errors out. There are about a dozen input plugins that use this same option, so I would like to ensure any change is actually necessary.

You said that the last-run timestamp for sudo is old. 1) How did you determine this? 2) How did you know this is the issue?

Can you also provide the actual error message you get please?

SeanPMiller commented 2 years ago

Thanks, Joshua. The error that we observed follows:

Nov 02 16:22:08 fqdn.removed.com telegrafctl[38108]: 2021-11-02T16:22:08Z E! [inputs.ipmi_sensor] Error in plugin: failed to run command sudo -n /usr/bin/ipmitool sdr elist:...is required

Nov 02 16:22:34 fqdn.removed.com sudo[11269]: telegraf : a password is required ; TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/ipmitool sdr elist

The most convincing piece of evidence is that we worked around this issue by simply adding a cron job on every server that runs sudo id as user telegraf every four minutes, while the Telegraf daemon runs the input plugin every five minutes. The cron job refreshes the sudo timestamp so that the -n check passes when the plugin runs.

We're using a custom PAM, so bypassing that PAM because the timestamp is old is a fatal error for us. To be fair, this is probably a rare use case, as most companies likely do not write their own PAMs. And there is a reasonable argument that sudo should not skip PAM execution just because it thinks that is equivalent to a password request.

Are you thinking this is a problem with sudo rather than Telegraf?

powersj commented 2 years ago

Does your initial run of sudo require a password due to the way you have PAM set up?

SeanPMiller commented 2 years ago

No, it does not. We are effectively passwordless via this setup, which uses https://www.athenz.io/, but this requires that the PAM shared object actually get called.

I could invite some folks who know more about the details into this discussion if you like.