grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0
1.39k stars 202 forks source link

Mask secrets in errors #382

Open jkroepke opened 1 year ago

jkroepke commented 1 year ago

What's wrong?

If an http.remote contains credentials, and the connection failed on start. then whole module with all properties are visible in the log which can cause a credentials leak

Steps to reproduce

System information

Windows

Software version

Grafana Agent v0.34.5

Configuration

remote.http "config_all" {
  url            = "https://agent.example.com/grafana-agent/config/default/all.river"
  client {
    basic_auth {
      username = "agent"
      password = "clear-text-password"
    }
  }
}

Logs

Error: C:\Program Files\Grafana Agent Flow\config.river:10:1: Failed to build component: building component: performing request: Get "[https://agent.example.com/grafana-agent/config/default/all.river":](https://agent.example.com/grafana-agent/config/default/all.river%22:) context deadline exceeded

9 |   

10 |   remote.http "config_all" {

   |  _^^^^^^^^^^^^^^^^^^^^^^^^^^

11 | |     url            = "[https://agent.example.com/grafana-agent/config/default/all.river"](https://agent.example.com/grafana-agent/config/default/all.river%22)
12 | |     headers = {
13 | |         "User-Agent" = "..."
14 | |     }
15 | |
16 | |     poll_frequency = "5m"
17 | |     poll_timeout   = "3s"
18 | |
19 | |     client {
20 | |         basic_auth {
21 | |             username = "agent"
22 | |             password = "clear-text-password"
23 | |         }
24 | |     }
25 | | }
   | |_^

26 |   
Error: -: Provided argument "node_exporter_targets" is not defined in the module
Error: -: Provided argument "user_agent" is not defined in the module
Error: -: Provided argument "vm_id" is not defined in the module
Error: -: Provided argument "vm_name" is not defined in the module
interrupt received
ts=2023-08-16T09:11:15.931395Z level=info service=http msg="http server closed" addr=127.0.0.1:12345 err="http: Server closed"
ts=2023-08-16T09:11:15.931395Z level=info service=http msg="http server closed" addr=memory err="http: Server closed"
Error: could not perform the initial load successfully
rfratto commented 1 year ago

A proper fix for this is actually going to be hard to do. However, we could probably add a flag to disable the source pretty-printer until we have the capacity to implement a proper fix.

Alternatively, we could discourage hard-coding secrets into the config, and encourage using env or another component instead to retrieve those secrets.

jkroepke commented 1 year ago

It's really hard to define environment variables on Windows system.

It only possible to define variables on computer context (which would expose the secrets on all users) or to the user context. On this case, I had no idea to define environment variable for the SYSTEM user.

However, if put the password on a file and remove all restriction from the directory that only admins can read it, that feels more safe then using environment variables.

rfratto commented 1 year ago

Regardless I agree it's surprising behavior. I would still be fine with a flag to disable the pretty-printer so the source doesn't get shown in logs.

jkroepke commented 1 year ago

Regardless I agree it's surprising behavior.

Not sure, but also on linux system, the /etc directory contains a lot of files with password. Even the environment files for systems are living in /etc.

I know, on containers env variables are the way to go, but on classical virtual machine deployments, its a bit different.

rfratto commented 1 year ago

I know I mentioned environment variables, but I personally prefer files, either via loading them using local.file or by using the appropriate password_file argument. One reason to do that is to allow the file contents to change at runtime and not need to restart the process (like you would need to when changing environment variables)

jkroepke commented 1 year ago

Are local variables planned for river? In that case, this would also mitigate the issue here.

rfratto commented 1 year ago

@jkroepke Not with that exact implementation, but grafana/alloy#154 would be the equivalent behavior to Terraform's locals block.