Reasoning behind no_log: true

prometheus-community / ansible

Ansible Collection for Prometheus

https://prometheus-community.github.io/ansible/

Apache License 2.0

367 stars 129 forks source link

Reasoning behind no_log: true #375

Closed chriscn closed 1 month ago

chriscn commented 4 months ago

What was / is the reasoning behind this line, setting no_log: true as default. My playbook was failing at this step, and it wasn't clear why this line was not being not logged or by setting CI=true it allowed me to overwrite. For me it was failing because I had mistyped a config value.

https://github.com/prometheus-community/ansible/blob/4af5c9bd3092d64b3e284fd5b9d044893f31a323/roles/alertmanager/tasks/configure.yml#L20

Proposed Solution

I'd like to move the validate step to a different step so it is really clear if the validation is failing, I understand that we'd don't want to log secrets to the log but this seems like a good halfway step.

gardar commented 4 months ago

As you correctly assumed this is to avoid leaking credentials to the logs. Unfortunately it is the only way I can think of to be absolutely sure that Ansible won't ever display sensitive data, unless we'd create a Ansible module for doing the validation.

But I could be wrong, how do you propose we'd do it in a dedicated task?

chriscn commented 4 months ago

I was thinking we could use the shell task to validate and fail if the validate failed.

I'm happy to contribute a PR but wanted to know if that's something would be well received.

gardar commented 4 months ago

Any task that fails may print the variables passed to it, depending on the verbosity and callback plugin settings. Even if we separate the validation into two tasks—one for the actual validation configured with failed_when: false, and another to determine if the validation should be marked as failed—there is still a risk of credentials being leaked if the first task fails due to a connection error.

In my opinion, the best approach would be to create an Ansible module for the validation. This way, we can mark sensitive parameters as secret.

chriscn commented 4 months ago

Sorry if I wasn't clear, I think having a separate task that looks similar to the following:

- name: Validate AlertManager variables
   shell: "{{ _alertmanager_binary_install_dir }}/amtool check-config %s"
   nolog: true

Thinking about it maybe renaming the current task could be useful, currently when it fails it could be for a number of reasons, does the file have space, is it a permission issue. Separating these out makes it clear why the task is failing rather than a generic 'Copy alertmanager config'.

A separate module could be a good idea, although I'm not sure where it would fit.

gardar commented 4 months ago

I don't think that would work because if you are using no_log on that task then you can't see why it's failing and it's essentially the same thing as we are doing now.