Closed chriscn closed 3 months ago
As you correctly assumed this is to avoid leaking credentials to the logs. Unfortunately it is the only way I can think of to be absolutely sure that Ansible won't ever display sensitive data, unless we'd create a Ansible module for doing the validation.
But I could be wrong, how do you propose we'd do it in a dedicated task?
I was thinking we could use the shell task to validate and fail if the validate failed.
I'm happy to contribute a PR but wanted to know if that's something would be well received.
Any task that fails may print the variables passed to it, depending on the verbosity and callback plugin settings. Even if we separate the validation into two tasks—one for the actual validation configured with failed_when: false, and another to determine if the validation should be marked as failed—there is still a risk of credentials being leaked if the first task fails due to a connection error.
In my opinion, the best approach would be to create an Ansible module for the validation. This way, we can mark sensitive parameters as secret.
Sorry if I wasn't clear, I think having a separate task that looks similar to the following:
- name: Validate AlertManager variables
shell: "{{ _alertmanager_binary_install_dir }}/amtool check-config %s"
nolog: true
Thinking about it maybe renaming the current task could be useful, currently when it fails it could be for a number of reasons, does the file have space, is it a permission issue. Separating these out makes it clear why the task is failing rather than a generic 'Copy alertmanager config'.
A separate module could be a good idea, although I'm not sure where it would fit.
I don't think that would work because if you are using no_log
on that task then you can't see why it's failing and it's essentially the same thing as we are doing now.
What was / is the reasoning behind this line, setting
no_log: true
as default. My playbook was failing at this step, and it wasn't clear why this line was not being not logged or by settingCI=true
it allowed me to overwrite. For me it was failing because I had mistyped a config value.https://github.com/prometheus-community/ansible/blob/4af5c9bd3092d64b3e284fd5b9d044893f31a323/roles/alertmanager/tasks/configure.yml#L20
Proposed Solution
I'd like to move the validate step to a different step so it is really clear if the validation is failing, I understand that we'd don't want to log secrets to the log but this seems like a good halfway step.