prometheus-community / ansible

Ansible Collection for Prometheus
https://prometheus-community.github.io/ansible/
Apache License 2.0
396 stars 133 forks source link

fix: add check_mode in localhost binary cache task #431

Closed santilococo closed 1 month ago

santilococo commented 1 month ago

This PR adds the check_mode: false in the localhost binary cache task.

This is needed because the next task, which downloads the prometheus binary, expects the folder to already be created. Without check_mode: false, the task only runs in "simulation" mode and doesn't create the folder.

This code was introduced in the most recent PR #425: https://github.com/prometheus-community/ansible/pull/425/files#diff-c9a8a7fbd7ee9c40d13e4be2e88801e621b943a2246b78ad46b93dc0f7c33971R38

You can see that this task did not exist before: https://github.com/prometheus-community/ansible/pull/425/files#diff-6f9278158452a81913bff59d794daa00e142796b04e75efd873c4a5dbb7f2899L38

Fixes #430

erikgb commented 1 month ago

LGTM! Thanks @santilococo!

erikgb commented 1 month ago

Can we get this important bugfix reviewed, merged, and released ASAP?

voidquark commented 1 month ago

Same problem on our side. LGTM :rocket:

SuperQ commented 1 month ago

Test errors are due to github rate limits. Will retry them.

SuperQ commented 1 month ago

@gardar do we really need to run all the role tests when updating _common?

gardar commented 1 month ago

@gardar do we really need to run all the role tests when updating _common?

Yeah since it's affecting all the roles I'm afraid it's the only way to catch potential errors. But I agree, the number of tests is getting pretty ridiculous 😅, I think we need to tone it down a bit.

SuperQ commented 1 month ago

I think for _common we can get enough coverage by just testing the prometheus role.

oc commented 1 month ago

Great, thanks! LGTM!

erikgb commented 1 month ago

Maybe in a follow-up PR, but could it make sense to mark all these download-related tasks as changed: false. In our container-based Ansible controller setup, these tasks are always marked as changed - which is correct. But pretty annoying when looking at the summary of the play.

It also appears that Prometheus is always restarted (by running the restart handler). I think this is caused by this unnecessary "changed" tasks, but not 100% sure. I will monitor this bad behavior when a new release is out.

gardar commented 1 month ago

Maybe in a follow-up PR, but could it make sense to mark all these download-related tasks as changed: false. In our container-based Ansible controller setup, these tasks are always marked as changed - which is correct. But pretty annoying when looking at the summary of the play.

No I don't think that's a good idea to fake the "changed" status like that, you should be able to see when the download is actually being executed. I'd suggest you use a persistent volume for the local cache path instead, it should save you some download time and you would avoid being rate limited by github.

It also appears that Prometheus is always restarted (by running the restart handler). I think this is caused by this unnecessary "changed" tasks, but not 100% sure. I will monitor this bad behavior when a new release is out.

No the download tasks don't notify the handlers, so there must be some other task that's returning the changed status.

erikgb commented 1 month ago

Maybe in a follow-up PR, but could it make sense to mark all these download-related tasks as changed: false. In our container-based Ansible controller setup, these tasks are always marked as changed - which is correct. But pretty annoying when looking at the summary of the play.

No I don't think that's a good idea to fake the "changed" status like that, you should be able to see when the download is actually being executed. I'd suggest you use a persistent volume for the local cache path instead, it should save you some download time and you would avoid being rate limited by github.

We have a proxy in-between, since GitHub is not directly available. Semi air-gapped environment. I think changes to the Ansible controller should be safe to ignore. But point taken, we can agree to disagree. 😉

A persistent disk is not a feasible alternative for us. Try getting that on GitLab Docker CI/CD runners!

It also appears that Prometheus is always restarted (by running the restart handler). I think this is caused by this unnecessary "changed" tasks, but not 100% sure. I will monitor this bad behavior when a new release is out.

No the download tasks don't notify the handlers, so there must be some other task that's returning the changed status.

Thanks! I will debug this when a new release is available.

gardar commented 1 month ago

A persistent disk is not a feasible alternative for us. Try getting that on GitLab Docker CI/CD runners!

We are getting way off topic here, but both github/gitlab offer cache features which you could probably use for that. If you figure out how to do it then it might be useful to add it do the docs 😄 https://docs.gitlab.com/ee/ci/caching/ https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#using-the-cache-action

Frazew commented 1 month ago

No I don't think that's a good idea to fake the "changed" status like that, you should be able to see when the download is actually being executed.

I'm jumping in because I wanted to suggest the exact same check_mode: false fix and found this PR. However I would also argue that changed: false should be put on this folder creation and download tasks. Otherwise the role is not very clearly idempotent: running it twice still shows tasks as changed even though nothing changed on the remote host.

imo these downloads are anyway local prerequisites for the role to run correctly, they're not indicating any changed action on the remote host, that's the job of the Propagate binaries task

github-actions[bot] commented 1 month ago

Docs Build 📝

Thank you for contribution!✨

This PR has been merged and the docs are now incorporated into main: https://prometheus-community.github.io/ansible/branch/main

Frazew commented 1 month ago

Just a quick comment here to say that I opened #433 to move the changed_when discussion there, I've tried to summarize the various arguments for/against, thanks!