zacwest opened this issue 1 year ago
Thank you for the research. Retries on timeouts would be useful, as would handling other temporary network errors. Additional logging should also help to debug these problems.
I think I should refactor the code first so that there is less duplicated code and the two features can be implemented more easily.
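For what it's worth, a minimal sketch of that kind of retry, assuming the requests ultimately go through python-socketio's Client.call() (the actual uptime_kuma_api internals may differ); the function name and defaults here are illustrative:

```python
# Illustrative retry wrapper for transient socket.io failures; not the actual
# uptime_kuma_api code. Exception types are python-socketio's.
import time
import socketio


def call_with_retry(sio: socketio.Client, event, data=None,
                    retries=3, timeout=10, backoff=2.0):
    """Retry a socket.io call on ack timeouts and transient namespace errors."""
    for attempt in range(1, retries + 1):
        try:
            return sio.call(event, data, timeout=timeout)
        except (socketio.exceptions.TimeoutError,
                socketio.exceptions.BadNamespaceError):
            if attempt == retries:
                raise  # out of attempts: surface the error to the caller
            time.sleep(backoff * attempt)  # simple linear backoff before retrying
```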
Thanks for this excellent Ansible integration. It works as expected, right out of the box.
I am, however, seeing this issue. When adding ~40 monitors, at least a few of them fail every time. It doesn't appear to be related to server load, and it shouldn't be a network problem: the sites I am monitoring are all on the same server, ~20 ms away, and the connection is solid.
It doesn't appear to matter whether you use a token or a username/password.
I can't see any errors in docker logs or nginx's error.log. Using throttle: 1 has no effect, nor does forks = 1.
I suspect #20 may be a symptom of the same issue, as I saw this behaviour initially as well.
I can reproduce this every time, so I can do testing if you can think of anything that would help.
Just one more bit of info: sometimes the monitor is added even when Ansible says it failed, and sometimes it isn't.
Did you find a workaround for this? I am completely stuck with frozen runs.
Just a quick note to you all: for me it seems to be an issue with the reverse proxy. I am using haproxy. Pointing uptime_url directly at the app worked perfectly, although it is of course unsecured.
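To illustrate the workaround (the variable name and addresses below are hypothetical, not the commenter's actual values): instead of going through the reverse proxy frontend, point the URL straight at the Uptime Kuma app.

```yaml
# Hypothetical vars snippet illustrating the workaround above.
# Going through the haproxy frontend (where the hangs were seen):
# uptime_url: "https://status.example.com"
# Pointing directly at the Uptime Kuma app/container instead:
uptime_url: "http://192.0.2.10:3001"
```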
I faced the same issue. Ansible shows an error on its side, but the monitor has been successfully created on the uptime-kuma side (no errors in docker logs).
Hi!
I'm having the same issue. I'm currently hosting Uptime Kuma on an Azure Web App, and the Ansible playbook hangs every time while executing different tasks.
Any idea on how to overcome this?
I have the same problem.
Same issue here too. It's really annoying, and retrying when it occurs has not worked so far. It makes using the module rather unstable, and I need to rerun playbooks over and over until everything is created.
OK, one thing that may help people is to add retries to the Uptime Kuma tasks, something like:
```yaml
register: task_results
retries: 5
until: task_results.rc | default(0) == 0
ignore_errors: true
```
This sets the return code to 0 if it is not defined and retries if it is anything other than 0. When it hits those timeouts the return code (rc) is 1, so it will trigger a retry. ignore_errors is set to true so that the exception doesn't stop the playbook in its tracks.
Hope this helps someone hitting the same issue.
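For completeness, a minimal sketch of that pattern applied to a single task. The module and parameter names below follow my understanding of the lucasheld.uptime_kuma collection and are illustrative, so adjust them to your setup:

```yaml
# Sketch only: module/parameter names are assumptions based on the
# lucasheld.uptime_kuma collection; the host, token and monitor are examples.
- name: Add HTTP monitor (retry on transient timeouts)
  lucasheld.uptime_kuma.monitor:
    api_url: "https://uptime.example.com"
    api_token: "{{ uptime_kuma_api_token }}"
    name: example-site
    type: http
    url: "https://www.example.com"
    state: present
  register: task_results
  retries: 5
  delay: 5                                   # wait a few seconds between attempts
  until: task_results.rc | default(0) == 0   # rc is 1 when the timeout hits
  ignore_errors: true                        # don't abort the play on final failure
```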
Thanks for the library, it's made things a lot easier! I'm running into an issue where invocations end up being timed out by Ansible after some kind of internal failure. My setup is somewhat simple: Uptime-Kuma is running in a docker container on fly.io.
For example, running a command like:
I've traced this back to a timeout occurring in socketio (the log output here is from executing the Ansible-generated Python script manually, repeatedly, to try to induce the failure) and a raised exception going uncaught:
I added some logging around the call site in api.py: https://github.com/lucasheld/uptime-kuma-api/blob/master/uptime_kuma_api/api.py#L478-L484
What appears to be happening is that the loginByToken call is attempted but times out. Weirdly, I do see this request coming through on the Uptime Kuma side. When this occurs, I see the _send call begin, but it never returns until it raises the exception, which doesn't appear to be caught successfully. The end result is that the Python script hangs indefinitely and ends up being killed by Ansible after the timeout, rather than sending the error up the stack. So perhaps 2 things here:
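For illustration, a minimal sketch (not the actual uptime_kuma_api code) of catching that ack timeout around the loginByToken call so the script fails fast instead of hanging; it assumes the request ultimately goes through python-socketio's Client.call(), and the URL, token, and payload shape are placeholders:

```python
# Sketch only: assumes python-socketio's Client.call(); api.py may differ.
import socketio

sio = socketio.Client()
sio.connect("https://uptime.example.com", wait_timeout=10)

try:
    # call() blocks waiting for the server's ack and raises TimeoutError
    # if no response arrives within the timeout
    response = sio.call("loginByToken", "REDACTED-TOKEN", timeout=10)
except socketio.exceptions.TimeoutError as exc:
    sio.disconnect()
    # surface a clear error so Ansible sees a failed module run, not a hang
    raise RuntimeError("loginByToken timed out waiting for a response") from exc
```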