semaphoreui / semaphore

Modern UI and powerful API for Ansible, Terraform, OpenTofu, PowerShell and other DevOps tools.
https://semaphoreui.com
MIT License

Template that should run every 2 minutes is stuck in waiting since updating to 2.9.75 #1999

Open livdebus opened 6 months ago

livdebus commented 6 months ago

Since updating from 2.9.37 to 2.9.75, a template that is set to run every two minutes (via template cron settings /2 *) keeps getting stuck in status waiting. The only fix is to stop the last waiting job and reboot. This happens every day and does not recover by itself.

Other templates run fine, even while the affected template is stuck, so it only affects this template.

Anyone able to help?


ivibross commented 6 months ago

Same here. For me it happens in a synchronize pull task. @livdebus, is this also the case for you?

ivibross commented 6 months ago

This is the debug log output of the stuck task:

TASK [Pull web files to semaphore] *********************************************
task path: /home/semaphore/repository_6_36/FileTransmit.yml:35
[DEPRECATION WARNING]: The connection's stdin object is deprecated. Call display.prompt_until(msg) instead. This feature will be removed in version 2.19. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
redirecting (type: modules) ansible.builtin.synchronize to ansible.posix.synchronize
redirecting (type: action) ansible.builtin.synchronize to ansible.posix.synchronize
redirecting (type: action) ansible.builtin.synchronize to ansible.posix.synchronize
ESTABLISH LOCAL CONNECTION FOR USER: semaphore
EXEC /bin/sh -c '( umask 77 && mkdir -p " echo /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm "&& mkdir " echo /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm/ansible-tmp-1715689337.1515107-8357-129744323250235 " && echo ansible-tmp-1715689337.1515107-8357-129744323250235=" echo /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm/ansible-tmp-1715689337.1515107-8357-129744323250235 " ) && sleep 0'
Using module file /home/semaphore/.ansible/collections/ansible_collections/ansible/posix/plugins/modules/synchronize.py
PUT /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm/tmpwx00atba TO /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm/ansible-tmp-1715689337.1515107-8357-129744323250235/AnsiballZ_synchronize.py
EXEC /bin/sh -c 'chmod u+x /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm/ansible-tmp-1715689337.1515107-8357-129744323250235/ /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm/ansible-tmp-1715689337.1515107-8357-129744323250235/AnsiballZ_synchronize.py && sleep 0'
EXEC /bin/sh -c '/usr/bin/python3.11 /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm/ansible-tmp-1715689337.1515107-8357-129744323250235/AnsiballZ_synchronize.py && sleep 0'
Running playbook failed: signal: killed

The definition of the task is:

livdebus commented 6 months ago

> Same here. This does happen in a synchronize pull task. @livdebus is this also the case for you?

Nope, it is a task that runs a PowerShell script on a Windows host.

livdebus commented 6 months ago

But it seems that duplicating the affected task solved the issue for me; it has been stable for over 4 days now.

fiftin commented 6 months ago

Hi @tboerger can you reproduce this issue? I can't.

fiftin commented 6 months ago

@livdebus, does the task work fine?

tboerger commented 6 months ago

Me neither

livdebus commented 6 months ago

> @livdebus is the task works fine?

Yes, it still works fine since duplicating the affected task.

livdebus commented 5 months ago

It happened again: image


This time duplicating the template did not help, but I have some more details. It seems that a previous task is hanging, and therefore all further tasks of the same template are stuck in queued status.

But the task is hanging in a state where the actual playbook has not started (confirmed because the playbook would write to a custom logfile, which did not happen), so it is still in the preparation stage (github updating): image

Template details: image

Is there a Semaphore logfile anywhere that would show more details?

The only workaround so far is to reboot the Semaphore host; then the runbooks work again for some hours.

livdebus commented 5 months ago

Or is there any way to define a timeout for a template, so that a single run is forced to stop after some time?
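Until such an option exists, a hard limit can be approximated outside Semaphore by launching the long-running command through a wrapper that enforces a timeout. The following is only a minimal sketch under that assumption: `run_with_timeout` and the stand-in command are hypothetical and are not Semaphore features or settings.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it ran longer than timeout_s seconds."""
    try:
        # On timeout, subprocess.run kills the child process and raises TimeoutExpired.
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical stand-in for a hanging job: it sleeps 5 s, but the limit is 1 s.
    hanging_job = [sys.executable, "-c", "import time; time.sleep(5)"]
    code = run_with_timeout(hanging_job, 1)
    print("timed out" if code is None else f"exit code {code}")
```

Note that only the direct child process is killed on timeout; any processes it spawned may keep running, so this is a stopgap rather than a replacement for a built-in per-template timeout.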

cm-schl commented 3 weeks ago

+1 for having a way to define a timeout for a template. I've found myself in a similar situation, with multiple tasks waiting for another task to complete. The problem was a simple use of the apt module that was somehow blocked. Because you can't define a global timeout for a playbook in Ansible, having Semaphore stop, or at least warn about, hanging templates would be great.
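The "warn about hanging templates" idea can be illustrated with a small check over task records. This is a sketch only: the record shape (`id`, `status`, `started`) and the status names are assumptions made for illustration and do not reflect Semaphore's actual data model or API.

```python
from datetime import datetime, timedelta


def find_stale(tasks, now, max_age):
    """Return the ids of tasks that have been in 'running' status longer than max_age."""
    return [
        t["id"]
        for t in tasks
        if t["status"] == "running" and now - t["started"] > max_age
    ]


# Hypothetical task records; in practice these would come from wherever task state is stored.
now = datetime(2024, 5, 14, 12, 0, 0)
tasks = [
    {"id": 1, "status": "running", "started": now - timedelta(hours=3)},
    {"id": 2, "status": "running", "started": now - timedelta(minutes=1)},
    {"id": 3, "status": "waiting", "started": now - timedelta(hours=3)},
]
stale = find_stale(tasks, now, timedelta(minutes=30))
print(stale)
```

With a threshold of 30 minutes, only the first record qualifies: the second is still recent and the third is not running, which matches the situation described above where one hung run blocks the queued runs behind it.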