marlof / ScORCH

DevOps Orchestration for Obrar deploy and Ansible playbooks
http://www.autoscorch.com
Apache License 2.0

Jobs become Orphaned in some OSs #196

Closed marlof closed 3 months ago

marlof commented 3 months ago

A recent test deployment of ScORCH 3.1.12 on Ubuntu 24 is marking jobs as orphaned while they are still running. This is incorrect and needs to be fixed.

This has also been seen on a Debian-based OS, where the impact was limited by editing the fn_CheckRunning() function and disabling the move.

marlof commented 3 months ago

The code with the issue is fn_CheckRunning in Scorch_Dispatcher:

  local arr_RunningJobs
  arr_RunningJobs=$(ls -t1 "${dir_Running}" | grep -v "pause$" 2>/dev/null) || :
  if [[ "${arr_RunningJobs}" ]] ; then
    for file_EachJob in ${arr_RunningJobs} ; do
      if [ ! "$(ps -ef | grep ${file_EachJob} | grep -v grep)" ] ; then
        mv "${dir_Running}/${file_EachJob}" "${dir_Failed}/."
        echo "$(${fn_LogDate}) Orphaned. Resume point:unknown" | tee -a "${dir_Active}/${file_EachJob}" "${dir_Log}/${file_EachJob}.log"
      fi
    done
  fi
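
The root cause is not confirmed here, but the check above depends on an unquoted `ps -ef | grep` match and on word-splitting the output of `ls`, both of which can behave differently between systems. As a rough sketch of one possible hardening (not the confirmed fix), the liveness test could use `pgrep -f` and the loop could iterate over a glob instead. The names `fn_IsJobRunning` and `fn_SweepOrphans` are hypothetical and do not exist in ScORCH, and the sketch assumes `pgrep` from procps is installed. Logging to the job's active and log files is omitted for brevity.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of ScORCH): succeeds if any process has the
# job name somewhere in its full command line.
fn_IsJobRunning() {
  local str_Job="$1"
  # pgrep -f matches against the full command line and never reports itself,
  # so no "grep -v grep" step is needed. "--" protects names starting with "-".
  # Note: the name is treated as a regular expression, as in the original grep.
  pgrep -f -- "${str_Job}" >/dev/null 2>&1
}

# Hypothetical sweep: $1 = running directory, $2 = failed directory.
fn_SweepOrphans() {
  local dir_Run="$1" dir_Fail="$2" path_Job file_Job
  for path_Job in "${dir_Run}"/* ; do
    [[ -f "${path_Job}" ]] || continue          # also covers an empty directory
    file_Job="${path_Job##*/}"
    [[ "${file_Job}" == *pause ]] && continue   # skip paused jobs, as the original does
    if ! fn_IsJobRunning "${file_Job}" ; then
      mv -- "${path_Job}" "${dir_Fail}/"
      echo "Orphaned: ${file_Job}"
    fi
  done
}
```

Calling `fn_SweepOrphans "${dir_Running}" "${dir_Failed}"` would move only those job files for which no matching process is found. If `pgrep` is missing, every job would be treated as orphaned, so a real change would want to guard against that case.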