As discussed on the mailing list, ncm-ncd should have a configurable timeout setting, so that an NCM component that has hung indefinitely, for whatever reason, cannot prevent ncm-ncd from continuing with its job and completing with an exit status. ncm-ncd should behave in one of three ways:
1) Current behaviour: a timeout of zero means never time out.
2) Alert if a component times out, but continue to wait.
3) Alert if a component times out, kill/clean up that component and continue with the next components. Any components that depend on the killed one cannot be run, of course, and should also be reported as errors with a note that the parent component failed.
Without a timeout, if any component hangs indefinitely, so does ncm-ncd, and subsequent runs of ncm-ncd cannot take place because of the lock file. This has left one affected system in a state where nothing was updated for a month and nobody noticed.
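
To make the three behaviours concrete, here is a rough sketch in Python. It is not how ncm-ncd is implemented: it assumes each component can be run as a separate child process, and the names `run_components`, `OnTimeout` and the status strings are all made up for illustration.

```python
import subprocess
import sys
from enum import Enum


class OnTimeout(Enum):
    """Hypothetical setting for what to do once the timeout expires."""
    WAIT = "wait"  # behaviour 2: alert, but continue to wait
    KILL = "kill"  # behaviour 3: alert, kill the component, move on


def run_components(components, timeout=0, on_timeout=OnTimeout.KILL, alert=print):
    """Run components in the given order and return {name: status}.

    components: list of (name, argv, dependencies), already sorted so that
                dependencies come before the components that need them.
    timeout:    seconds to wait per component; 0 means never time out
                (behaviour 1, the current behaviour).
    """
    status = {}
    for name, argv, deps in components:
        # A dependency that failed, timed out, or never ran blocks this one.
        failed = [d for d in deps if status.get(d) != "ok"]
        if failed:
            alert(f"{name}: not run, parent component failed: {', '.join(failed)}")
            status[name] = "skipped"
            continue

        proc = subprocess.Popen(argv)
        try:
            # timeout=None makes wait() block forever, i.e. no timeout.
            rc = proc.wait(timeout=timeout or None)
        except subprocess.TimeoutExpired:
            alert(f"{name}: timed out after {timeout}s")
            if on_timeout is OnTimeout.KILL:
                proc.kill()
                proc.wait()  # reap the killed child
                status[name] = "timeout"
                continue
            rc = proc.wait()  # behaviour 2: keep waiting after the alert
        status[name] = "ok" if rc == 0 else "error"
    return status
```

With a hanging component `a`, a component `b` that depends on it and an independent component `c`, calling `run_components(..., timeout=2)` in this sketch marks `a` as timed out, reports `b` as not run because its parent failed, and still runs `c`, so the overall run finishes and the lock file could be released.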