quattor / ncm-cdispd

Node Configuration Manager Configuration Dispatch Daemon
www.quattor.org

14.8.0-rc5: Quattor upgrade leaves 2 uncomplete YUM transactions #15

Closed jouvin closed 9 years ago

jouvin commented 9 years ago

After deployment of a new Quattor version, 2 incomplete YUM transactions are left behind in 14.8.0-rc5 (https://github.com/quattor/release/issues/54). This means that delayed signal processing does not work as expected, at least during the upgrade process. It was well tested when added in July, but a race condition may be present...

This is harmless because we run yum-complete-transaction as part of SPMA, as long as https://github.com/quattor/configuration-modules-core/issues/294 is merged.

This issue probably explains #14.

jouvin commented 9 years ago

The delayed processing is broken because the is_executable method in CAF::Process runs the command... As a result, the delayed processing is only configured after the command has finished...

        if ( $p->is_executable() ) {
            # Delay processing of some signals
            delay_signals();

We clearly need at least a workaround in the 14.8.0 final release... If the problem is difficult to solve in CAF, we can make the change in signal handling independent of whether we can execute the command...

jouvin commented 9 years ago

Forget my previous comment... is_executable() is not the problem and does exactly what is expected!

jouvin commented 9 years ago

I reviewed the signal handling code and could not find anything wrong. The previous commit logs actions related to delayed signal processing at info level. This should help troubleshoot the problem and is more useful information for site admins anyway. I suggest merging this in 14.8.0 and going on with the release. The problem itself is harmless with the ncm-spma fix, and the improved logging will help to troubleshoot it... at the next Quattor update after the release (next release RC).

jrha commented 9 years ago

Further debugging output will be analysed during the RC cycle.

jouvin commented 9 years ago

After detailed analysis of the logs during 14.10.0-rc2, I confirm that everything works as expected. The incomplete YUM transactions can no longer be seen, but this is probably because ncm-spma runs yum-complete-transaction. Anyway, with delayed signal processing, there should be no incomplete transaction left... There are as many restarts of ncm-ncd as there are RPM scripts doing an ncm-cdispd restart... Currently we have two: ncm-cdispd and ncm-cdp-listend. There is not much that can be done, as the new ncm-cdispd process is started immediately, before the existing one has completed, and waits for the first one to complete before actually running the components (ncm-ncd lock). Apart from the fact that this is a bit surprising when you look at the logs, it is harmless (as long as configuration modules are idempotent... but they need to be by design!) and avoids a more complex strategy to properly handle the TERM signal. I'm in favor of closing this issue if nobody objects.
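
For readers unfamiliar with the idea, delayed signal processing can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ncm-cdispd code: `run_with_delayed_signals` is a hypothetical helper, the anonymous sub stands in for the critical section (for example a package transaction), and the choice of TERM and INT is arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from ncm-cdispd): run $work while the
# listed signals are only recorded, and return the names that arrived.
sub run_with_delayed_signals {
    my ($work, @signals) = @_;
    my @pending;
    {
        # 'local' puts the previous handlers back when this block exits.
        local @SIG{@signals} = map {
            my $name = $_;
            sub { push @pending, $name };
        } @signals;
        $work->();
    }
    return @pending;
}

# Stand-in for the critical section: a TERM arrives halfway through,
# yet the work still runs to completion.
my $completed = 0;
my @pending = run_with_delayed_signals(
    sub {
        kill 'TERM', $$;    # simulate a restart request sent to this process
        $completed = 1;
    },
    qw(TERM INT),
);

print "completed=$completed pending=@pending\n";
```

Once the critical section is over, the caller can act on whatever is in `@pending`, for instance by exiting cleanly if TERM was recorded, so that the interrupted work is never left half done.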

jouvin commented 9 years ago

Closing after 14.10.0-RC3... Reopen if this is seen again or if there is any sign of a hidden problem...