quattor / ncm-cdispd

Node Configuration Manager Configuration Dispatch Daemon
www.quattor.org

Failed components should be retried at each run #4

Closed jouvin closed 10 years ago

jouvin commented 10 years ago

In recent versions of ncm-cdispd, if a component B must be run but has a pre-dependency A that has failed, B is not executed and will not be executed again until there is another reason to run B (a config change or a dependency), the node is rebooted, or ncm-ncd --configure --all is run. I don't think this was the case in the past, and this behaviour is really unexpected IMO. It can leave nodes in an unexpected state.

Looking at the ncm-cdispd code, retrying failed components is clearly documented as implemented. From the inline pod:

If NCM execution (ncm-ncd) fails, cdispd will not increase the
current profile pivot. This means that subsequent profile changes will
be compared to the profile as it was when NCM failed. In such a way,
failed components are not "forgotten" about (unless they are
deactivated in the new profile).

This issue was initially wrongly reported against ncm-ncd (https://github.com/quattor/ncm-ncd/issues/25).

I think it deserves a fix before the 14.6.0 release (https://github.com/quattor/release/issues/40).

jouvin commented 10 years ago

After further investigation, I confirm that the bug is in ncm-cdispd. This is caused by launch_ncd() not returning the ncm-ncd command status. A fix is coming soon...
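To illustrate the mechanism, here is a minimal Python sketch of the pivot logic described in the pod. It is not the actual ncm-cdispd code (which is Perl): the `Dispatcher` class, the dict-based profiles, and the stand-in commands are all hypothetical, and only the name `launch_ncd` comes from this issue. The point is that the pivot can only be held back on failure if `launch_ncd()` returns the command status to its caller.

```python
import subprocess
import sys

# Hypothetical stand-ins for an ncm-ncd run that fails / succeeds.
FAILING_CMD = [sys.executable, "-c", "import sys; sys.exit(1)"]
OK_CMD = [sys.executable, "-c", "pass"]


def launch_ncd(cmd):
    """Run the configuration command and return its exit status.

    Per this issue, the bug was that the status was not returned,
    so the caller could not tell a failed run from a successful one.
    """
    return subprocess.run(cmd).returncode


class Dispatcher:
    """Hypothetical model of the 'profile pivot' described in the pod."""

    def __init__(self, profile):
        # Last profile that was applied successfully.
        self.pivot = profile

    def handle_new_profile(self, new_profile, cmd):
        # Components whose configuration differs from the pivot.
        changed = sorted(
            name for name, cfg in new_profile.items()
            if self.pivot.get(name) != cfg
        )
        if not changed:
            return changed
        status = launch_ncd(cmd)
        # Advance the pivot only on success, so that components from a
        # failed run are still seen as changed on the next comparison.
        if status == 0:
            self.pivot = new_profile
        return changed
```

With this sketch, a profile change handled with `FAILING_CMD` leaves `pivot` untouched, so the same component is reported as changed again when the next profile arrives. If `launch_ncd()` dropped the status, the caller would have no way to keep the pivot in place.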

jrha commented 10 years ago

If this bug has existed in previous releases, it doesn't need fixing for this one; fix it for 14.8.

jrha commented 10 years ago

After discussion with @jouvin, 14.6 is now blocked by this bug.