quattor / ncm-cdispd

Node Configuration Manager Configuration Dispatch Daemon
www.quattor.org

restore old ICLIST on failure #22

Closed stdweird closed 8 years ago

stdweird commented 8 years ago

step 0: initialisation

step 1: new profile

step 2: new profile (or option 2b: old profile, i.e. with the same checksum as CID 1)

There is no recovery possible (except restarting ncm-cdispd).

Solution: restore ICLIST from step 0 if there is a failure in step 1.
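A minimal sketch of the restore-on-failure idea, written in Python as a simplified model rather than as ncm-cdispd's actual Perl code. The `state` dictionary, `run_with_restore` and the `launch` callback are hypothetical names introduced for illustration; only ICLIST comes from the discussion above.

```python
def run_with_restore(state, new_components, launch):
    """Add new_components to the pending list, run launch, restore on failure.

    state          -- dict holding the pending component list under "ICLIST"
    new_components -- components added by the newly received profile
    launch         -- hypothetical stand-in for invoking ncm-ncd; returns True on success
    """
    saved = list(state["ICLIST"])  # snapshot of the list as it was in step 0
    state["ICLIST"] = sorted(set(saved) | set(new_components))
    ok = launch(state["ICLIST"])
    if not ok:
        # Failure in step 1: go back to the step 0 list so that a later
        # profile is not stuck with components from the failed one.
        state["ICLIST"] = saved
    return ok


failed_state = {"ICLIST": ["a"]}
failed_result = run_with_restore(failed_state, ["x"], lambda comps: False)

ok_state = {"ICLIST": ["a"]}
ok_result = run_with_restore(ok_state, ["x"], lambda comps: True)
```

What happens to the list after a successful run is left to the caller here, since the thread does not spell it out.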

Problem: the launch_ncd sub contains the following comment:

# At this point, ICLIST should contain only components present in the last profile received.
# The only case where a component may be in the list without being part of the configuration
# is the following:
#   - Profile n is deployed successfully (ncm-ncd returns a success)
#   - Profile n+1 adds a new component X that fails (reference config to compare next profile with remains n)
#   - Profile n+2 removes component X but the profile comparison occurs between n and n+2 (because
#     X failed with profile n+1) and thus X removal is not detected.
# As a result, X remains on the list of components to run. This should be harmless as ncm-ncd will ignore it.
# This is probably rare enough to avoid complex processing to handle this in ncm-cdispd.
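The scenario in that comment can be replayed with a small Python model. This is a sketch under stated assumptions, not the daemon's real logic: profiles are plain dictionaries mapping component names to a version number, `update_iclist` and `finish_run` are hypothetical helpers, and the pending list is assumed to be emptied after a successful run.

```python
def update_iclist(state, profile):
    """Compare profile with the reference config and update the pending list."""
    ref = state["reference"]
    changed = {c for c, v in profile.items() if ref.get(c) != v}
    removed = set(ref) - set(profile)  # removals are only seen relative to the reference
    state["iclist"] = (state["iclist"] | changed) - removed


def finish_run(state, profile, success):
    """On success the profile becomes the new reference (assumption: list is emptied)."""
    if success:
        state["reference"] = profile
        state["iclist"] = set()


state = {"reference": {}, "iclist": set()}

profile_n = {"a": 1}
update_iclist(state, profile_n)
finish_run(state, profile_n, success=True)    # profile n deploys fine

profile_n1 = {"a": 1, "X": 1}
update_iclist(state, profile_n1)
finish_run(state, profile_n1, success=False)  # X fails, reference stays n

profile_n2 = {"a": 2}                         # X removed, a changed
update_iclist(state, profile_n2)

# X was never part of the reference, so its removal goes unnoticed.
stale = state["iclist"] - set(profile_n2)
```

In this model `stale` ends up containing X, which is the leftover entry the comment describes.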

For some reason, it assumes that ncm-ncd simply ignores undefined components. I also do not understand why this whole ICLIST thing is needed, and why we don't simply run ncm-ncd --all.

jouvin commented 8 years ago

Clearly, running ncm-ncd --all at each run would be sub-optimal... There is no reason for this. I tend to agree that ICLIST could be safely reset to the one coming from OLDCFG. I remember running into the problem you mentioned and restarting ncm-cdispd (I may have added the comment you mentioned!).