quattor / configuration-modules-core

Node Configuration Manager Components for Everyone
www.quattor.org
Other
6 stars 54 forks source link

ncm-metaconfig 23.6.0: all the scheduled actions should be attempted rather than stopping after the first one in error #1681

Open jouvin opened 4 months ago

jouvin commented 4 months ago

When ncm-mdetaconfig has actions associated with file changes (like a daemon restart but probably also true for other actions) and several files have been modified, it runs the actions one by one but stops immediately if the execution of one action returns an error. For exemple, if the file modifications require the restart of several services, and the first one fails to restart successfully, the other services are not restarted. I have not checked if the problem applies to other kind of actions or if it is just for the daemon one but I guess it applies to others...

IMO, this is an important flaw because once you fixed the problem with the service in error and rerun ncm-metaconfig, the fixed service may be restarted if the fix implies a change in the associated file but the others will not because their files were already updated and has not been changed in the last run. Thus the service has a chance not to work properly until a manual restart is done.

I think that the correct behaviour is to attempt to execute all the scheduled actions, no matter whether one is producing an error...

ned21 commented 4 months ago

Hi @jouvin

It's not covered in the component man page but the schema describes how you can prefix the command with a - sign to ignore any failures and keep going with other commands so the mechanism is really flexible.

HTH Nathan

jouvin commented 4 months ago

Hi @ned21 thanks for reminding me this.... It should be at least better documented... Basically the whole template library is wrong! May be the default should have been the opposite but it's probably too late to change...

stdweird commented 4 months ago

this is an issue with more then only this component. same applies to component dependencies. proper state mgmt between runs is not available.

wrt the docs, we used to generate documentation based on the schema. @jrha we still have that around?

ned21 commented 4 months ago

wrt the docs, we used to generate documentation based on the schema. @jrha we still have that around?

My bad, the schema page is generated but metaconfig has so many sub-man pages, it's hard to find it as it appears near the end after ::rsyslog and before ::singularity. It would be nice if we could find a way to place such an important companion doc at the top, directly after the top level manpage.

jouvin commented 4 months ago

@stdweird you are partially right. We know that we have some flaws on the state management between runs but at least you can force a component to run and get the things fixed (at least for the idempotent configuration modules, i.e. most of them).

In this case the problem is more serious because you have no way to fix the problem by rerunning the configuration module. You need to restart the services manually if you know the exact list or easier/better reboot the server which is not optimal... At least if you don't add the magic character...

jouvin commented 3 months ago

@stdweird @ned21 I had a look to the - trick but from what I see in the schema comments and in the code, it is not really doing what I want:

I don't know what can be done without breaking backward compatibility but the current behaviour IMO is wrong as it let a system misconfigured after the configuration issue causing the problem is fixed, without any way to fix it other than manually restarting the impacted service.