quattor / ncm-cdispd

Node Configuration Manager Configuration Dispatch Daemon
www.quattor.org
Other
2 stars 5 forks source link

State files for non-dispatched components are created during daemon start #35

Closed ned21 closed 7 years ago

ned21 commented 7 years ago

After upgrading from 1.2.1 to 16.6 we noticed that we had stale files left in /var/run/quattor-components. We noted that these were for non-dispatched components and had a timestamp that matched the last restart of ncm-cdispd (which we do nightly). This is a bug because we have processes that check for the existence of these files to indicate that a components that did not successfully complete or finish.

Steps to reproduce:

  1. Make sure you have at least one component in your profile with dispatch=false
  2. sudo rm /var/run/quattor-components/*
  3. sudo service ncm-cdispd restart
  4. Wait for ncm-ncd to finish successfully with no errors.
  5. ls /var/run/quattor-components

Expected result: empty directory Actual result: state files remain for all non-dispatched components.

stdweird commented 7 years ago

The current code has the following usage of state files:

as a result:

Also good to know: ncm-cdispd runs ncm-ncd with its own statedir configuration; i.e. if the statedir of ncm-ncd.conf and ncm-cdispd.conf does not match, you can get some surprises (manual runs will not afftect the state the ncm-cdispd runs, which ofcourse could also be intentional). by default, they are the same (altough not unittested afaik)

@ned21 wrt your issue: the current code to set the state should be called in the else block at https://github.com/quattor/ncm-cdispd/blob/master/src/main/perl/Utils.pm#L183 rather than at the beginning, so only active and to be dispatched components have the state file touched by ncm-cdispd. (or the remove method in ncm-cdispd should also remove statefiles?) i'm a bit uncertain why ncm-cdispd touches the state files at all, probably to handle cases where ncm-ncd crahses? ncm-cdispd should also use (or try to) the NCD::ComponentProxy->set_state method, instead of its own non-logged implementation. (and the ncm-cdispd script itself should use a FileReader to read the state files, or at least cancel the current Editor before close)

stdweird commented 7 years ago

@ned21 i think all issues are covered and referenced here. i'll leave it to you for further testing and/or closing