quattor / ncm-cdispd

Node Configuration Manager Configuration Dispatch Daemon
www.quattor.org
Other
2 stars 5 forks source link

ncm-cdispd code cleanup #9

Closed jouvin closed 10 years ago

jouvin commented 10 years ago

Work in progress related to #8...

Discussion on cleanup goals is better placed in #8.

jouvin commented 10 years ago

At this point, the PR is almost complete. Still missing:

jouvin commented 10 years ago

Unit tests are now complete and hopefully provide a total coverage of all the functions used to compare profiles. As said before, several minor flaws of the previous code have been uncovered by these tests.

ncm-cdispd itself has been tested on a real system with real configuration changed, including components in error when ncm-cdispd was starting or after the initial run of components. As it is now, this version of ncm-cdispd can be considered better than the previous one.

I'm going to address all @stdweird's comments (thanks for reviewing the new code with such details!). And implement the new signal handling.

jouvin commented 10 years ago

About the idea to use CAF::FileWriter to do the state file touch, I gave it a try but it doesn't work because we really want to do a touch: create the file if it doesn't exist or update it's time stamp otherwise to reflect the last time the component was scheduled to run. This is not possible with CAF::FileWriter which does nothing if the file is not modified... Thus I keep the current code. As said this is more or less harmless as this portion of the code is skipped during unit tests.

jouvin commented 10 years ago

I now consider this PR complete and ready to be merged if reviewers agree!

The new signal handling seems to work fine. I have not been able to test it in a RPM post scriptlet, this is quite difficult to do until we have a release candidate to deploy. But I tested it from the command line and it seems to work well. As currently configured, HUP and TERM signals are not processed immediately if they occur during ncm-ncd run but they are remembered and the relevant action is done at the end of ncm-ncd run. At any other moment, they are processed immediately.

The delayed processing of TERM may lead to a suprising situation if you do a restart (start + stop) when ncm-ncd is running: the new ncm-cdispd process is started before the current one is really stopped and you end up with 2 (or even more depending on the number of restart you did!) ncm-cdispd process. This is in fact harmless because ncm-ncd creates a lock that forbid new process to actually do anything until the original one has exited, at the end of ncm-ncd. Exaclty the behaviour we want, I think.

I realize that this is worth mentionning in the documentation... I'm doing it!

hpcugentbot commented 10 years ago

Merged build started.

hpcugentbot commented 10 years ago

Merged build triggered.

hpcugentbot commented 10 years ago

Merged build finished.

stdweird commented 10 years ago

retest this please

jrha commented 10 years ago

This looks good to me, I'll leave it @stdweird to merge as he has had the most input!

jouvin commented 10 years ago

I just added the fix discussed in https://github.com/quattor/CAF/issues/32 to solve #2 (improve readability of ncm-cdispd related information in syslog). Note that the ncm-ncd output is no longer logged except in debug mode as it duplicates information already present in syslog and in component log files. This also makes ncm-cdispd.log more readable but is probably worth some notice in the release notes as many users are used to look at ncm-cdispd.log rather than syslog.

jouvin commented 10 years ago

BTW, if this approach is accepted, I suggest to close https://github.com/quattor/CAF/issues/32 and the related PR without merging it.

stdweird commented 10 years ago

i think i found a better way to solve this specific issue, i'll open a PR during the weekend to your branch (or to CAF::Process, as it is a general use case).

jouvin commented 10 years ago

Note that is was part of my initial proposal to fix this at the CAF::Process level! So your proposal is welcome! I'm leaving tomorrow: I'll see if I find the time to break my last commit into 2: one for making the ncm-ncd output logging optional, another one for the rsyslog problem. If I don't find the time, I suggest that you accept this PR and open a new PR that modifies my last mods based on what you did.

stdweird commented 10 years ago

ok, i'll see what will be done. the proposal is very different and has nothing to do with throttling.

if you are going for holidays, enjoy! (i was under the impression you were only leaving the week after. apologies for dragging this further)

jouvin commented 10 years ago

As suggested in a comment from @stdweird, I splitted my last commit implementing both the conditional logging of ncm-ncd output line by line and the rsyslog throttling in 2 separate commits. I'd like to see this PR merged with both commits (as we really need them in the short term for all the reasons explained above) but at a later time if there is something else implemented that makes rsyslog throttling unnecessary, we'll just need to revert this commit. I want to insist that the current syslog throttling has a very limited impact: it is effectively used only when logging ncm-ncd output, it means in debug mode, there is no penalty for normal usage. I hope this approach will be acceptable by everybody. I may be able to look a last time at this discussion this evening...

hpcugentbot commented 10 years ago

Merged build triggered.

hpcugentbot commented 10 years ago

Merged build started.

hpcugentbot commented 10 years ago

Merged build finished.

piojo-zz commented 10 years ago

@stdweird , what's your take on this one? Can it be merged?

stdweird commented 10 years ago

i wanted to try to solve the single log line issue with https://github.com/quattor/CAF/issues/35, and ths avoinding the whole RSYSLOG throttling. i'll try to tackle it before the conf call

jrha commented 10 years ago

As discussed at the standup, this is waiting for @stdweird to do a final pass, he will fix the throttling in a new PR.

jrha commented 10 years ago

Merged as part of #11