Closed jrha closed 7 years ago
to be clear: the sl6 logs shows a different issue as the sl7 logs. i knew about the sl6 one, i didn't about the sl7 (or i always assume the sl7 error was the sl6 one).
the sl7 error is a very strange beast (is it reading a partially updated file? how would that even be possible...), but is probably due to systemd handling the restart via ncm-chkconfig as opposed to the rpm postinstall via chkconfig
Also to be clear, the system eventually gets updated correctly and works, but it's a confusing mess in the middle and it scares me to think what might happen if something got stuck.
@jrha we deployed 17.3.0 release rpms, and are hit by permament broken ncm-cdispd
.
:fearful:
It would be really good to understand what happened, can you tell why ncm-cdispd
is broken?
It was functioning correctly after the upgrade on the test hosts we used.
can you send me a md5sum of the /usr/sbin/ncm-cdispd
? and rpm -qa|grep perl
@jrha on a sl7/centos7 system
@jrha these are actual code bugs, introduced in https://github.com/quattor/ncm-cdispd/pull/48/commits/56471eb366e0a25293157c24f5bba0409e637fdf
a few things went wrong:
i'm really puzzled how this can have worked at RAL.
anyway, PR coming up. either make a 17.3.1 for this or revoke the rpm from the release
The original test VMs have been destroyed, but a recently upgraded VM does indeed have a broken ncm-cdispd. Not sure how I didn't notice. :disappointed: I'm going to delete the RPM from the repo to prevent anyone else breaking their systems and issue a 17.3.1.
Fixes are
;
to close anon-sub (https://github.com/quattor/ncm-cdispd/pull/52/files#diff-76b4fa87b63e1bc5d0bc50ce36f3234cR379)@stdweird was this really fixed by your PR? The postinstall script is still doing confusing things is it not?
@jrha the EL7 things should be gone. what are you still seeing?
The issue describes the restart of cdispd by the postinstall script, if that is not problematic then the issue title and description should be updated.
afaik, the restart is not the issue
ncm-cdispd
is restarted by the post-install script during an upgrade, at the point at which the restart happens the system is in an inconsistent state, where some packages have not yet been upgraded — cue scary "bad Perl code" warnings.Here are sanitised logs for sl6x and sl7x VMs showing the problem.
As mentioned in quattor/release#297.