kamermans / docker-openmanage

Dell OpenManage Server Administrator in a Docker container with SNMP support

Incomplete inventory on some machines #3

Closed maltris closed 7 years ago

maltris commented 7 years ago

Hello,

Recently I used this container to upgrade some R610 II, R720xd and R510 servers. On all of them the inventory seems incomplete: with dsu on a native CentOS 7, plenty of components are detected and upgraded.

With the container, only roughly a third of the hardware is detected by the inventory run.

The container is running privileged on top of Debian Jessie.

Any suggestions or experiences on this problem?

kamermans commented 7 years ago

Thanks for the bug report. I have not noticed that myself, but it's not totally surprising due to the abstraction between the host OS and the container. I suspect there are missing kernel modules or something of that sort, but I don't personally use it that much. If there is some sort of kernel module that's missing, it would be helpful to know what that module is (lsmod), then we could try to figure out what is necessary to load it in Debian.
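For the comparison, a rough sketch along these lines might help. It assumes you save the lsmod output from a working system and from the Docker host into two files; the function names and file names are made up for illustration and are not part of the image.

```shell
#!/bin/sh
# Print the module names (first column) from lsmod output on stdin, sorted.
# The "Module  Size  Used by" header line and blank lines are skipped.
module_names() {
    awk 'NF && $1 != "Module" { print $1 }' | sort -u
}

# Print modules that appear in file $1 (reference) but not in file $2.
missing_modules() {
    ref=$(mktemp)
    cur=$(mktemp)
    module_names < "$1" > "$ref"
    module_names < "$2" > "$cur"
    # comm -23 keeps only the lines unique to the first (reference) file.
    comm -23 "$ref" "$cur"
    rm -f "$ref" "$cur"
}
```

Usage would be something like `lsmod > working.txt` on the system where dsu works, `lsmod > docker-host.txt` on the Debian host, then `missing_modules working.txt docker-host.txt`.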

maltris commented 7 years ago

Glad to help!

So far I have found that on a native CentOS 6, while running "dsu", the following modules are loaded:

vfat                   10584  1
fat                    54992  1 vfat
usb_storage            49329  1

mpt3sas               210113  0
mptctl                 31785  0
mptbase                93647  1 mptctl
dell_rbu                9414  0

Since this is CentOS 6, I thought it's a bad comparison, so I then ran the Docker image on a machine with Debian 8/Jessie as the host:

usb_storage            56215  0

mpt3sas               148132  0
mptctl                 33597  1
mptbase                73042  1 mptctl
dell_rbu               12727  0

The dsu-output inside the container:

[*]1 Firmware for  - Disk 0 in Backplane 0 of PERC 6/i Integrated Controller 0
Current Version : FS64 Upgrade to : FS66

[*]2 BIOS
Current Version : 2.2.10 Upgrade to : 6.4.0

[*]3 PERC 6/i Integrated Controller 0 Firmware
Current Version : 6.3.0-0001 Upgrade to : 6.3.3-0002

Enter your choice : c
Fetching SAS-Drive_Firmware_XJ1HM_LN32_FS66_A08 ...
Installing SAS-Drive_Firmware_XJ1HM_LN32_FS66_A08
Collecting inventory...
..
Inventory collection failed.
SAS-Drive_Firmware_XJ1HM_LN32_FS66_A08 could not be installed
Fetching SAS-RAID_Firmware_F96NR_LN_6.3.3-0002_X00 ...
Installing SAS-RAID_Firmware_F96NR_LN_6.3.3-0002_X00
Collecting inventory...
.....
Inventory collection failed.
SAS-RAID_Firmware_F96NR_LN_6.3.3-0002_X00 could not be installed
Fetching R610_BIOS_C6MRW_LN_6.4.0 ...
Installing R610_BIOS_C6MRW_LN_6.4.0
Collecting inventory...
.
Running validation...

BIOS

The version of this Update Package is newer than the currently installed version.
Software application name: BIOS
Package version: 6.4.0
Installed version: 2.2.10

Executing update...
WARNING: DO NOT STOP THIS PROCESS OR INSTALL OTHER DELL PRODUCTS WHILE UPDATE IS IN PROGRESS.
THESE ACTIONS MAY CAUSE YOUR SYSTEM TO BECOME UNSTABLE!
.
/tmp/.dellSP-XmlResult5287-29975.DFojZN:1: parser error : Start tag expected, '<' not found
L
^
unable to parse /tmp/.dellSP-XmlResult5287-29975.DFojZN
The system should be restarted for the update to take effect.
Please reboot the system for update(s) to take effect
Done! Please run 'dsu --inventory' to check the inventory
Exiting DSU!

So something is going wrong. The updates fail at the inventory step, and the last one even shows an XML parser error. What I notice is that the fat and vfat modules are missing. So now, on the same hardware, let's boot CentOS 7 natively (a live image from Dell called "DSET"):

vfat                   17411  0
fat                    65913  1 vfat
iqvlinux               28390  0
usb_storage            66305  0 

mpt3sas               195268  0 
mpt2sas               193927  2 
mptctl                 38332  1 
mptbase               105960  1 mptctl
dell_rbu               14315  0 

Now the updates show up fine and run as expected (not shown in the output):

|-----------Dell System Updates-----------|
[ ] represents 'not selected'
[*] represents 'selected'
[-] represents 'Component already at repository version (can be selected only if -e option is used)'
Choose:  q - Quit without update, c to Commit, <number> - To Select/Deselect, a - Select All, n - Select None

[ ]1 BIOS
Current Version : 2.2.10 Upgrade to : 6.4.0

[ ]2 Firmware for  - Disk 0 in Backplane 0 of PERC 6/i Integrated Controller 0
Current Version : FS64 Upgrade to : FS66

[-]3 PowerEdge R610 BCM5709 Gigabit Ethernet rev 20 (em3)
Current Version : 08.07.26 same as : 08.07.26

[-]4 PowerEdge R610 BCM5709 Gigabit Ethernet rev 20 (em2)
Current Version : 08.07.26 same as : 08.07.26

[-]5 PowerEdge R610 BCM5709 Gigabit Ethernet rev 20 (em1)
Current Version : 08.07.26 same as : 08.07.26

[-]6 PowerEdge R610 BCM5709 Gigabit Ethernet rev 20 (em4)
Current Version : 08.07.26 same as : 08.07.26

[ ]7 Dell 32 Bit Diagnostics, v.5148A0, 5148.3
Current Version : 5148A0 Upgrade to : 5162A0

[ ]8 iDRAC6
Current Version : 1.98 Upgrade to : 2.85

[ ]9 PERC 6/i Integrated Controller 0 Firmware
Current Version : 6.3.0-0001 Upgrade to : 6.3.3-0002

[ ]10 Dell Lifecycle Controller, v.1.4.0.586, A03
Current Version : 1.4.0.586 Upgrade to : 1.7.5.4

Enter your choice :

The hardware I tried is an R610, but I had similar problems on R720xds and R510s, too.

srstsavage commented 7 years ago

Focusing on the failed RAID firmware updates, based on some digging/process watching and this info:

http://lists.us.dell.com/pipermail/linux-poweredge/2016-July/050641.html
http://sysadm.mielnet.pl/dell-perc-firmware-update-centos-7-fixing-sasdupie-segfault/

it seems that sasdupie in the extracted firmware installer is segfaulting when trying to use an incompatible (too new) version of libstorelibir.so.5. You can use a symlink to replace /opt/dell/srvadmin/lib64/libstorelibir.so.5 with /opt/dell/srvadmin/lib64/libstorelibir-3.so, as the second link suggests.

The following allows me to run modular H700 firmware updates:

docker run --rm -ti --privileged --net=host kamermans/docker-openmanage \
  /bin/sh -c "ln -sf /opt/dell/srvadmin/lib64/libstorelibir-3.so /opt/dell/srvadmin/lib64/libstorelibir.so.5 && dsu"

I'm guessing that the version of libstorelibir.so.5 on your CentOS 6 machine might be old enough to work, while the one installed in this CentOS 7 derived image is too new. (This one is libstorelibir.so.5.07-0).
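If you'd rather do the relink from a shell inside the container and confirm where the link points afterwards, something like this rough sketch should work. The function name and the optional directory argument are made up for illustration; nothing like it ships with the image.

```shell
#!/bin/sh
# Sketch only: point libstorelibir.so.5 at the older libstorelibir-3.so and
# print where the link ends up. The default directory is the one used in the
# docker run command above.
relink_storelib() {
    libdir="${1:-/opt/dell/srvadmin/lib64}"
    # -n replaces an existing symlink itself instead of following it.
    ln -sfn "$libdir/libstorelibir-3.so" "$libdir/libstorelibir.so.5"
    # Show the immediate target so a wrong link is obvious before running dsu.
    readlink "$libdir/libstorelibir.so.5"
}
```

Run `relink_storelib` with no arguments in the container shell, check the printed target, then start dsu.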

I haven't tried or investigated the BIOS failure yet. For debugging, it was helpful to run the container with a shell:

docker run --rm -it --privileged --net=host --name dsu kamermans/docker-openmanage bash

Then, from a second terminal, exec a process watch in the same container:

docker exec -it dsu watch ps aux

Then run dsu in the container shell and watch the spawned processes. You can press ctrl-z in the shell to pause/background the dsu process, which is helpful for examining temp files, running commands, etc. right before it breaks.

srstsavage commented 7 years ago

BTW, my results are using Debian Jessie hosts.

To update modular CERC 6/i RAID firmware:

docker run --rm -ti --privileged --net=host kamermans/docker-openmanage \
  /bin/sh -c "yum install -y libxml2.i686 && dsu"

kamermans commented 7 years ago

Thanks for the info - I'm taking a look at it now. I have R610s, R620s and R720s that I can test it on.

kamermans commented 7 years ago

@maltris: I've made a few changes to the image, and it seems to be working on my R610 with Ubuntu 14.04 as the host OS. Can you give it a docker pull kamermans/openmanage and let me know if it's working for you? It also includes the improvements from @shane-axiom (thanks!).

Also, I'm skimming through the content on the DSET ISO, and I don't really see much of interest there, although they did leave a script in there that shows how they spoof RHEL on CentOS in order to trick the Dell software, so I'm doing that now, too. I've imported the DSET ISO filesystem into a docker image, so if you want to poke around in it, it's here: https://hub.docker.com/r/kamermans/dell-dset/

maltris commented 7 years ago

Hello and thanks for the great work,

This already looks much better now. I tested this on an R620 running Debian Jessie for now, with the following results:

The firmwares got detected except the ones with the asterisk (*):

|-----------Dell System Updates-----------|
[ ] represents 'not selected'
[*] represents 'selected'
[-] represents 'Component already at repository version (can be selected only if -e option is used)'
Choose:  q - Quit without update, c to Commit, <number> - To Select/Deselect, a - Select All, n - Select None 

[-]1 NetXtreme BCM5720 Gigabit Ethernet PCIe (em1)
Current Version : 20.2.17 same as : 20.2.17

[-]2 NetXtreme BCM5720 Gigabit Ethernet PCIe (em4)
Current Version : 20.2.17 same as : 20.2.17

[-]3 NetXtreme BCM5720 Gigabit Ethernet PCIe (em2)
Current Version : 20.2.17 same as : 20.2.17

[-]4 NetXtreme BCM5720 Gigabit Ethernet PCIe (em3)
Current Version : 20.2.17 same as : 20.2.17

[-]5 BIOS
Current Version : 2.5.4 same as : 2.5.4

[-]6 Firmware for  - Disk 0 in Backplane 1 of PERC H710P Mini Controller 0  
Current Version : YS0C same as : YS0C

[*]7 OS Collector
Current Version : 0 Upgrade to : OSC_1.1

[-]8 12G SEP Firmware 
Current Version : 1.00 same as : 1.00

[*]9 Enterprise UEFI Diagnostics, 4217A4, 4217.7
Current Version : 4217A4 Upgrade to : 4247A1

[-]10  iDRAC
Current Version : 2.41.40.40 same as : 2.41.40.40

[-]11 PERC H710P Mini Controller 0 Firmware
Current Version : 21.3.4-0001 same as : 21.3.4-0001

I confirmed this by running the docker image and then running the "native" dsu on CentOS 7 (DSET). These firmwares do also NOT get detected in the dell dsu.

As I said, it's already much better, because the higher-priority stuff gets updated properly.

kamermans commented 7 years ago

Ok, great! The OS Collector (whatever that is) will probably never be detected properly since the container's OS is almost certainly different from the host's OS, but I admit that I have no idea what it is. Regarding the UEFI Diagnostics component, how do you know it's not being detected properly? In the output, everything looks fine to me for that item.

maltris commented 7 years ago

Hello,

The output I provided is from a native CentOS 7, so it marks the upgrades that were not done in the Docker solution. It is just for your information. Since I really only care about the real firmwares (mostly iDRAC, BIOS, network and RAID controller), that's alright for me.

With some upgrades I experienced failures while upgrading, but I will open a new issue for that since it does not really belong here.

Currently I have plenty of other not-yet-upgraded machines for testing, so I will keep you up to date on the "Enterprise UEFI Diagnostics" problem and try to reproduce it on another machine. If the problem persists, I will from now on also always try your dset-to-docker solution, which is pretty cool for testing in my opinion.

So maybe leave that ticket here, since I plan to give you an update tomorrow.

kamermans commented 7 years ago

Ok, sounds good, thanks! I have not tested the dset image at all yet, I just created it directly from the ISO image, so it's entirely possible that there are services that need to be started, etc, for it to work completely, but it may be possible to start them from /etc/init.d or something if need be.

kamermans commented 7 years ago

Using the -v /dev:/dev trick from issue #4 seems to have solved this problem too:

|-----------Dell System Updates-----------|
[ ] represents 'not selected'
[*] represents 'selected'
[-] represents 'Component already at repository version (can be selected only if -e option is used)'
Choose:  q - Quit without update, c to Commit, <number> - To Select/Deselect, a - Select All, n - Select None 

[-]1 NetXtreme BCM5720 Gigabit Ethernet PCIe (em3)
Current Version : 20.2.17 same as : 20.2.17

[-]2 NetXtreme BCM5720 Gigabit Ethernet PCIe (em1)
Current Version : 20.2.17 same as : 20.2.17

[-]3 NetXtreme BCM5720 Gigabit Ethernet PCIe (em4)
Current Version : 20.2.17 same as : 20.2.17

[-]4 NetXtreme BCM5720 Gigabit Ethernet PCIe (em2)
Current Version : 20.2.17 same as : 20.2.17

[ ]5 BIOS
Current Version : 2.4.3 Upgrade to : 2.5.4

[-]6 12G SEP Firmware 
Current Version : 1.00 same as : 1.00

[*]7 OS Collector
Current Version : 0 Upgrade to : OSC_1.1

[*]8 Enterprise UEFI Diagnostics, 4217A4, 4217.7
Current Version : 4217A4 Upgrade to : 4247A1

[-]9  iDRAC
Current Version : 2.41.40.40 same as : 2.41.40.40

[*]10 PERC H710P Mini Controller 0 Firmware
Current Version : 21.3.2-0005 Upgrade to : 21.3.4-0001

Enter your choice : c
Fetching Diagnostics_Application_5W2KP_LN64_OSC_1.1_X10-00 ...
Installing Diagnostics_Application_5W2KP_LN64_OSC_1.1_X10-00
Collecting inventory...
......................................................
Running validation...

OS Collector

No version of this Update Package is currently installed.
Software application name: OS Collector
Package version: OSC_1.1

Executing update...
WARNING: DO NOT STOP THIS PROCESS OR INSTALL OTHER DELL PRODUCTS WHILE UPDATE IS IN PROGRESS.
THESE ACTIONS MAY CAUSE YOUR SYSTEM TO BECOME UNSTABLE!
..................................................................................................................................................................
The update completed successfully.
Fetching Diagnostics_Application_D5TM2_LN_4247A1_4247.2 ...
Installing Diagnostics_Application_D5TM2_LN_4247A1_4247.2
Collecting inventory...
......................................................
Running validation...

Enterprise UEFI Diagnostics, 4217A4, 4217.7

The version of this Update Package is newer than the currently installed version.
Software application name: Enterprise UEFI Diagnostics, 4217A4, 4217.7
Package version: 4247A1
Installed version: 4217A4

Executing update...
WARNING: DO NOT STOP THIS PROCESS OR INSTALL OTHER DELL PRODUCTS WHILE UPDATE IS IN PROGRESS.
THESE ACTIONS MAY CAUSE YOUR SYSTEM TO BECOME UNSTABLE!
.........................................................................................................................................................
The update completed successfully.
Fetching SAS-RAID_Firmware_6MFCV_LN_21.3.4-0001_A08 ...
Installing SAS-RAID_Firmware_6MFCV_LN_21.3.4-0001_A08
Collecting inventory...
..
Running validation...

PERC H710P Mini Controller 0

The version of this Update Package is newer than the currently installed version.
Software application name: PERC H710P Mini Controller 0 Firmware
Package version: 21.3.4-0001
Installed version: 21.3.2-0005

Executing update...
WARNING: DO NOT STOP THIS PROCESS OR INSTALL OTHER DELL PRODUCTS WHILE UPDATE IS IN PROGRESS.
THESE ACTIONS MAY CAUSE YOUR SYSTEM TO BECOME UNSTABLE!
....................................................................................................
The system should be restarted for the update to take effect.
Please reboot the system for update(s) to take effect
Done! Please run 'dsu --inventory' to check the inventory
Exiting DSU!

After the update, when I run dsu again, I see the versions are updated as expected, except for the System BIOS and RAID BIOS. I think I need to reboot to get the BIOSes to report the new version, but I can't reboot my 620s because they are in a Hadoop cluster at the moment (and I'd rather not mess with that system) :)
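For reference, here is a minimal sketch of the invocation, assuming the -v /dev:/dev mount is simply added to the docker run flags used earlier in this thread. The run_dsu wrapper and the IMAGE and DRY_RUN variables are illustrative names, not part of the image.

```shell
#!/bin/sh
# Sketch: run dsu in the container with the host's /dev bind-mounted.
# The image name and flag combination are pieced together from this thread.
IMAGE="${IMAGE:-kamermans/docker-openmanage}"

run_dsu() {
    # Set DRY_RUN=1 to print the docker command instead of executing it.
    ${DRY_RUN:+echo} docker run --rm -ti --privileged --net=host \
        -v /dev:/dev "$IMAGE" dsu "$@"
}
```

Calling `run_dsu` starts the interactive update menu, and `run_dsu --inventory` re-checks the inventory afterwards, as the dsu output suggests.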

maltris commented 7 years ago

This looks pretty good. With that change, the problem seems to be fixed; all the inventory is working so far.

mailinglists35 commented 6 years ago

where did you download libstorelibir.so.3 from? I can't find it...

kamermans commented 6 years ago

@mailinglists35 it's in the image under /opt/dell/srvadmin/lib64/:

$ docker run -ti kamermans/docker-openmanage ls -la /opt/dell/srvadmin/lib64/ | grep libstorelibir
lrwxrwxrwx 1 root root      26 Mar 31 02:51 libstorelibir-2.so -> libstorelibir-2.so.20.09-0
lrwxrwxrwx 1 root root      26 Mar 31 02:51 libstorelibir-2.so.20 -> libstorelibir-2.so.20.09-0
-rwxr-xr-x 1 root root  561280 Oct 22  2017 libstorelibir-2.so.20.09-0
lrwxrwxrwx 1 root root      26 Mar 31 02:51 libstorelibir-3.so -> libstorelibir-3.so.14.02-0
lrwxrwxrwx 1 root root      26 Mar 31 02:51 libstorelibir-3.so.14 -> libstorelibir-3.so.14.02-0
-rwxr-xr-x 1 root root  435200 Oct 22  2017 libstorelibir-3.so.14.02-0
lrwxrwxrwx 1 root root      23 Mar 31 02:51 libstorelibir.so -> libstorelibir.so.5.07-0
lrwxrwxrwx 1 root root      43 Mar 31 02:53 libstorelibir.so.5 -> /opt/dell/srvadmin/lib64/libstorelibir-3.so
-rwxr-xr-x 1 root root  307608 Oct 22  2017 libstorelibir.so.5.07-0