Closed tingtli closed 8 years ago
The modules loaded on the booted-up node:
c910f03c05k27:~ # lsmod
Module                  Size  Used by
btrfs                1198649  0
raid6_pq              152816  1 btrfs
xor                    11459  1 btrfs
joydev                 14981  0
virtio_balloon          9909  0
rtc_generic             2321  0
sg                     43074  0
autofs4                47810  2
af_packet              41146  0
hid_generic             1748  0
usbhid                 63317  0
sr_mod                 23079  0
sd_mod                 57233  0
cdrom                  47862  1 sr_mod
crc_t10dif              1997  1 sd_mod
ibmvscsi               33980  0
scsi_transport_srp     18628  1 ibmvscsi
scsi_tgt               16557  1 scsi_transport_srp
scsi_mod              287733  6 sg,scsi_transport_srp,ibmvscsi,scsi_tgt,sd_mod,sr_mod
virtio_net             31549  0
ohci_pci                6890  0
ohci_hcd               75355  1 ohci_pci
usbcore               299154  3 ohci_hcd,ohci_pci,usbhid
virtio_pci             11960  0
virtio_ring            15260  3 virtio_net,virtio_pci,virtio_balloon
usb_common              4018  1 usbcore
virtio                  7230  3 virtio_net,virtio_pci,virtio_balloon
sunrpc                409352  1
dm_mirror              20709  0
dm_region_hash         14448  1 dm_mirror
dm_log                 14297  2 dm_region_hash,dm_mirror
dm_mod                136239  2 dm_log,dm_mirror
@cxhong were these the error messages that you saw when running genimage?
Yes, we have no problem booting nodes with the sles12 sp1 diskless image. We may need to clean up the genimage output and add more packages and modules to the packagelist.
Adding kernel-firmware to the packagelist will get rid of a lot of the firmware messages. Adding adaptec-firmware to the packagelist will take care of aic94xx.ko.
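A minimal sketch of adding those two packages to a pkglist without creating duplicate entries. The helper name add_pkg is hypothetical, and the demo writes to a scratch file; the real pkglist path depends on the osimage definition and is not shown here.

```shell
# Append a package name to a pkglist file unless it is already listed.
# add_pkg is a hypothetical helper, not an xCAT command.
add_pkg() {
    pkglist_file=$1
    pkg_name=$2
    grep -qxF "$pkg_name" "$pkglist_file" 2>/dev/null \
        || printf '%s\n' "$pkg_name" >> "$pkglist_file"
}

# Demo on a scratch file; point this at the site-specific pkglist instead.
demo_pkglist=$(mktemp)
add_pkg "$demo_pkglist" kernel-firmware
add_pkg "$demo_pkglist" adaptec-firmware
add_pkg "$demo_pkglist" kernel-firmware   # already listed, so nothing is appended
cat "$demo_pkglist"
rm -f "$demo_pkglist"
```

After changing the pkglist, genimage has to be run again for the packages to land in the root image.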
I did a search. In this forum thread, https://forums.opensuse.org/showthread.php/508652-Sudo-zypper-patch-failed-to-install-kernel-modules, the first user who replied says that, in his understanding, the dracut output is "dracut thinking out loud as it's generating the initrd."
I think the messages appear because dracut tries to do a general detection, and certain hardware modules may no longer be needed or are not applicable to the hardware we are running on.
In the output above:
*** Creating image file done ***
Some kernel modules could not be included:
ipv6
xennet
ata_piix
unix
ehci-platform
xhci-pci
yenta_socket
atkbd
i8042
pcmcia
hid-hyperv
hv-vmbus
sdhci_acpi
ipv6
crc32c
I took the list and created a quick script to run lsmod on those kernel modules:
for module in ipv6 xennet ata_piix unix ehci-platform xhci-pci yenta_socket atkbd i8042 pcmcia hid-hyperv hv-vmbus sdhci_acpi ipv6 crc32c ; do
    echo "Looking at module: $module"
    # lsmod prints names with underscores, so map hyphens before matching
    lsmod | grep -w "$(echo "$module" | tr '-' '_')"
done
The output confirms that those kernel modules are not loaded on our running SLES 12 SP1 node:
fs2vm1:~ # ./test.sh
Looking at module: ipv6
Looking at module: xennet
Looking at module: ata_piix
Looking at module: unix
Looking at module: ehci-platform
Looking at module: xhci-pci
Looking at module: yenta_socket
Looking at module: atkbd
Looking at module: i8042
Looking at module: pcmcia
Looking at module: hid-hyperv
Looking at module: hv-vmbus
Looking at module: sdhci_acpi
Looking at module: ipv6
Looking at module: crc32c
So I think we are OK.
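A slightly stricter variant of that check, as a sketch only. lsmod lists loadable modules and uses underscores in names, so this version normalizes hyphens and looks under /sys/module, which also shows some (not all) built-in modules. The function names are hypothetical.

```shell
# Map dracut-style names (hid-hyperv) to the kernel's form (hid_hyperv).
normalize_module_name() {
    printf '%s\n' "$1" | tr '-' '_'
}

# Print "present" if the module has an entry under /sys/module, else "absent".
# Built-in modules only appear there when they expose parameters or a version,
# so "absent" does not prove the code is missing from the kernel.
module_state() {
    state_name=$(normalize_module_name "$1")
    if [ -d "/sys/module/$state_name" ]; then
        echo "present"
    else
        echo "absent"
    fi
}

for module in ipv6 xennet ata_piix unix ehci-platform xhci-pci yenta_socket \
              atkbd i8042 pcmcia hid-hyperv hv-vmbus sdhci_acpi crc32c; do
    printf '%-16s %s\n' "$module" "$(module_state "$module")"
done
```

Any module reported as "present" here is in use on the node, so its absence from the initrd would be worth a second look.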
I ran this on x86_64 and the output has this additional msg:
*** Creating image file done ***
Some kernel modules could not be included
This is not necessarily an error: <======
xennet
pcmcia
sdhci_acpi
ibmvscsi
the initial ramdisk for statelite is generated successfully.
hi @whowutwut @cxhong ,
thanks for your comments and fix.
The fix resolves the failures when dracut installs modules. However, the size of the rootimg.gz generated by packimage increased from 190M to 250M. The increase is caused by the firmware data under /lib/firmware/*. Since all the necessary firmware data and kernel modules have already been installed in the initrd, is it necessary to keep all this firmware data in the rootimg? If not, we'd better add "/lib/firmware" to the exlist.
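To put a number on the firmware share before deciding, something like the following could be run against the unpacked root image. This is a sketch: ROOTIMG_DIR is an assumed environment variable that must point at the rootimg directory genimage produced, firmware_kib is a hypothetical helper, and the result is the uncompressed size, so it will not match the growth of rootimg.gz exactly.

```shell
# Print the size in KiB of lib/firmware inside an unpacked root image,
# or 0 if the directory does not exist.
firmware_kib() {
    if [ -d "$1/lib/firmware" ]; then
        du -sk "$1/lib/firmware" | cut -f1
    else
        echo 0
    fi
}

# ROOTIMG_DIR is an assumption; set it to the real rootimg directory.
rootimg_dir=${ROOTIMG_DIR:-/path/to/rootimg}
echo "lib/firmware under $rootimg_dir: $(firmware_kib "$rootimg_dir") KiB"
```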
Reopening this to carry on the discussion.
@immarvin
Should we be removing the firmware files? If we remove too much, we might have problems when provisioning the diskless compute nodes. Also, if the customer is running different hardware, the node might not be able to load the firmware required for everything to work. What is the downside of having the rootimg file grow slightly?
Generally speaking, I think the O/S will grow larger with each release. I don't think it's unreasonable for the diskless rootimg to grow as well, because new versions of the operating system may need to auto-detect new devices.
hi @whowutwut ,
The growth of the diskless initrd and rootimg.gz increases the time spent on diskless boot and the memory usage.
The exclude list attribute linuximage.exlist lets users trim the image after the rpms are installed into the root image, so that rootimg.gz is as small as possible. xCAT ships a default exlist; users can add or remove entries in the exlist file according to their needs.
# tabdump -d linuximage|grep exlist
exlist: The fully qualified name of the file that stores the file names and directory names that will be excluded from the image during packimage command. It is used for diskless image only.
On the "/lib/firmware" directory, I agree with you that it should not be excluded from rootimg.gz.
I think there are 2 things we might need to do for sles12.1 support:
1. The default exlist for the sles12.1 ppc64le diskless osimage, /opt/xcat/share/xcat/netboot/sles/compute.sles12.ppc64le.exlist, has been out of date since sles12 and should be updated.
2. The documentation at "Docs » Admin Guide » Manage Clusters » IBM Power LE / OpenPOWER » Diskless Installation » Customize osimage (Optional)" should cover trimming the rootimg.gz during packimage.
Opened a new ticket to track this: https://github.com/xcat2/xcat-core/issues/472
I guess the missing kernel module errors were caused by the 'kernel-modules' module we added for the dracut command to create the initrd. And yes, the root cause is that we did not add the necessary rpms to the root image.
I agree we can fix it by adding more rpms to the pkglist or removing the excluded packages from the exlist. But I recommend fixing this in the next release. Customers can work around this issue easily, and I think real customers mostly need to customize the pkglist and exlist anyway.
@immarvin @penguhyang I think we need to reopen this to track it. I see some error or warning messages when I run regression on 2.11.1.
Please cherry-pick the pull request #480 from master branch to 2.11 branch.
@daniceexi is that the right pull request number?
@whowutwut Yes, it is. We thought the changes in #480 would help remove some of the error messages in the sles genimage.
@daniceexi thanks, I see now, I looked too quickly and thought only documentation changes were made in that pull request.
@tingtli As we discussed, the messages are generated by the postscripts of the rpms, and xCAT cannot do anything about that, so we have to keep it as is.
I set this issue as won't fix and closed it. If you have any concerns, please reopen it.
xCAT 2.11 11/24 build. genimage returns a lot of errors.