xcat2 / xcat-core

Code repo for xCAT core packages
Eclipse Public License 1.0

[Not sure] Ubuntu 16 & 18 cannot be installed on non-clean disks #6508

Open MasterGroosha opened 4 years ago

MasterGroosha commented 4 years ago

I've been playing with xCAT recently on 3 different Supermicro servers.

On one of the three servers, installation of both 16.04.6 and 18.04.3 finished correctly. On the other two, both 16.04.6 and 18.04.3 installations hung completely.

I monitored all installations with the xcatprobe osdeploy -n {nodename} command. The output was trimmed a bit (osdeploy doesn't support a -w argument), but in every "failure" case the installation hung after these lines:

Via HTTP get /install/ubuntu18.04.3/x86_64/pool/main/g/grub-gfxpayload-lists/grub-gfxpayloa...
Via HTTP get /install/ubuntu16.04.6/x86_64/pool/main/g/grub2/grub-pc_2.02%7ebeta2-36ubuntu3...

As you may have noticed, I copied these lines from different terminals and from different OS installations. As far as I remember, the next step should be the post-install script and the first reboot. Since I see the lines about grub but not the line that should follow, maybe the post-install script is the core of the problem.

Now for the most interesting part. From what I've seen, if I boot from a GParted Live USB and completely wipe all data disks by creating a new partition table and then running dd if=/dev/zero of=/dev/sdX count=1048576 bs=1024, the OS installs without any issues. If I don't do this and leave the disks "as is", the installation hangs after the lines mentioned above.
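For reference, the wipe step can be wrapped in a small shell helper. This is only a sketch of the dd call above, not something xCAT provides: the function name zero_head and its optional size argument are made up for illustration, and the default of 1048576 KiB simply mirrors the count and bs values I used. It destroys data on whatever target it is given.

```shell
#!/bin/sh
# Hypothetical helper (not part of xCAT): zero the first COUNT_KIB kibibytes
# of TARGET. TARGET is normally a block device such as /dev/sdX, but a plain
# file also works, which is handy for a dry run.
# WARNING: this destroys data on TARGET.
zero_head() {
    target="$1"
    count_kib="${2:-1048576}"   # default mirrors the dd call above (1 GiB)

    if [ -z "$target" ] || [ ! -e "$target" ]; then
        echo "usage: zero_head TARGET [COUNT_KIB]" >&2
        return 1
    fi

    # conv=notrunc keeps a regular file at its original length;
    # it makes no difference on a block device.
    dd if=/dev/zero of="$target" bs=1024 count="$count_kib" conv=notrunc 2>/dev/null
}
```

Usage would be something like zero_head /dev/sdX as root. Note that this only touches the start of the disk; anything stored at the end of the disk, such as the backup GPT header, is left in place.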

Both "failure" servers have 2 disks and "good" server has 6 disks if it matters.

Originally posted by @MasterGroosha in https://github.com/xcat2/xcat-core/issues/6431#issuecomment-563110654

MasterGroosha commented 4 years ago

A small update: Unfortunately, even erasing disks does not guarantee successful installation.

MasterGroosha commented 4 years ago

Another update: I really have no idea what's wrong with me, xCAT, or our servers. After lots of other tests, I now see that if I remove one drive completely (the server is a twin, with each node having at most 2 HDD slots), Ubuntu 18 installs successfully. With both drives plugged in, however, it fails before running the postinstall scripts (or in the middle of them, though there is nothing about it in the log).

Edit: I think there might be a problem with LVM (when there are 2 disks, xCAT tries to set up LVM, which somehow fails). How can I disable it? In the ubuntu-server.seed file I see these lines:

# Suggest LVM by default.
d-i     partman-auto/init_automatically_partition       string some_device_lvm
d-i     partman-auto/init_automatically_partition       seen false

However, changing some_device_lvm to some_device (as suggested elsewhere) didn't help. Maybe xCAT overrides these values? If so, where can I configure this? Unfortunately, the xCAT documentation is not very clear about it.

It looks like xCAT already uses the "regular" partitioning method without LVM, so unfortunately I'm stuck again :(

### Partitioning
# This creates a small /boot partition, suitable
# swap, and uses the rest of the space for the root partition:

d-i partman-auto/method string regular
d-i partman-lvm/device_remove_lvm boolean true
d-i partman-md/device_remove_md boolean true

But I still think the problem is somehow connected to the partitioning scheme, since a manual install works fine and an automatic install with only one disk inserted works fine too.

MasterGroosha commented 4 years ago

Update: I've been capturing the syslog traffic with Wireshark, since xCAT sends much more information over syslog than it writes to its own logs.

With 2 disks (installation hangs), "/dev/sda6 is active swap" is the last line before the hang. But when I leave only one disk and the installation succeeds, the next lines are: [image]

What's wrong?