kubernetes / minikube

Run Kubernetes locally
https://minikube.sigs.k8s.io/
Apache License 2.0

vmware: minikube iso fails to boot on arm64 #14661

Open · gilbahat opened this issue 2 years ago

gilbahat commented 2 years ago

What Happened?

In the interest of re-enabling the vmware driver for the Mac M1 platform, I have tried (manually) booting the minikube arm64 ISO on VMware Fusion. It will not boot.

dmesg screenshot: https://imgur.com/a/SWHiRHX
Note that dmesg output is only available after removing console=ttyAMA0 from the boot command line.

Note that this may also be the reason Parallels fails to run; I can't say for sure, as my Parallels trial expired.
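As a rough alternative to dropping console=ttyAMA0, assuming Fusion's virtual serial port is what ttyAMA0 ends up pointing at, the serial output can also be captured to a file from the VMX side with the standard serial-port keys (the log path below is just a placeholder):

serial0.present = "TRUE"
serial0.fileType = "file"
serial0.fileName = "/tmp/minikube-arm64-serial.log"

That way the kernel messages survive even if the VM hangs before anything reaches the video console.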

Attach the log file

n/a

Operating System

macOS (Default)

Driver

VMware

afbjorklund commented 2 years ago

Seems like it can't find the ISO filesystem, but at least it found the EFI and the kernel

search --no-floppy --label EFIBOOTISO --set root

The init (pid 1) is located on the initrd, which is supposed to be loaded by GRUB:

menuentry "Buildroot" {
  linux /boot/bzimage console=ttyAMA0 # kernel
  initrd /boot/initrd # rootfs
}
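A quick way to double-check that init really is inside that initrd, assuming the initrd is a gzip-compressed cpio archive (the usual Buildroot layout) and that the ISO is mounted at /mnt/iso (both paths are just placeholders):

# list the initrd contents and look for the init entry
zcat /mnt/iso/boot/initrd | cpio -t | grep -w init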
afbjorklund commented 2 years ago

For the other hypervisors, there were some arm64 problems with SATA vs SCSI

Maybe it is something similar here, with vmware using an unexpected CD device?

Either way, probably needs to be looked into upstream (in driver):

https://github.com/machine-drivers/docker-machine-driver-vmware
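For illustration, the difference would roughly come down to how the generated .vmx attaches the ISO; these are standard VMX keys (not taken from the driver source), and the ISO path is a placeholder.

SCSI-attached CD-ROM:

scsi0.present = "TRUE"
scsi0:1.present = "TRUE"
scsi0:1.deviceType = "cdrom-image"
scsi0:1.fileName = "/path/to/minikube-arm64.iso"

SATA-attached CD-ROM:

sata0.present = "TRUE"
sata0:1.present = "TRUE"
sata0:1.deviceType = "cdrom-image"
sata0:1.fileName = "/path/to/minikube-arm64.iso"

Checking which of the two the driver emits on arm64 would be the first thing to look at upstream.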

afbjorklund commented 2 years ago

You should be able to get commercial support for both vmware and parallels...

But the only open source driver and hypervisor for this platform is currently qemu (with hvf).
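For reference, the open-source route on an M1 Mac would be something like the following (the qemu2 driver only landed fairly recently, so a current minikube is assumed):

minikube start --driver=qemu2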

gilbahat commented 2 years ago

I doubt docker-machine would be relevant if it fails even without its intervention? It should boot manually before docker-machine has any chance of setting things up correctly.

Either way:

VMware Fusion offers 2 CD-ROM device options: SCSI and SATA. SCSI doesn't seem to work at all; with SATA I get this in dmesg:

scsi 0:0:0:0 CD-ROM NECVMWar SATA CD00 1.00 PQ:0 ANSI: 5

which appears correct.

Testing the device configuration with Ubuntu: the SATA device is identified as 'VMware SATA AHCI Device'. With a SCSI CD-ROM, Ubuntu also refuses to boot.

Perhaps the solution is to build the appropriate modules in buildroot (also for Parallels).

afbjorklund commented 2 years ago

Perhaps the solution is to build the appropriate modules in buildroot (also for Parallels).

Perhaps; it needs to be cleaned up during the refactoring anyway. Could use contributions.

gilbahat commented 2 years ago

I think I know what the problem is. Looking at the Ubuntu boot sequence, immediately following the SCSI identification, it registers the CD-ROM as sr0.

Looking at https://github.com/kubernetes/minikube/blob/master/deploy/iso/minikube-iso/board/minikube/aarch64/linux_aarch64_defconfig, CONFIG_BLK_DEV_SR isn't set, while it is set for x86_64.

There might be some more SCSI setup needed, but it is possible that re-enabling this could fix support.

While we're at it, CONFIG_VMWARE_PVSCSI=y might help too, if available for aarch64.
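Concretely, that would mean adding something along these lines to the aarch64 defconfig linked above (just a sketch; whether VMWARE_PVSCSI actually builds on aarch64 still needs to be checked):

CONFIG_BLK_DEV_SR=y
CONFIG_VMWARE_PVSCSI=y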

gilbahat commented 2 years ago

I found a dmesg dump from Parallels and can confirm it's also likely the culprit:

[ 1.189890] scsi 1:0:0:0: CD-ROM Virtual DVD-ROM R103 PQ: 0 ANSI: 5
(...)
[ 1.240916] sr 1:0:0:0: [sr0] scsi3-mmc drive: 44x/44x cd/rw xa/form2 cdda tray
[ 1.241464] cdrom: Uniform CD-ROM driver Revision: 3.20

I am compiling a new iso locally as I write this and will report if it boots on vmware fusion.
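For anyone wanting to try the same thing: the ISO is built via the Makefile in the minikube repo. The exact target names below are from memory and may differ between versions, so treat them as an assumption and check the ISO build docs:

make buildroot-image
make out/minikube-arm64.iso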

afbjorklund commented 2 years ago

For the libvirt driver (kvm), it was changed here:

pkg/drivers/kvm/domain_definition_arm64.go

@@ -47,7 +49,7 @@
   <devices>
     <disk type='file' device='cdrom'>
       <source file='{{.ISO}}'/>
-      <target dev='hdc' bus='scsi'/>
+      <target dev='sdc' bus='sata'/>
       <readonly/>
     </disk>
     <disk type='file' device='disk'>
gilbahat commented 2 years ago

Okay, so I could finally test this. Adding the module did in fact cause the machine to detect the CD-ROM properly and offer it as a root mount (for some reason, boot params needed to be modified for this to work; unsure why it works on other platforms/machines). It now fails on a missing filesystem: https://imgur.com/a/zqFQEe6

Indeed, checking the conf, CONFIG_ISO9660_FS=y is also missing. That's... weird.

afbjorklund commented 2 years ago

Adding CONFIG_BLK_DEV_SR and CONFIG_ISO9660_FS seems easy enough...

Not sure why they were dropped; maybe @klaases or @sharifelgamal can recall.

Could be as simple as a different defconfig between the kernel architectures?

i.e. that they were never explicitly added, they just happened to be there by default.

kernel/x86_64_defconfig:CONFIG_BLK_DEV_SD=y
kernel/x86_64_defconfig:CONFIG_BLK_DEV_SR=y
kernel/x86_64_defconfig:CONFIG_EXT4_FS=y
kernel/x86_64_defconfig:CONFIG_ISO9660_FS=y

kernel/aarch64_defconfig:CONFIG_BLK_DEV_SD=y
kernel/aarch64_defconfig:CONFIG_EXT2_FS=y
kernel/aarch64_defconfig:CONFIG_EXT3_FS=y

gilbahat commented 2 years ago

I will try and retest again tomorrow, unless you feel it's right to proceed with the requested flag changes for aarch64. Summing them up:

Important for boot:

CONFIG_BLK_DEV_SR=y
CONFIG_ISO9660_FS=y
CONFIG_VMWARE_PVSCSI=y

Should otherwise be prudent:

CONFIG_VMWARE_BALLOON=m
CONFIG_VMWARE_VMCI=m
CONFIG_XFS_FS=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y

This would not explain why the GRUB command line had to be manually modified, though (the 2nd line removed, and initrd=/boot/initrd and root=/dev/sr0 added to the 1st one).
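Spelled out, the manually edited entry ends up looking roughly like this (a reconstruction from the description above and the menuentry quoted earlier, not a copy of the actual grub.cfg):

menuentry "Buildroot" {
  linux /boot/bzimage root=/dev/sr0 initrd=/boot/initrd
}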

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale