Open JohnConnett opened 1 year ago
The product page says "2x 2.5Gb Ethernet port". This means it likely has the Intel I225 or I226 Ethernet chipset, for which illumos does not yet have support. I cannot necessarily explain the usb-boot installer failure, but while illumos may boot on this machine, it cannot use the built-in I225/I226 Ethernet just yet.
Good point. It has 2 x Intel I225-LM. I have also observed some MAC address strangeness with these two devices.
I'm not familiar with how pxeboot works. If the switch from a UEFI network driver to another driver happens within pxeboot, that would explain why no further files were requested via tftp.
I should have mentioned that I have disabled secure boot (another potential source of problems).
Tried another approach. I have a Plugable USBC-E2500, so I tried using that as suggested on the iPXE Forum. This also fails with both pxeboot and pxegrub. Here is the output for pxegrub:
Shell> FS0:
FS0:\> ncm.efi
iPXE initialising devices...ok
iPXE 1.21.1+ (gbd136) -- Open Source Network Boot Firmware -- https://ipxe.org
Features: DNS HTTP iSCSI TFTP VLAN AoE EFI Menu
net0: 8c:ae:4c:dd:3e:31 using cdc-ncm on 0000:00:0d.0-3-2.0 (Ethernet) [open]
[Link:down, TX:0 TXE:0 RX:0 RXE:0]
[Link status: Unknown (https://ipxe.org/1a086194)]
Waiting for link-up on net0... ok
Configuring (net0 8c:ae:4c:dd:3e:31)...... ok
net0: 192.168.199.254/255.255.255.0 gw 192.168.199.1
net0: fe80::8eae:4cff:fedd:3e31/64
Next server: 192.168.199.2
Filename: pxegrub
tftp://192.168.199.2/pxegrub... ok
pxegrub : 139032 bytes
Could not boot image: Exec format error (https://ipxe.org/2e008081)
No more network devices
FS0:\>
It appears that iPXE is unhappy with the format of pxeboot and pxegrub. Perhaps the EFI boot expects EFI format files?
Does OmniOS have drivers for USB CDC-NCM or CDC-ECM devices?
Perhaps the EFI boot expects EFI format files?
Built a simple x86_64 UEFI application and replaced pxeboot with my-uefi-app.efi. When PXE booted using either the Intel I225-LM or the Plugable USBC-E2500, the expected "Hello world!" was displayed, followed by a 10 second pause. This suggests that a UEFI replacement for pxeboot might be required ...
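A quick way to tell whether a candidate boot file is an EFI image at all is to look for the PE/COFF headers that UEFI executables carry. The following is a minimal sketch, not part of any OmniOS or iPXE tooling; the function name is hypothetical, and it only checks the `MZ` stub, the `PE\0\0` signature, the x86-64 machine type and an EFI subsystem value.

```python
import struct

IMAGE_FILE_MACHINE_AMD64 = 0x8664
# PE subsystem values for EFI application, boot service driver, runtime driver
EFI_SUBSYSTEMS = (10, 11, 12)


def looks_like_x64_efi_image(data: bytes) -> bool:
    """Return True if data begins like an x86-64 PE/COFF image with an EFI subsystem."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew at offset 0x3C gives the file offset of the "PE\0\0" signature
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    optional_header = pe_off + 24  # 4-byte signature + 20-byte COFF header
    if len(data) < optional_header + 70:
        return False
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        return False
    (machine,) = struct.unpack_from("<H", data, pe_off + 4)
    # Subsystem sits at offset 68 of the optional header (PE32 and PE32+)
    (subsystem,) = struct.unpack_from("<H", data, optional_header + 68)
    return machine == IMAGE_FILE_MACHINE_AMD64 and subsystem in EFI_SUBSYSTEMS
```

Reading the first few hundred bytes of a file and passing them to this function should be enough. A file that fails the check is not something UEFI firmware can start directly, which would be consistent with the "Exec format error" above, although this sketch does not prove that is the cause.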
Thought I would see if it was possible to PXE boot the Plugable USBC-E2500 without using iPXE. Copied the appropriate UEFI UNDI Driver to \EFI\Boot\RtkUndiDxe.efi on the EFI System Partition, then updated the EFI variables (from within Ubuntu) to load it using:
root@topaz:~# efibootmgr --driver --disk /dev/nvme0n1 --part 1
No DriverOrder is set
root@topaz:~# efibootmgr --driver --disk /dev/nvme0n1 --part 1 --create --label 'USB 10/100/1G/2.5G LAN' --loader '\EFI\Boot\RtkUndiDxe.efi'
DriverOrder: 0000
Driver0000* USB 10/100/1G/2.5G LAN
root@topaz:~# efibootmgr --driver --disk /dev/nvme0n1 --part 1 --verbose
DriverOrder: 0000
Driver0000* USB 10/100/1G/2.5G LAN HD(1,GPT,bbeadd11-388c-48fd-ac8c-fca17f8cfce0,0x800,0x32000)/File(\EFI\Boot\RtkUndiDxe.efi)
root@topaz:~#
This loads the driver during boot but, unfortunately, it doesn't support PXE. The EFI Variables can be removed with:
root@topaz:~# efibootmgr --driver --disk /dev/nvme0n1 --part 1 --bootnum 0000 --delete-bootnum
No DriverOrder is set
root@topaz:~#
It took me longer than I expected to find out how to set and clear DriverOrder and Driver#### so I have recorded it here in the hope that it will help others avoid a similar search.
As pxeboot is the wrong format for EFI, I looked at the .iso media contents. It contains boot/efiboot.img, which is a FAT filesystem image that contains efi/boot/bootx64.efi. Tried using that as the filename and it went a little further. Here's the console output:
Consoles: EFI console
Command line arguments: loader64.efi
Image base: 0x51034000
EFI version: 2.80
EFI Firmware: American Megatrends (rev 5.25)
illumos/amd64 EFI loader, Revision 1.1
Load Path:
Load Device: PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x0)/MAC(A8a159D0E637,0x1)/IPv4(0.0.0.0)
BootCurrent: 0008
BootOrder: 0008[*] 0002 0000 0009 000a 000b 0007 0001
Can't find device by handle
Setting currdev to net0:
-
It seems that OmniOS Installation via PXE Boot may need updating for EFI systems.
As it appears that OmniOS has drivers for neither the Intel I225-LM nor the Plugable USBC-E2500, I suspect I won't be able to investigate further with the limited amount of hardware I have available. I did try on a Hyper-V Generation 2 VM, which did something broadly similar. It would be interesting to know if anyone can get further on supported hardware.
Screen capture from the PXE boot on the Hyper-V Generation 2 VM at the point where it sticks. I know that the install from the .iso media doesn't work either. However, it does get as far as loading unix so there might be some insights into why the PXE install fails.
Monitored the tftp requests and discovered that there were files missing from /tftpboot as populated by kayak. Here are the results after adding the missing files:
May/11/2023 15:47:24 tftp,debug requested file(binary): boot/loader64.efi access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): boot/loader64.efi access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/defaults/loader.conf access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/defaults/loader.conf access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/fonts.dir access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/fonts.dir access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/8x16.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/8x16.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/8x14.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/8x14.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/6x12.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/6x12.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/16x32.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/16x32.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/14x28.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/14x28.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/12x24.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/12x24.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/11x22.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/11x22.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/10x20.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/10x20.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/10x18.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/10x18.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/12x24.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/fonts/12x24.fnt access: allowed
May/11/2023 15:47:24 tftp,debug requested file(binary): //boot/forth/boot.4th access: denied
May/11/2023 15:47:26 tftp,debug requested file(binary): //boot/forth/boot.4th access: denied
May/11/2023 15:47:30 tftp,debug requested file(binary): //boot/forth/boot.4th access: denied
May/11/2023 15:47:36 tftp,debug requested file(binary): //boot/forth/boot.4th access: denied
It then looped requesting //boot/forth/boot.4th. I was serving dhcp and tftp from my MikroTik router running RouterOS 7.9. Discovered that their tftp implementation has a nasty feature where it seems to assume that filenames are of the form name.extension. For example, requests for miniroot.gz and miniroot.gz.hash will both deliver the contents of miniroot.gz! I suspect this may have confused the installer ...
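Spotting the looping request in a long tftp log is easier with a small filter. This is a minimal sketch that assumes the log lines look exactly like the RouterOS debug lines above (`requested file(binary): <path> access: allowed|denied`); the function name is hypothetical and other tftp servers will need a different pattern.

```python
import re
from collections import Counter

# Matches the path and the access result in lines shaped like the log above
LINE_RE = re.compile(
    r"requested file\(binary\): (?P<path>\S+) access: (?P<result>allowed|denied)"
)


def denied_requests(log_text: str) -> Counter:
    """Count how often each path was requested and denied."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match and match.group("result") == "denied":
            # The loader prefixes paths with "//"; strip it for readability
            counts[match.group("path").lstrip("/")] += 1
    return counts
```

A path with a high denied count, such as boot/forth/boot.4th here, is the one the client keeps retrying.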
Changed to serving dhcp, tftp and http from an OmniOS VM. Much better! Both the Simply NUC Topaz 2 i7 and the Hyper-V Generation 2 VM get as far as starting the PXE Installer (see attached).
So it looks like adding the missing files to kayak will fix that part of the problem. It would also be good to change the comment in the first line of /usr/share/kayak/sample/000000000000.sample to point to the current documentation (maybe Kayak Client Configuration?).
Neither PXE installation attempt runs to completion. I'll provide details in a later comment.
There were some .png files missing, too. I copied /boot/*.png to /tftpboot/boot. Here's my latest list in the order requested:
boot/loader64.efi
boot/defaults/loader.conf
boot/fonts/fonts.dir
boot/fonts/8x16.fnt
boot/fonts/8x14.fnt
boot/fonts/6x12.fnt
boot/fonts/16x32.fnt
boot/fonts/14x28.fnt
boot/fonts/12x24.fnt
boot/fonts/11x22.fnt
boot/fonts/10x20.fnt
boot/fonts/10x18.fnt
boot/fonts/12x24.fnt
boot/forth/boot.4th (*)
boot/forth/boot.4th.gz (*)
boot/forth/boot.4th (*)
boot/loader.rc
boot/forth/loader.4th
boot/forth/support.4th
boot/forth/screen.4th
boot/forth/color.4th
boot/forth/delay.4th
boot/forth/check-password.4th
boot/forth/screen.4th
boot/forth/efi.4th
boot/forth/beadm.4th
boot/loader.rc.local (*)
boot/loader.rc.local.gz (*)
boot/loader.rc.local (*)
boot/solaris/bootenv.rc
boot/defaults/loader.conf
boot/loader.conf
boot/loader.conf.gz
boot/loader.conf
boot/loader.conf.local
boot/transient.conf (*)
boot/transient.conf.gz (*)
boot/transient.conf (*)
boot/forth/beastie.4th
boot/forth/menu.rc
boot/forth/version.4th
boot/forth/brand.4th
boot/forth/menu.4th
boot/forth/frames.4th
boot/forth/menu-commands.4th
boot/forth/menusets.4th
boot/forth/shortcuts.4th
boot/forth/logo-omnios.4th
boot/fonts/10x18.fnt
boot/fenix.png
boot/illumos-small.png
boot/forth/brand-omnios.4th
boot/ooce.png
boot/menu.lst (*)
boot/menu.lst.gz (*)
boot/menu.lst (*)
boot/menu.lst (*)
boot/menu.lst.gz (*)
boot/menu.lst (*)
boot/menu.rc.local (*)
boot/menu.rc.local.gz (*)
boot/menu.rc.local (*)
boot/platform/i86pc/kernel/amd64/unix
Those tagged with (*) are still missing, which may be as expected?
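The same comparison can be made without watching the tftp log, by checking a list of requested paths against the tftp root on disk. This is a minimal sketch under the assumption that the server serves plain files from a single directory such as /tftpboot; the function name is hypothetical.

```python
from pathlib import Path


def missing_from_tftp_root(requested, root):
    """Return the requested paths that are not regular files under root.

    Duplicates are reported once, in the order first requested.
    """
    root = Path(root)
    seen = set()
    missing = []
    for name in requested:
        rel = name.lstrip("/")  # the loader asks for "//boot/..." style paths
        if rel in seen:
            continue
        seen.add(rel)
        if not (root / rel).is_file():
            missing.append(rel)
    return missing
```

For example, `missing_from_tftp_root(["boot/loader64.efi", "boot/loader.rc"], "/tftpboot")` would list whichever of the two files kayak did not put in place. Optional files such as the ones marked (*) will show up as missing too, so the output still needs a human eye.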
Next hurdle. Failing to load /boot/platform/i86pc/kernel/amd64/unix. Here's the console output for the Simply NUC Topaz 2 i7:
Loading /boot/platform/i86pc/kernel/amd64/unix...
failed to allocate 1319024760 bytes for staging area: 9
cant load file '/boot/platform/i86pc/kernel/amd64/unix': cannot allocate memory
And for the Hyper-V Generation 2 VM:
Loading /boot/platform/i86pc/kernel/amd64/unix...
failed to allocate 4092125864 bytes for staging area: 9
cant load file '/boot/platform/i86pc/kernel/amd64/unix': cannot allocate memory
Wild guess is that it is using an uninitialised variable. Looking at efi_loadaddr() in illumos-omnios/usr/src/boot/efi/loader/copy.c, there is this snippet of code:
if (type == LOAD_ELF)
	return (0); /* not supported */
Might be barking up the wrong tree, but the kernel appears to be an ELF 64-bit LSB executable.
Rebuilt /boot/loader64.efi with these changes:
diff --git a/usr/src/boot/common/load_elf_obj.c b/usr/src/boot/common/load_elf_obj.c
index f32388e170..ca7a3eabf6 100644
--- a/usr/src/boot/common/load_elf_obj.c
+++ b/usr/src/boot/common/load_elf_obj.c
@@ -137,8 +137,10 @@ __elfN(obj_loadfile)(char *filename, u_int64_t dest,
goto oerr;
}
- if (archsw.arch_loadaddr != NULL)
+ if (archsw.arch_loadaddr != NULL) {
+ printf("Reached: %s %i\n", __FILE__, __LINE__);
dest = archsw.arch_loadaddr(LOAD_ELF, hdr, dest);
+ }
else
dest = roundup(dest, PAGE_SIZE);
diff --git a/usr/src/boot/efi/loader/copy.c b/usr/src/boot/efi/loader/copy.c
index 491c6787c6..18f8795e9b 100644
--- a/usr/src/boot/efi/loader/copy.c
+++ b/usr/src/boot/efi/loader/copy.c
@@ -172,8 +172,10 @@ efi_loadaddr(uint_t type, void *data, vm_offset_t addr)
if (addr == 0)
return (addr); /* nothing to do */
- if (type == LOAD_ELF)
+ if (type == LOAD_ELF) {
+ printf("Reached: %s %i\n", __FILE__, __LINE__);
return (0); /* not supported */
+ }
if (type == LOAD_MEM)
size = *(size_t *)data;
Neither "Reached" message was displayed. Looks like I was barking up the wrong tree.
Think I may have found a problem with the Multiboot2 header in omnios-r151046.unix. Here's the header extracted from the file:
file offset: 0x190 (400)
magic: 0xE85250D6 MULTIBOOT_HEADER_MAGIC
architecture: 0x00000000
header_length: 0x00000090
checksum: 0x17ADAE9A - Good!
tags:
type: 0x0001 MULTIBOOT_HEADER_TAG_INFORMATION_REQUEST
flags: 0x0000
size: 0x00000020
mbi_tag_types: [0x00000001; 0x00000003; 0x00000005; 0x00000006; 0x00000008; 0x00000004]
type: 0x0002 MULTIBOOT_HEADER_TAG_ADDRESS
flags: 0x0000
size: 0x00000018
mbi_tag_types: [0x00C00038; 0x00BFFEA8; 0x00000000; 0x00000000]
type: 0x0003 MULTIBOOT_HEADER_TAG_ENTRY_ADDRESS
flags: 0x0000
size: 0x0000000C
mbi_tag_types: [0x00C00000]
padding: 0x0026748D
type: 0x0004 MULTIBOOT_HEADER_TAG_CONSOLE_FLAGS
flags: 0x0000
size: 0x0000000C
mbi_tag_types: [0x00000002]
padding: 0x00000000
type: 0x0005 MULTIBOOT_HEADER_TAG_FRAMEBUFFER
flags: 0x0000
size: 0x00000014
mbi_tag_types: [0x00000000; 0x00000000; 0x00000000]
padding: 0x00000000
type: 0x0006 MULTIBOOT_HEADER_TAG_MODULE_ALIGN
flags: 0x0000
size: 0x00000008
type: 0x0000 MULTIBOOT_HEADER_TAG_END
flags: 0x0000
size: 0x00000008
and here's the output of objdump -h omnios-r151046.unix:
unix: file format elf64-x86-64-sol2
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 00019a4c 0000000000c00000 0000000000c00000 00000158 2**0
CONTENTS, ALLOC, LOAD, DATA
1 .text 000de0e1 fffffffffb800000 0000000000400000 0001a000 2**12
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .dynamic 000001b0 fffffffffb8de0e8 00000000004de0e8 000f80e8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .hash 0000b4c0 fffffffffb8de298 00000000004de298 000f8298 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .dynsym 00021e28 fffffffffb8e9758 00000000004e9758 00103758 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .dynstr 000147e7 fffffffffb90b580 000000000050b580 00125580 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .SUNW_reloc 00019e30 fffffffffb91fd68 000000000051fd68 00139d68 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
7 .rodata 0002f177 fffffffffb939bc0 0000000000539bc0 00153bc0 2**6
CONTENTS, ALLOC, LOAD, READONLY, DATA
8 set_tsc_calibration_set 00000020 fffffffffb968d38 0000000000568d38 00182d38 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
9 .data 00023e98 fffffffffbc00000 0000000000800000 00183000 2**12
CONTENTS, ALLOC, LOAD, DATA
10 .bss 000827a0 fffffffffbc24000 0000000000824000 001a7000 2**12
ALLOC
11 .note 00000020 0000000000000000 0000000000000000 001a6e98 2**2
CONTENTS, READONLY
12 .comment 00000032 0000000000000000 0000000000000000 001a6eb8 2**0
CONTENTS, READONLY
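For anyone wanting to repeat the header extraction shown earlier, the fixed part of the Multiboot2 header is easy to locate: the specification requires it to sit within the first 32768 bytes of the image, 64-bit aligned, with the four 32-bit fields summing to zero modulo 2^32. The following is a minimal sketch of that search only; the function name is hypothetical and it does not decode the tags.

```python
import struct

MULTIBOOT2_HEADER_MAGIC = 0xE85250D6
SEARCH_LIMIT = 32768  # header must lie within the first 32768 bytes
ALIGNMENT = 8         # and be 64-bit aligned


def find_multiboot2_header(image: bytes):
    """Return (file_offset, architecture, header_length, checksum) or None."""
    last_start = min(len(image), SEARCH_LIMIT) - 16  # fixed part is 16 bytes
    for off in range(0, last_start + 1, ALIGNMENT):
        magic, arch, length, checksum = struct.unpack_from("<IIII", image, off)
        if magic != MULTIBOOT2_HEADER_MAGIC:
            continue
        # The four fields must add up to zero as an unsigned 32-bit sum
        if (magic + arch + length + checksum) & 0xFFFFFFFF == 0:
            return off, arch, length, checksum
    return None
```

Passing the first 32768 bytes of omnios-r151046.unix should, if the dump above is right, return offset 0x190 with header_length 0x90 and checksum 0x17ADAE9A.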
Looking at the MULTIBOOT_HEADER_TAG_ADDRESS entry the fields are:
+-------------------+
u16 | type = 2 | 0x0002
u16 | flags | 0x0000
u32 | size | 0x00000018
u32 | header_addr | 0x00C00038
u32 | load_addr | 0x00BFFEA8
u32 | load_end_addr | 0x00000000
u32 | bss_end_addr | 0x00000000
+-------------------+
The file offset of .data is 0x158 and the file offset of the Multiboot2 header is 0x190, a difference of 0x38. header_addr looks reasonable. From the description of load_addr: "Contains the physical address of the beginning of the text segment. The offset in the OS image file at which to start loading is defined by the offset at which the header was found, minus (header_addr - load_addr)". However, the address difference is not the expected 0x38 but 0x190!
This may not be the source of the problem but it looks like an error.
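The arithmetic from the specification text can be written out directly. This is a minimal sketch using only the values quoted above; the function name is hypothetical, and how the illumos loader actually interprets these fields is not checked here.

```python
def load_start_file_offset(header_file_offset, header_addr, load_addr):
    """File offset at which loading starts, per the load_addr description."""
    return header_file_offset - (header_addr - load_addr)


HEADER_FILE_OFFSET = 0x190   # where the Multiboot2 header was found
HEADER_ADDR = 0x00C00038     # from the address tag
LOAD_ADDR = 0x00BFFEA8       # from the address tag
DATA_FILE_OFFSET = 0x158     # file offset of the first .data section (objdump)

start = load_start_file_offset(HEADER_FILE_OFFSET, HEADER_ADDR, LOAD_ADDR)
# Physical address where the bytes at DATA_FILE_OFFSET would end up
data_phys = LOAD_ADDR + (DATA_FILE_OFFSET - start)

print(hex(HEADER_ADDR - LOAD_ADDR), hex(start), hex(data_phys))
```

With these inputs the address difference is 0x190, so the formula gives a start offset of 0, that is, loading from the very beginning of the file. Under that reading the first .data section would land at 0x00C00000, which happens to match the LMA objdump reports for it. That may mean the tag is self-consistent after all rather than off by 0x158, but it is only one reading of the specification.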
The immediate cause of the EFI_OUT_OF_RESOURCES (9) is this code in the function efi_loadaddr() in the file boot/efi/loader/copy.c:
$ pr -n -t boot/efi/loader/copy.c
[...]
178 if (type == LOAD_MEM)
179 size = *(size_t *)data;
180 else {
181 stat(data, &st);
182 size = st.st_size;
183 }
[...]
type is LOAD_KERN; the (unchecked!) call to stat() fails; size is then set from the uninitialized st.st_size; if that value happens to be big enough then EFI_OUT_OF_RESOURCES is the result.
More generally, I'm puzzled how this code is expected to load the kernel and why the size of the file containing the kernel is used. I need to take a closer look at the Multiboot2 specification.
The errno from stat() is EBUSY, which appears to come from the function stat() in the file boot/libsa/stat.c. The call to open() fails because the file is already open, detected in the function tftp_open() in the file boot/libsa/tftp.c.
I don't think that the value of st.st_size would be available at this point, as only enough of the kernel image file has been read to obtain the contents of the Multiboot2 header. Using tftp the whole file would have to be read to obtain its size.
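One possible alternative to reading the whole file is the TFTP transfer size option (`tsize`, RFC 2349): a client sends `tsize` with value `0` in its read request and a server that supports the option answers with an OACK packet carrying the real size. The following is a minimal sketch of the two packet formats only; the function names are hypothetical, and whether the loader's tftp.c or any particular server negotiates this option is an assumption that has not been checked here.

```python
import struct

OP_RRQ = 1   # read request
OP_OACK = 6  # option acknowledgment


def build_rrq_with_tsize(filename: str) -> bytes:
    """Build a read request asking the server to report the file size."""
    fields = [filename, "octet", "tsize", "0"]
    body = b"".join(field.encode("ascii") + b"\0" for field in fields)
    return struct.pack("!H", OP_RRQ) + body


def parse_oack(packet: bytes) -> dict:
    """Return the option names and values carried in an OACK packet."""
    (opcode,) = struct.unpack_from("!H", packet, 0)
    if opcode != OP_OACK:
        raise ValueError("not an OACK packet")
    parts = packet[2:].split(b"\0")
    if parts and parts[-1] == b"":
        parts.pop()  # drop the empty piece after the final NUL
    names, values = parts[0::2], parts[1::2]
    return {n.decode("ascii").lower(): v.decode("ascii") for n, v in zip(names, values)}
```

If the server does reply with an OACK, `int(parse_oack(reply)["tsize"])` gives the size before any data block is transferred. A server that ignores the option simply starts sending data, so a fallback is still needed.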
Maybe a better approach would be to make the loader ELF aware? This is hinted at in the description of the address tag in section 3.1.5, "The address tag of Multiboot2 header", of the Multiboot2 Specification version 2.0: "Note: This information does not need to be provided if the kernel image is in ELF format, but it must be provided if the image is in a.out format or in some other format. When the address tag is present it must be used in order to load the image, regardless of whether an ELF header is also present. Compliant boot loaders must be able to load images that are either in ELF format or contain the address tag embedded in the Multiboot2 header."
Additional tags may be required in the Multiboot2 header such as 3.1.8 EFI amd64 entry address tag of Multiboot2 header and 3.1.12 EFI boot services tag.
As an experiment, I have tried to use grub2 from Fedora Linux 38. This is what I tried from the grub2 interactive prompt:
grub> multiboot2 EFI/omnios/unix -B install_media=http://192.168.199.110/kayak/omnios-r151046.zfs.xz,install_config=http://192.168.199.110/kayak
grub> module2 EFI/omnios/miniroot.gz
grub> boot
For the Hyper-V Generation 2 VM the following was displayed on the console:
krtld: failed to open '-B'
krtld: bind_primary(): no relocation information found for module -B
krtld: error during initial load/link phase
krtld could neither locate or resolve symbols for:
-B
in the boot archive. Please verify that this file
matches what is found in the boot archive.
You may need to boot using the Solaris failsafe to fix this.
Unable to boot.
Press any key to reboot.
I'm not sure how kernel command line arguments are supposed to be passed in grub2 ...
PXE Boot fails on a Simply NUC Topaz 2 i7. Displays
NBP file downloaded successfully.
then drops through to the next boot option, without trying to get any further files using tftp. Both pxeboot and pxegrub fail. Attached is a Wireshark capture for pxeboot (the capture for pxegrub is similar except for the packet count so is not attached). The server (192.168.88.1) and topaz (192.168.88.2) are connected back-to-back on a single cable.
Possibly the same underlying problem as omnios-r151046-rc3.usb-dd fails on Simply NUC Topaz 2 i7.
I have successfully PXE booted Ubuntu Server 23.04 (Lunar Lobster) as far as the selection screen on the same configuration.
topaz.pcapng.gz