pcengines / coreboot

github mirror of coreboot.org's master repository
http://www.coreboot.org/
GNU General Public License v2.0

Changing boot-menu-wait breaks iPXE boot #181

Closed: stapelberg closed this issue 5 years ago

stapelberg commented 6 years ago

I did the following steps:

% git clone https://github.com/pcengines/pce-fw-builder.git
% ./build.sh release v4.6.9 apu2
% ./build.sh dev-build $PWD/release/coreboot apu2
% cp release/coreboot/build/coreboot.rom /media/sdc1/

On the apu2c4, I flashed the image using flashrom -w coreboot.rom -p internal. So far, everything works fine: the apu2c4 boots my custom-built coreboot and PXE-boots just fine.

Then, I go back and decrease the menu wait time from 6s to 2s:

% echo -ne "\xd0\x07\x0\x0\x0\x0\x0\x0" > ./release/coreboot/src/mainboard/pcengines/apu2/boot-menu-wait
% ./build.sh dev-build $PWD/release/coreboot apu2
% cp release/coreboot/build/coreboot.rom /media/sdc1/

After flashing the image like above, PXE boot no longer works: no DHCP packets are received on my PXE server.

Is this to be expected? I was trying to eliminate delays in the boot sequence, since I’d like to PXE-boot my machine as quickly as possible for quick recovery during development.

pietrushnic commented 6 years ago

@stapelberg this is not expected behavior. It should be possible to run iPXE without any delay to speed things up. It really looks like you need our RTE; we plan to start selling it soon, and with it you will have automation of the whole recovery process. We use it daily for development.

The question is whether you can try this option on anything else, e.g. QEMU.

@miczyg1 any idea what is going on here? If this is a regression, we should have an automated test for it.

miczyg1 commented 6 years ago

@stapelberg @pietrushnic have a look at the commands below:

coreboot/build$ ./cbfstool coreboot.rom remove -n etc/boot-menu-wait
coreboot/build$ ./cbfstool coreboot.rom add-int -i 2000 -n etc/boot-menu-wait
coreboot/build$ ./cbfstool coreboot.rom extract -n etc/boot-menu-wait -f boot-menu-wait

Line 1: remove the 6-second boot-menu-wait from the image.
Line 2: add a 2-second boot-menu-wait to the image.
Line 3: extract the 2-second boot-menu-wait file from the image, without removing it, and save it as boot-menu-wait in the current directory.

Now:

coreboot/build$ hexdump boot-menu-wait 
0000000 07d0 0000 0000 0000                    
0000008
coreboot/build$ hexdump ../src/mainboard/pcengines/apu2/boot-menu-wait 
0000000 1770 0000 0000 0000                    
0000008
coreboot/build$ echo -n "\xd0\x07\x0\x0\x0\x0\x0\x0" > custom-boot-menu-wait
coreboot/build$ hexdump custom-boot-menu-wait
0000000 785c 3064 785c 3730 785c 5c30 3078 785c
0000010 5c30 3078 785c 5c30 3078               
000001a

07d0 -> 2000 decimal (time in milliseconds)
1770 -> 6000 decimal
785c 3064 785c 3730 785c 5c30 3078 785c 5c30 3078 785c 5c30 3078 -> ???

See the difference? The file created by echo contains garbage. However, I was able to PXE boot Debian with stable kernel 4.14.33 using each of the three files.
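The "???" words in the last dump can be decoded by hand. The following is a minimal Python sketch, assuming hexdump's default format (16-bit words in host byte order, little-endian on x86); the helper name `words_to_bytes` is made up for illustration.

```python
# Rebuild the raw bytes from hexdump's default output, which prints
# 16-bit words in host byte order (assumed little-endian, as on x86).
def words_to_bytes(words: str) -> bytes:
    out = bytearray()
    for word in words.split():
        # "785c" is the word 0x785c, stored on disk as bytes 5c 78.
        out += int(word, 16).to_bytes(2, "little")
    return bytes(out)

# Words copied from the custom-boot-menu-wait dump above.
garbage = words_to_bytes(
    "785c 3064 785c 3730 785c 5c30 3078 785c "
    "5c30 3078 785c 5c30 3078"
)
print(garbage.decode("ascii"))

# Words copied from the cbfstool-extracted file.
good = words_to_bytes("07d0 0000 0000 0000")
print(good.hex(" "))
```

Under that assumption, the 26 bytes of the "garbage" file decode to the literal text of the escape sequences, which suggests the shell's echo wrote the backslashes as-is instead of interpreting them.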

@stapelberg did you try the official release binaries? If there is no problem with them, it is probably some small mistake in your build flow.

When modifying the runtime config, please use cbfstool; it is a much safer approach. This file may also interest you.

stapelberg commented 6 years ago

Thanks for looking into this.

The echo command I provided is zsh-specific (sorry, wasn’t aware). For other shells, add the -e flag:

% echo -ne "\xd0\x07\x0\x0\x0\x0\x0\x0" > ./release/coreboot/src/mainboard/pcengines/apu2/boot-menu-wait

Note: the echo command will produce 0xd007, while the cbfstool command produces 0x07d0.
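One thing worth checking when comparing the two notations is byte order. The sketch below is only an illustration: it assumes the value is stored as an 8-byte little-endian integer (as the 8-byte hexdump output of the extracted file suggests) and that hexdump prints 16-bit little-endian words.

```python
import struct

# 2000 ms encoded as an unsigned 64-bit little-endian integer
# (assumed layout, inferred from the 8-byte file shown by hexdump).
payload = struct.pack("<Q", 2000)
print(payload.hex(" "))  # byte-by-byte view of the file contents

# The same bytes grouped into 16-bit little-endian words, which is
# how hexdump's default format would display them.
words = [f"{w:04x}" for (w,) in struct.iter_unpack("<H", payload)]
print(" ".join(words))

# Bytes that the escape sequences in the echo command stand for,
# if the shell interprets them.
echo_bytes = b"\xd0\x07\x00\x00\x00\x00\x00\x00"
print(payload == echo_bytes)
```

If these assumptions hold, "d0 07" (byte view) and "07d0" (word view) describe the same file contents, so a difference in behavior between the two images would have to come from somewhere else.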

When using the cbfstool output (0x07d0), my apu2c4 hangs for about a minute at the F10 prompt before proceeding with the boot — that’s clearly not the 2 seconds I’m looking for :). I’m assuming the value is invalid, and the counter is capped or overflows at some point.

Using the echo output (0xd007), my apu2c4 proceeds after 2 seconds, but won’t obtain a DHCP configuration when PXE-booting.

Here are the steps I used to test my build:

Then, I changed the menutime (and nothing else) and repeated:

You can find my build at http://t.zekjur.net/apu2_v4.6.9_menutime_broken.rom — you should be able to reproduce it by doing the echo step as I described.

miczyg1 commented 6 years ago

@stapelberg the way you build iPXE is entirely different from the one we use. We embed custom configs and menu files into the iPXE ROM, which allows us to use the iPXE shell. Sometimes the dhcp command ends with "No configuration methods succeeded".

I can see that you took some shortcuts to decrease boot time. To help you further, I need to know what changes you have made to the coreboot tree, at least the ones related to iPXE.

stapelberg commented 6 years ago

Thanks for your reply.

The change to the boot-menu-wait file is the only change I’ve made to the tree (I reproduced this issue independently of any other changes I tried). The steps to reproduce are still what I outlined in the first comment of this issue.

miczyg1 commented 6 years ago

@stapelberg OK, I went through the whole procedure as you described in the first comment. I pulled the current pce-fw-builder master. The binary I produced for apu2, based on v4.6.9, is available here.

One thing I had to do was press Enter during build configuration to select this option:

Include CPU microcode in CBFS
  1. Generate from tree (CPU_MICROCODE_CBFS_GENERATE)
  2. Include external microcode header files (CPU_MICROCODE_CBFS_EXTERNAL_HEADER)
> 3. Do not include microcode updates (CPU_MICROCODE_CBFS_NONE)
  4. Add raw microcode binary to CBFS (CPU_UCODE_RAW_BINARY) (NEW)
choice[1-4]:

I was able to boot Debian via PXE without problems. Try using my binary, and repeat your procedure using the pce-fw-builder master branch.

stapelberg commented 6 years ago

Sorry for the late reply, I wasn’t in a position to test new firmware for the last couple of days.

I gave your ROM a shot:

Unfortunately, netboot does not work for me with your ROM :(

This is the output I’m getting on the serial interface:

PC Engines apu2
coreboot build 20180511
BIOS version v4.6.9
4080 MB ECC DRAM

SeaBIOS (version rel-1.11.0.4-0-gfcbc9d7)

Press F10 key now for boot menu, N for PXE boot
Booting from ROM...
iPXE starting execution...ok
iPXE initialising devices...ok

iPXE 1.0.0+ (fd6d1)
 -- Open Source Network Boot Firmware -- 
http://ipxe.org
Features: DNS FTP HTTP HTTPS iSCSI NFS SLAM TFTP VLAN AoE ELF MBOOT NBI PXE SDI 
bzImage COMBOOT Menu PXEXT

<ASCII boot menu omitted>

net0: 00:0d:b9:4c:8e:c4 using i210-2 on 0000:01:00.0 (open)
  [Link:up, TX:0 TXE:0 RX:0 RXE:0]
Configuring (net0 00:0d:b9:4c:8e:c4).................. No configuration methods 
succeeded (http://ipxe.org/040ee119)
Booting from Hard Disk...

Just like in my older comment https://github.com/pcengines/coreboot/issues/181#issuecomment-397348741, directly flashing v4.6.9 does not fix netboot. Instead, I had to flash v4.6.1 (netboot works again) and then flash v4.6.9 (netboot keeps working).

I also noticed that I needed to specify boardmismatch=force when running flashrom to upgrade from the stock firmware to your coreboot.rom or v4.6.9:

This coreboot image (PC Engines:apu2) does not appear to
be correct for the detected mainboard (PC Engines:PC Engines apu2).

I wonder if this could be related somehow?

PS: Just to be extra sure, I flashed v4.6.1 (netboot works), then flashed your coreboot.rom (netboot breaks), to make sure my testing procedure works.

miczyg1 commented 6 years ago

@stapelberg to clear things up:

  1. The board mismatch is caused by the board name change in the SMBIOS tables (it was set incorrectly in older releases). Flashing with the internal programmer yields an error when the names differ.
  2. iPXE DHCP configuration sometimes fails. The autoboot feature attempts to configure the interface only once and assumes that the dhcp command does not fail. If you use the iPXE shell and try dhcp netx, where x is the Ethernet port number counting from 0, it should succeed after the first or second try. It may be related to a DHCP timeout or a similar issue.
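The difference between a single attempt and a manual retry can be sketched as follows. This is a simplified Python model, not iPXE code: `configure_with_retries` and the stand-in `try_dhcp` callable are hypothetical names used only for illustration.

```python
import time
from typing import Callable

def configure_with_retries(
    try_dhcp: Callable[[], bool],  # hypothetical: True if DHCP succeeded
    attempts: int = 3,
    delay_s: float = 1.0,
) -> bool:
    """Call try_dhcp up to `attempts` times, pausing between failures."""
    for attempt in range(1, attempts + 1):
        if try_dhcp():
            return True
        if attempt < attempts:
            time.sleep(delay_s)  # give the link time to come up
    return False

# Stand-in for a NIC that fails the first attempt and succeeds on the second.
results = iter([False, True])
ok = configure_with_retries(lambda: next(results), attempts=3, delay_s=0.0)
print(ok)
```

With `attempts=1` the same stand-in would report failure, which mirrors the single-attempt autoboot behavior described in point 2.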

We use only the iPXE shell because we have many types of boot paths and systems to validate; we cannot rely on a single DHCP server providing a single rootpath option to boot.

We will try to eliminate this little inefficiency, but if we come up with a fix, it will be in version 4.8.0.3.

stapelberg commented 6 years ago

Thanks for the details. I’ll play around with the ipxe shell over the weekend and see if I can gather any additional details.

stapelberg commented 6 years ago

I found out that pressing F10 for the boot menu and waiting for a few seconds before selecting iPXE makes it work. The problem here really is timing.

Additionally, I realized that I have a visual indicator of when things will work: net0’s green LED always lights up, but for DHCP packets to be sent by iPXE, net0’s orange LED must also be on when starting iPXE.

Interestingly enough, when not waiting for the orange LED to light up and entering the iPXE shell, running dhcp will fail (No configuration methods after the timeout), but then net0’s orange LED lights up, and running dhcp again will succeed.

My guess is that iPXE doesn’t fully/correctly (re-)initialize the network interface.

I also wonder whether auto-negotiation is in any way related. Perhaps that’s a difference in our setup, and perhaps that’s why you can’t reproduce the issue easily? My apu2c4 is connected to a Netgear GS105E-200PES switch.

paultech commented 6 years ago

This does sound like an auto-negotiation/spanning-tree timing issue.

Looking at the iPXE code, the DHCP process should be stalled until the link is up:


    printf ( "Waiting for link-up on %s", netdev->name );
    return ifpoller_wait ( netdev, NULL, timeout, iflinkwait_progress );

pietrushnic commented 6 years ago

@miczyg1 this issue has been dead for a long time. Did anyone try to reproduce it?

miczyg1 commented 6 years ago

I tried to reproduce this. Using the ROM delivered by @stapelberg, reproducibility is around 50% (clean hardware, no media connected). With newer BIOSes the chance of reproduction is lower, since SeaBIOS is newer and has longer timeouts for USB. Network controllers have more time to wake up and be initialized, so the issue is almost absent. It also depends on the connected boot media (with no boot media, the boot menu string appears earlier).

I observed that if the link is not yet up when autoboot is issued, the DHCP configuration fails. @stapelberg your conclusions were correct, I guess: the LED should be on. My advice would be not to push boot time down so hard, to give the NICs time to wake up. Maybe going from 2s to 3s on the SeaBIOS boot-menu-wait will be sufficient.

miczyg1 commented 5 years ago

@stapelberg we found a solution for this issue. In the v4.9.0.7 and v4.0.27 releases, we disabled IPv6 in the iPXE ROM. NICs now obtain their DHCP configuration almost instantly, so reducing the boot menu timeout to 2 seconds should no longer be a problem. It seems the IPv6 configuration increased the time the autoboot/dhcp command needed to succeed. Please test the new binaries (already on pcengines.github.io), and if this is the solution you were looking for, please close the issue.

stapelberg commented 5 years ago

Thank you so much for the update!

I just boarded a train a few minutes ago, so I’ll get back to you on this next week.

stapelberg commented 5 years ago

I can confirm that using the following commands to modify the apu2_v4.9.0.7.rom image:

coreboot/build$ ./cbfstool apu2_v4.9.0.7.rom remove -n etc/boot-menu-wait
coreboot/build$ ./cbfstool apu2_v4.9.0.7.rom add-int -i 2000 -n etc/boot-menu-wait

…indeed results in a 2-second wait and working PXE boot!

Thank you so much!