raspberrypi / firmware

This repository contains pre-compiled binaries of the current Raspberry Pi kernel and modules, userspace libraries, and bootloader/GPU firmware.

PXE Boot doesn't ask for bootcode.bin #862

Open andig opened 7 years ago

andig commented 7 years ago

Similar to https://github.com/raspberrypi/firmware/issues/764, originally reported in the forums before I found my way here. I've followed the tutorial; the main difference is that a FritzBox is running as the router in the local network and serving DHCP. rpi-update has been run; I haven't tried BRANCH=next yet.

tcpdump:

11:13:12.397988 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b8:27:eb:9d:c1:5a (oui Unknown), length 320
11:13:12.401547 IP keller.fritz.box.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 326
11:13:26.028379 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b8:27:eb:9d:c1:5a (oui Unknown), length 320
11:13:26.030140 IP keller.fritz.box.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 326
11:13:31.028659 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b8:27:eb:9d:c1:5a (oui Unknown), length 320
11:13:31.030341 IP keller.fritz.box.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 326
11:13:34.257512 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b8:27:eb:9d:c1:5a (oui Unknown), length 320
11:13:34.259085 IP keller.fritz.box.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 326

daemon log:

Aug 27 11:12:22 keller dhcpcd[1390]: eth0: Router Advertisement from fe80::ca0e:14ff:feaa:856a
Aug 27 11:13:12 keller dnsmasq-dhcp[4625]: 653460281 available DHCP subnet: 192.168.0.255/255.255.255.0
Aug 27 11:13:12 keller dnsmasq-dhcp[4625]: 653460281 vendor class: PXEClient:Arch:00000:UNDI:002001
Aug 27 11:13:12 keller dnsmasq-dhcp[4625]: 653460281 PXE(eth0) b8:27:eb:9d:c1:5a proxy
Aug 27 11:13:12 keller dnsmasq-dhcp[4625]: 653460281 tags: eth0
Aug 27 11:13:12 keller dnsmasq-dhcp[4625]: 653460281 broadcast response
Aug 27 11:13:12 keller dnsmasq-dhcp[4625]: 653460281 sent size:  1 option: 53 message-type  2
Aug 27 11:13:12 keller dnsmasq-dhcp[4625]: 653460281 sent size:  4 option: 54 server-identifier  192.168.0.48
Aug 27 11:13:12 keller dnsmasq-dhcp[4625]: 653460281 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
Aug 27 11:13:12 keller dnsmasq-dhcp[4625]: 653460281 sent size: 17 option: 97 client-machine-id  00:44:44:44:44:44:44:44:44:44:44:44:44:44...
Aug 27 11:13:12 keller dnsmasq-dhcp[4625]: 653460281 sent size: 44 option: 43 vendor-encap  06:01:03:0a:04:00:50:58:45:08:07:80:00:01...
Aug 27 11:13:26 keller dnsmasq-dhcp[4625]: 653460281 available DHCP subnet: 192.168.0.255/255.255.255.0
Aug 27 11:13:26 keller dnsmasq-dhcp[4625]: 653460281 vendor class: PXEClient:Arch:00000:UNDI:002001
Aug 27 11:13:26 keller dnsmasq-dhcp[4625]: 653460281 PXE(eth0) b8:27:eb:9d:c1:5a proxy
Aug 27 11:13:26 keller dnsmasq-dhcp[4625]: 653460281 tags: eth0
Aug 27 11:13:26 keller dnsmasq-dhcp[4625]: 653460281 broadcast response
Aug 27 11:13:26 keller dnsmasq-dhcp[4625]: 653460281 sent size:  1 option: 53 message-type  2
Aug 27 11:13:26 keller dnsmasq-dhcp[4625]: 653460281 sent size:  4 option: 54 server-identifier  192.168.0.48
Aug 27 11:13:26 keller dnsmasq-dhcp[4625]: 653460281 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
Aug 27 11:13:26 keller dnsmasq-dhcp[4625]: 653460281 sent size: 17 option: 97 client-machine-id  00:44:44:44:44:44:44:44:44:44:44:44:44:44...
Aug 27 11:13:26 keller dnsmasq-dhcp[4625]: 653460281 sent size: 44 option: 43 vendor-encap  06:01:03:0a:04:00:50:58:45:08:07:80:00:01...

dnsmasq.conf:

resolv-file=/etc/resolv.conf.dnsmasq 
port=0
dhcp-range=192.168.0.255,proxy
log-dhcp
enable-tftp
tftp-root=/tftpboot
pxe-service=0,"Raspberry Pi Boot   ",192.168.0.48

The " " and appended pxe server IP were added after looking for potential solutions.

Is there any way to get this working without the need for an SD card?

UPDATE Playing with dhcp-reply-delay from 1 to 5 seconds didn't help.

UPDATE I've also tried with an SD card containing only a single FAT32 partition with just BOOTCODE.BIN. This does get as far as TFTP but fails with Unable to mount root fs on unknown-block(2,0), but that's a separate topic.

andig commented 7 years ago

I've done a Wireshark capture here https://pastebin.com/ju4hrEa6 that should show what's going on. It's limited to the packets participating in the conversation. I can upload the full version if helpful.

All help would be greatly appreciated. Without an SD card I'm unable to get the Pi to boot from the network.

andig commented 7 years ago

ping @ghollingworth might this be something you could help with?

andig commented 7 years ago

With help from the dnsmasq mailing list I've finally managed to boot the raspi without an SD card, i.e. without the help of BOOTCODE.BIN. It seems that my FritzBox router is not replying to the raspi's (potentially malformed) DHCP request. As a workaround I've used dnsmasq to serve the IP:

resolv-file=/etc/resolv.conf.dnsmasq

# DNS off
port=0

# DHCP on, serve static
dhcp-range=192.168.0.255,static
dhcp-host=b8:xx:xx:xx:xx,192.168.0.66
dhcp-reply-delay=1

# TFTP on
enable-tftp
tftp-root=/tftpboot

# PXE on
pxe-service=0,"Raspberry Pi Boot"

With this setup the raspi is finally net-booting.

ghollingworth commented 7 years ago

Looking at your previous wireshark response there is no IP address served to the device. You are using a proxy DHCP reply to provide the pxe-service but there is no standard DHCP reply from your router.

Previously you had "dhcp-range=192.168.0.255,proxy" which means the dnsmasq will not give the device an IP address (it is assuming a separate DHCP server is going to do this for you). It only serves a DHCP offer that contains the Option 43 with the client IP address of 0.0.0.0
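To make the point about the client IP address concrete, here is a minimal Python sketch (an editorial illustration, not code from this thread) that reads the "your IP address" (yiaddr) field from the fixed BOOTP header and reports whether a reply actually assigns an address. The helper names yiaddr, assigns_address and make_reply are hypothetical; the field offsets follow the standard BOOTP layout, and the sample addresses are taken from the configs in this issue.

```python
import ipaddress
import struct

# Fixed BOOTP header: op, htype, hlen, hops, xid, secs, flags,
# ciaddr, yiaddr, siaddr, giaddr, chaddr(16), sname(64), file(128).
BOOTP_HEADER_FORMAT = "!BBBBIHH4s4s4s4s16s64s128s"
BOOTP_HEADER_LEN = struct.calcsize(BOOTP_HEADER_FORMAT)  # 236 bytes
YIADDR_OFFSET = 16


def yiaddr(bootp_payload: bytes) -> ipaddress.IPv4Address:
    """Return the 'your (client) IP address' field of a BOOTP/DHCP message."""
    if len(bootp_payload) < BOOTP_HEADER_LEN:
        raise ValueError("payload shorter than the fixed BOOTP header")
    return ipaddress.IPv4Address(bootp_payload[YIADDR_OFFSET:YIADDR_OFFSET + 4])


def assigns_address(bootp_payload: bytes) -> bool:
    """True if the reply offers the client an address (yiaddr != 0.0.0.0)."""
    return yiaddr(bootp_payload) != ipaddress.IPv4Address("0.0.0.0")


def make_reply(your_ip: str) -> bytes:
    """Hypothetical helper: build a bare BOOTP reply header for illustration."""
    return struct.pack(
        BOOTP_HEADER_FORMAT,
        2, 1, 6, 0,                           # op=reply, Ethernet, hlen=6, hops=0
        0x26F30339, 0, 0,                     # xid, secs, flags
        bytes(4),                             # ciaddr
        ipaddress.IPv4Address(your_ip).packed,  # yiaddr
        bytes(4), bytes(4),                   # siaddr, giaddr
        bytes.fromhex("b827eb9dc15a"),        # chaddr (padded to 16 bytes)
        b"", b"",                             # sname, file
    )


proxy_offer = make_reply("0.0.0.0")        # what a proxy-only reply carries
static_offer = make_reply("192.168.0.66")  # what a full DHCP offer carries
print(assigns_address(proxy_offer), assigns_address(static_offer))
```

A reply for which assigns_address is false can still carry the PXE options, but the client has to get its address from some other DHCP reply.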

What you've now done is to get your fritzbox to actually serve IP addresses as well so your DHCP response should now have both the Option 43 and the client IP address in there.

Why do you think the DHCP request is malformed? We've never found a problem with getting a DHCP server to actually serve it addresses.

andig commented 7 years ago

> Looking at your previous wireshark response there is no IP address served to the device. You are using a proxy DHCP reply to provide the pxe-service but there is no standard DHCP reply from your router.

Right.

> Previously you had "dhcp-range=192.168.0.255,proxy" which means the dnsmasq will not give the device an IP address (it is assuming a separate DHCP server is going to do this for you). It only serves a DHCP offer that contains the Option 43 with the client IP address of 0.0.0.0

Correct.

This setup is working fine if the SD card with bootcode is in place. Combined with the fact that not a single other device has had problems with the router, that makes me think that the router reply is per se fine; potentially the raspi request is different with or without bootcode.

> What you've now done is to get your fritzbox to actually serve IP addresses as well so your DHCP response should now have both the Option 43 and the client IP address in there.

IMHO no. I've set my server raspi to serve the IP instead of the FritzBox router.

> Why do you think the DHCP request is malformed? We've never found a problem with getting a DHCP server to actually serve it addresses.

I'm only guessing. But if the client raspi receives an IP from the router with bootcode but not without then this should not depend on the router DHCP reply (which should not change), unless bootcode is already doing things for working with "malformed" replies or similar?

ghollingworth commented 7 years ago

Looking at your dump again, I can't see the reply from the router. I assume this is because you've got a switch between the router and the client and you are tcpdump'ing from the server...

Would you have a managed switch that you can use between the two to get all traffic routed to the server? Then I can understand what it is that is causing it to fail. It's clearly been fixed in bootcode.bin but I'm not sure what it is that is fixed!

Gordon

andig commented 7 years ago

> Looking at your dump again, I can't see the reply from the router. I assume this is because you've got a switch between the router and the client and you are tcpdump'ing from the server...

Both client and server are directly attached to the router.

> Would you have a managed switch that you can use between the two to get all traffic routed to the server?

Theoretically the router should support packet capture but apparently mine doesn't. Since I'm aiming for replacement anyway I'll try to get the replacement and provide packet capture.

From the wireshark sessions I still have the DHCP request around that didn't get an answer:

0.0.0.0 255.255.255.255 DHCP    362 DHCP Discover - Transaction ID 0x26f30339

0000   ff ff ff ff ff ff b8 27 eb 9d c1 5a 08 00 45 00  .......'...Z..E.
0010   01 5c 00 00 00 00 80 11 39 92 00 00 00 00 ff ff  .\......9.......
0020   ff ff 00 44 00 43 01 48 00 00 01 01 06 00 26 f3  ...D.C.H......&.
0030   03 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00  .9..............
0040   00 00 00 00 00 00 b8 27 eb 9d c1 5a 00 00 00 00  .......'...Z....
0050   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0060   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0070   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0080   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0090   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00a0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00b0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00c0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00d0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00e0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00f0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0100   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0110   00 00 00 00 00 00 63 82 53 63 35 01 01 37 0c 2b  ......c.Sc5..7.+
0120   3c 43 80 81 82 83 84 85 86 87 42 5d 02 00 00 5e  <C........B]...^
0130   03 01 02 01 61 11 00 44 44 44 44 44 44 44 44 44  ....a..DDDDDDDDD
0140   44 44 44 44 44 44 44 3c 20 50 58 45 43 6c 69 65  DDDDDDD< PXEClie
0150   6e 74 3a 41 72 63 68 3a 30 30 30 30 30 3a 55 4e  nt:Arch:00000:UN
0160   44 49 3a 30 30 32 30 30 31 ff                    DI:002001.

Frame 40: 362 bytes on wire (2896 bits), 362 bytes captured (2896 bits)
    Encapsulation type: Ethernet (1)
    Arrival Time: Sep  2, 2017 17:42:46.063789000 CEST
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1504366966.063789000 seconds
    [Time delta from previous captured frame: 0.000142000 seconds]
    [Time delta from previous displayed frame: 0.000142000 seconds]
    [Time since reference or first frame: 27.995098000 seconds]
    Frame Number: 40
    Frame Length: 362 bytes (2896 bits)
    Capture Length: 362 bytes (2896 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:ip:udp:bootp]
    [Coloring Rule Name: UDP]
    [Coloring Rule String: udp]
Ethernet II, Src: Raspberr_9d:c1:5a (b8:27:eb:9d:c1:5a), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
    Destination: Broadcast (ff:ff:ff:ff:ff:ff)
    Source: Raspberr_9d:c1:5a (b8:27:eb:9d:c1:5a)
    Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 0.0.0.0, Dst: 255.255.255.255
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
    Total Length: 348
    Identification: 0x0000 (0)
    Flags: 0x00
    Fragment offset: 0
    Time to live: 128
    Protocol: UDP (17)
    Header checksum: 0x3992 [validation disabled]
    Source: 0.0.0.0
    Destination: 255.255.255.255
    [Source GeoIP: Unknown]
    [Destination GeoIP: Unknown]
User Datagram Protocol, Src Port: 68, Dst Port: 67
    Source Port: 68
    Destination Port: 67
    Length: 328
    Checksum: 0x0000 (none)
        [Good Checksum: False]
        [Bad Checksum: False]
    [Stream index: 7]
Bootstrap Protocol (Discover)
    Message type: Boot Request (1)
    Hardware type: Ethernet (0x01)
    Hardware address length: 6
    Hops: 0
    Transaction ID: 0x26f30339
    Seconds elapsed: 0
    Bootp flags: 0x0000 (Unicast)
    Client IP address: 0.0.0.0
    Your (client) IP address: 0.0.0.0
    Next server IP address: 0.0.0.0
    Relay agent IP address: 0.0.0.0
    Client MAC address: Raspberr_9d:c1:5a (b8:27:eb:9d:c1:5a)
    Client hardware address padding: 00000000000000000000
    Server host name not given
    Boot file name not given
    Magic cookie: DHCP
    Option: (53) DHCP Message Type (Discover)
    Option: (55) Parameter Request List
    Option: (93) Client System Architecture
    Option: (94) Client Network Device Interface
    Option: (97) UUID/GUID-based Client Identifier
    Option: (60) Vendor class identifier
    Option: (255) End
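
For readers who want to check the option list against the raw bytes, the following is a minimal Python sketch (an editorial addition, not part of the original report) that walks the DHCP options at the end of the hex dump above, starting at the magic cookie at offset 0x0116. The function name parse_dhcp_options is made up for illustration; the bytes are copied from the dump.

```python
from typing import List, Tuple

MAGIC_COOKIE = bytes.fromhex("63825363")

# Option bytes copied from the hex dump above, from offset 0x0116 to the end.
OPTIONS = bytes.fromhex(" ".join([
    "63 82 53 63 35 01 01 37 0c 2b",
    "3c 43 80 81 82 83 84 85 86 87 42 5d 02 00 00 5e",
    "03 01 02 01 61 11 00 44 44 44 44 44 44 44 44 44",
    "44 44 44 44 44 44 44 3c 20 50 58 45 43 6c 69 65",
    "6e 74 3a 41 72 63 68 3a 30 30 30 30 30 3a 55 4e",
    "44 49 3a 30 30 32 30 30 31 ff",
]))


def parse_dhcp_options(data: bytes) -> List[Tuple[int, bytes]]:
    """Walk code/length/value DHCP options that follow the magic cookie."""
    if data[:4] != MAGIC_COOKIE:
        raise ValueError("missing DHCP magic cookie")
    options = []
    i = 4
    while i < len(data):
        code = data[i]
        if code == 255:      # End option
            break
        if code == 0:        # Pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated option header")
        length = data[i + 1]
        value = data[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated option value")
        options.append((code, value))
        i += 2 + length
    return options


for code, value in parse_dhcp_options(OPTIONS):
    print(code, value.hex(":"))
```

The option codes come out in the same order as in the Wireshark decode: 53, 55, 93, 94, 97, 60.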

hlev commented 7 years ago

Is a permanent resolution for the "diskless PXE" issue possible at all on the RPi3, or is the root cause within the boot ROM itself, so that it will remain as is? I think my TFTP server and network setup are fine, but I also could not make it work reliably. The occasional ping -b <broadcast_ip> solves it; without that it only works ~30% of the time.

A recent bootcode.bin from https://github.com/raspberrypi/firmware/tree/2669578d1449255edf23f38ed98d208ab73faed7 on the SD card is reliable too. It'd be so nice to spare a card though.

andig commented 7 years ago

@hlev The solution presented above does work reliably without an SD card now. The key to success is to make sure that the raspi actually receives the IP, which, for reasons that are not entirely clear, it didn't get from my FritzBox.

hlev commented 7 years ago

@andig thanks, good to know it works. It seems the trick is the dhcp-reply-delay.

TL;DR I originally experimented with dnsmasq 2.76 which did not have this option yet, installed 2.78 and all is well.

My first "diskless" plan was that, although my router supports BOOTP/TFTP, I'd simply use it to relay BOOTP DHCPREQUEST packets to my laptop running dnsmasq (2.76). It worked up to the point that the relay did its job and sent the DHCPOFFER packet produced by dnsmasq back to the Pi. The options may not have been 100% right though. The Pi did not take the IP.

Then I simplified this setup by putting the Pi and the laptop running dnsmasq behind an unmanaged switch downstream from the router, so the router was not involved at all. With 2.76 it worked intermittently, but with 2.78 and dhcp-reply-delay=1 it is stable.

JamesH65 commented 5 years ago

Can any participants comment on the status of this issue?

This issue will be closed within 30 days unless further interactions are posted. If you wish this issue to remain open, please add a comment. A closed issue may be reopened if requested.

jibanes commented 5 years ago

is there a workaround for isc dhcp 4.1.1?

JamesH65 commented 5 years ago

Are you running the very latest Raspbian and bootcode? There have been changes.

jibanes commented 5 years ago

I will check momentarily

jibanes commented 5 years ago

Hello, I was able to boot. I suppose it successfully completed the DHCP handshake, retrieved the kernel and netbooted, then got stuck there: https://www.eskimo.com/~jibanes/pi/IMG-0852.JPG I've checked https://github.com/raspberrypi/linux/blob/rpi-4.14.y/arch/arm/configs/bcm2709_defconfig, which seems to have CONFIG_ROOT_NFS set to y by default, so it should have been able to mount the root filesystem. However, I've noticed the RPi3 doesn't respond to ping. What could be the culprit?

jibanes commented 5 years ago

I found the issue: cmdline.txt doesn't need to be at the root of the tftpboot path, it needs to be (in my case) in a directory called b4c1710b. Does anyone know what this represents? (It works though; I found out about this path with tcpdump.)

hlev commented 5 years ago

@jibanes The bootloader will attempt to fetch the files from <tftproot>/<device_serial> first, then fall back to <tftproot> if the first path fails. This allows for customising the configuration of devices behind the same TFTP server. It should work with both paths, consistently.
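
As a rough illustration of that lookup order (an editorial sketch, not bootloader code), the following Python snippet lists the server-side paths that would be tried for a given file. The function name candidate_paths is hypothetical; /tftpboot and b4c1710b are the values mentioned in this thread.

```python
import posixpath
from typing import List


def candidate_paths(tftp_root: str, device_serial: str, filename: str) -> List[str]:
    """Paths in the order described above: per-device directory first,
    then the TFTP root as a fallback."""
    return [
        posixpath.join(tftp_root, device_serial, filename),
        posixpath.join(tftp_root, filename),
    ]


for path in candidate_paths("/tftpboot", "b4c1710b", "cmdline.txt"):
    print(path)
```

Placing a file only in the per-device directory keeps it specific to that one board, while a file in the root is shared by every device that has no directory of its own.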

jibanes commented 5 years ago

thanks, works great!

jibanes commented 5 years ago

After many tries, I still get a significant number of failures: sometimes it boots, but more often it does not. A reboot command halts Linux but fails to reboot. It seems to make no difference whether I have the HDMI/USB connectors plugged in; it's random. It could be that my Pi 3 is defective. Has anyone else seen this behavior? I'm running the latest firmware according to rpi-update.

hlev commented 5 years ago

@jibanes If the device fails at the TFTP stage by not even accepting the IP from the DHCP/TFTP server, I recommend you read this thread https://github.com/raspberrypi/firmware/issues/894 which explores and explains why the Pi sometimes ignores the DHCPOFFER from the TFTP server. It also explains why this will likely not be fixed by further firmware releases: it is a bug in the first stage of the bootloader in ROM. It can be circumvented by creating random packets that will "nudge" the Pi to proceed with booting.

If boot fails at an arbitrary place while downloading assets over TFTP, I would suggest trying a different switch between the Pi and the server (if you use one).

If the device fails when booting into Linux, you can debug that further on the serial port.

If the device fails after a successful boot into Linux then it could be a lot of things. Let me know if you need pointers in debugging either scenario.

hlev commented 5 years ago

@jibanes If you want to test quickly and you are on a Linux host:

When you suspect the Pi is failing to boot, just send a broadcast or multicast packet from a host on the same physical LAN. The IP/port/content does not matter as long as it is a broadcast or a multicast packet. For example:

echo "hello" | socat - udp4-datagram:224.0.0.100:24000

If you see that this reliably nudges the Pi into booting, then just send this packet periodically and you're done.

jibanes commented 5 years ago

Very useful, hlev, thank you. I'll explore this; it seems to be somewhat nondeterministic, but from my tcpdumps it's not even making the DHCP request. I will also send the multicast packet from a cron job and post the results here. Many thanks.

hlev commented 5 years ago

Sure, you're welcome. I suggest a higher recurrence rate than cron though, for example a simple loop with a 3 second sleep.

One reason for this being nondeterministic is that on busy networks there is a good chance that, at any point in time, an ARP broadcast, DHCP traffic or other unrelated traffic will be present. Those packets nudge the Pi out of the buggy loop mentioned in that other thread, and it just works.

On less busy networks, in my experience, this self-generated traffic reliably fixes the issue. If you have only a few other hosts on the network, you can see very clearly in the traffic dump that the Pi makes the initial BOOTP request, fails to act on the DHCPOFFER, but continues as soon as your packet reaches its interface.
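
The periodic nudge suggested above (a loop with a 3 second sleep around the socat one-liner) could look roughly like this in Python. This is a minimal editorial sketch under the assumptions from the thread: the multicast address 224.0.0.100 and port 24000 come from the socat example, the payload is arbitrary, and the function names send_nudge and nudge_periodically are made up for illustration.

```python
import socket
import time
from typing import Optional, Tuple

# Same destination as the socat example; per the thread, the address, port
# and content do not matter as long as the packet is broadcast or multicast.
NUDGE_TARGET = ("224.0.0.100", 24000)


def send_nudge(target: Tuple[str, int] = NUDGE_TARGET,
               payload: bytes = b"hello\n") -> int:
    """Send one UDP datagram and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, target)


def nudge_periodically(interval: float = 3.0,
                       count: Optional[int] = None,
                       target: Tuple[str, int] = NUDGE_TARGET) -> int:
    """Send a nudge every `interval` seconds; run forever if count is None."""
    sent = 0
    while count is None or sent < count:
        send_nudge(target)
        sent += 1
        if count is None or sent < count:
            time.sleep(interval)
    return sent


# To run until interrupted with Ctrl-C, call:
# nudge_periodically()
```

Run it on a host on the same physical LAN as the Pi. The default multicast TTL of 1 keeps the packets on the local segment, which is all that is needed here.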