andig opened this issue 7 years ago
I've done a Wireshark capture here https://pastebin.com/ju4hrEa6 that should show what's going on. It's limited to the packets participating in the conversation; I can upload the full version if helpful.
All help would be greatly appreciated. Without an SD card I'm unable to get the Pi to boot from the network.
ping @ghollingworth might this be something you could help with?
With help from the dnsmasq mailing list I've finally managed to boot the raspi without an SD card, i.e. without the help of BOOTCODE.BIN. It seems that my FritzBox router is not replying to the raspi's (potentially malformed) DHCP request. As a workaround I've used dnsmasq to serve the IP:
resolv-file=/etc/resolv.conf.dnsmasq
# DNS off
port=0
# DHCP on, serve static
dhcp-range=192.168.0.255,static
dhcp-host=b8:xx:xx:xx:xx,192.168.0.66
dhcp-reply-delay=1
# TFTP on
enable-tftp
tftp-root=/tftpboot
# PXE on
pxe-service=0,"Raspberry Pi Boot"
With this setup the raspi is finally net-booting.
Looking at your previous wireshark response there is no IP address served to the device. You are using a proxy DHCP reply to provide the pxe-service but there is no standard DHCP reply from your router.
Previously you had "dhcp-range=192.168.0.255,proxy" which means the dnsmasq will not give the device an IP address (it is assuming a separate DHCP server is going to do this for you). It only serves a DHCP offer that contains the Option 43 with the client IP address of 0.0.0.0
What you've now done is to get your fritzbox to actually serve IP addresses as well so your DHCP response should now have both the Option 43 and the client IP address in there.
Why do you think the DHCP request is malformed? We've never found a problem with getting a DHCP server to actually serve it addresses.
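The distinction between a proxy offer and a full offer can be checked mechanically on a captured reply. Below is a minimal Python sketch, assuming a raw BOOTP/DHCP payload (the UDP data only, without Ethernet/IP/UDP headers). It reads the yiaddr ("Your (client) IP address") field at its standard offset and checks whether option 43 (vendor-specific information) is present. The function names and returned labels are hypothetical, for illustration only.

```python
MAGIC_COOKIE = b"\x63\x82\x53\x63"
# op, htype, hlen, hops (4) + xid (4) + secs, flags (4) + ciaddr (4)
YIADDR_OFFSET = 16
# 236-byte fixed BOOTP header followed by the 4-byte magic cookie
OPTIONS_OFFSET = 240


def option_codes(payload: bytes) -> set:
    """Return the set of DHCP option codes present in a BOOTP/DHCP payload."""
    if payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP packet (magic cookie missing)")
    codes = set()
    i = OPTIONS_OFFSET
    while i < len(payload):
        code = payload[i]
        if code == 255:  # End option
            break
        if code == 0:  # Pad option has no length byte
            i += 1
            continue
        codes.add(code)
        i += 2 + payload[i + 1]  # skip code byte, length byte and value
    return codes


def classify_offer(payload: bytes) -> str:
    """Label an offer by whether it assigns an address and/or carries option 43."""
    yiaddr = payload[YIADDR_OFFSET:YIADDR_OFFSET + 4]
    assigns_ip = yiaddr != b"\x00\x00\x00\x00"
    has_opt43 = 43 in option_codes(payload)
    if assigns_ip and has_opt43:
        return "address + option 43"
    if has_opt43:
        return "proxy offer (option 43, no address)"
    if assigns_ip:
        return "address only"
    return "neither"
```

With dnsmasq in proxy mode, the Pi would need to see both an "address only" offer from the router and a "proxy offer" from dnsmasq; in the failing capture only the latter is visible.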
Looking at your previous wireshark response there is no IP address served to the device. You are using a proxy DHCP reply to provide the pxe-service but there is no standard DHCP reply from your router.
Right.
Previously you had "dhcp-range=192.168.0.255,proxy" which means the dnsmasq will not give the device an IP address (it is assuming a separate DHCP server is going to do this for you). It only serves a DHCP offer that contains the Option 43 with the client IP address of 0.0.0.0
Correct.
This setup works fine if the SD card with bootcode is in place. Combined with the fact that not a single other device has had problems with the router, that makes me think the router's reply is fine per se; potentially the raspi's request differs with and without bootcode.
What you've now done is to get your fritzbox to actually serve IP addresses as well so your DHCP response should now have both the Option 43 and the client IP address in there.
IMHO no. I've set my server raspi to serve the IP instead of the FritzBox router.
Why do you think the DHCP request is malformed? We've never found a problem with getting a DHCP server to actually serve it addresses.
I'm only guessing. But if the client raspi receives an IP from the router with bootcode but not without, then this should not depend on the router's DHCP reply (which should not change), unless bootcode already does something to cope with "malformed" replies or similar?
Looking at your dump again, I can't see the reply from the router. I assume this is because you've got a switch between the router and the client and you are tcpdump'ing from the server...
Would you have a managed switch that you can use between the two to enable all traffic to get routed to the server? Then I can understand what it is that is causing it to fail. It's clearly been fixed in bootcode.bin, but I'm not sure what it is that was fixed!
Gordon
Looking at your dump again, I can't see the reply from the router. I assume this is because you've got a switch between the router and the client and you are tcpdump'ing from the server...
Both client and server are directly attached to the router.
Would you have a managed switch that you can use between the two to enable all traffic to get routed to the server?
Theoretically the router should support packet capture, but apparently mine doesn't. Since I'm planning to replace it anyway, I'll try to get the replacement and provide a packet capture.
From the Wireshark sessions I still have the DHCP request that didn't get an answer:
0.0.0.0 255.255.255.255 DHCP 362 DHCP Discover - Transaction ID 0x26f30339
0000 ff ff ff ff ff ff b8 27 eb 9d c1 5a 08 00 45 00 .......'...Z..E.
0010 01 5c 00 00 00 00 80 11 39 92 00 00 00 00 ff ff .\......9.......
0020 ff ff 00 44 00 43 01 48 00 00 01 01 06 00 26 f3 ...D.C.H......&.
0030 03 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 .9..............
0040 00 00 00 00 00 00 b8 27 eb 9d c1 5a 00 00 00 00 .......'...Z....
0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0100 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0110 00 00 00 00 00 00 63 82 53 63 35 01 01 37 0c 2b ......c.Sc5..7.+
0120 3c 43 80 81 82 83 84 85 86 87 42 5d 02 00 00 5e <C........B]...^
0130 03 01 02 01 61 11 00 44 44 44 44 44 44 44 44 44 ....a..DDDDDDDDD
0140 44 44 44 44 44 44 44 3c 20 50 58 45 43 6c 69 65 DDDDDDD< PXEClie
0150 6e 74 3a 41 72 63 68 3a 30 30 30 30 30 3a 55 4e nt:Arch:00000:UN
0160 44 49 3a 30 30 32 30 30 31 ff DI:002001.
Frame 40: 362 bytes on wire (2896 bits), 362 bytes captured (2896 bits)
Encapsulation type: Ethernet (1)
Arrival Time: Sep 2, 2017 17:42:46.063789000 CEST
[Time shift for this packet: 0.000000000 seconds]
Epoch Time: 1504366966.063789000 seconds
[Time delta from previous captured frame: 0.000142000 seconds]
[Time delta from previous displayed frame: 0.000142000 seconds]
[Time since reference or first frame: 27.995098000 seconds]
Frame Number: 40
Frame Length: 362 bytes (2896 bits)
Capture Length: 362 bytes (2896 bits)
[Frame is marked: False]
[Frame is ignored: False]
[Protocols in frame: eth:ethertype:ip:udp:bootp]
[Coloring Rule Name: UDP]
[Coloring Rule String: udp]
Ethernet II, Src: Raspberr_9d:c1:5a (b8:27:eb:9d:c1:5a), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Destination: Broadcast (ff:ff:ff:ff:ff:ff)
Source: Raspberr_9d:c1:5a (b8:27:eb:9d:c1:5a)
Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 0.0.0.0, Dst: 255.255.255.255
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes (5)
Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
Total Length: 348
Identification: 0x0000 (0)
Flags: 0x00
Fragment offset: 0
Time to live: 128
Protocol: UDP (17)
Header checksum: 0x3992 [validation disabled]
Source: 0.0.0.0
Destination: 255.255.255.255
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
User Datagram Protocol, Src Port: 68, Dst Port: 67
Source Port: 68
Destination Port: 67
Length: 328
Checksum: 0x0000 (none)
[Good Checksum: False]
[Bad Checksum: False]
[Stream index: 7]
Bootstrap Protocol (Discover)
Message type: Boot Request (1)
Hardware type: Ethernet (0x01)
Hardware address length: 6
Hops: 0
Transaction ID: 0x26f30339
Seconds elapsed: 0
Bootp flags: 0x0000 (Unicast)
Client IP address: 0.0.0.0
Your (client) IP address: 0.0.0.0
Next server IP address: 0.0.0.0
Relay agent IP address: 0.0.0.0
Client MAC address: Raspberr_9d:c1:5a (b8:27:eb:9d:c1:5a)
Client hardware address padding: 00000000000000000000
Server host name not given
Boot file name not given
Magic cookie: DHCP
Option: (53) DHCP Message Type (Discover)
Option: (55) Parameter Request List
Option: (93) Client System Architecture
Option: (94) Client Network Device Interface
Option: (97) UUID/GUID-based Client Identifier
Option: (60) Vendor class identifier
Option: (255) End
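To check the decoded fields against the hex dump above, here is a minimal Python sketch that rebuilds the 320-byte BOOTP payload of frame 40 (the UDP data, starting at frame offset 0x2a) and decodes its fixed header and option TLVs. The option bytes are copied from the dump; the helper names are hypothetical.

```python
import struct

MAGIC_COOKIE = b"\x63\x82\x53\x63"

# Option bytes copied from the dump (frame offsets 0x11a..0x169).
SAMPLE_OPTIONS = (
    bytes.fromhex("350101")                          # 53 DHCP Message Type: 1 = Discover
    + bytes.fromhex("370c2b3c43808182838485868742")  # 55 Parameter Request List, 12 codes
    + bytes.fromhex("5d020000")                      # 93 Client System Architecture: 0
    + bytes.fromhex("5e03010201")                    # 94 Client Network Device Interface
    + bytes.fromhex("611100") + b"\x44" * 16         # 97 UUID/GUID client identifier
    + bytes.fromhex("3c20") + b"PXEClient:Arch:00000:UNDI:002001"  # 60 Vendor class
    + b"\xff"                                        # 255 End
)

# Fixed 236-byte BOOTP header: op=1, htype=1, hlen=6, hops=0, xid=0x26f30339,
# secs=0, flags=0, four zero addresses, chaddr padded to 16 bytes,
# then empty sname (64 bytes) and file (128 bytes) fields.
SAMPLE_PAYLOAD = (
    struct.pack("!BBBBIHH", 1, 1, 6, 0, 0x26F30339, 0, 0)
    + bytes(16)
    + bytes.fromhex("b827eb9dc15a").ljust(16, b"\x00")
    + bytes(64 + 128)
    + MAGIC_COOKIE
    + SAMPLE_OPTIONS
)


def ipv4(raw: bytes) -> str:
    return ".".join(str(b) for b in raw)


def parse_bootp(payload: bytes) -> dict:
    """Decode the fixed BOOTP header and the DHCP option TLVs of a UDP payload."""
    op, _htype, hlen, _hops, xid, _secs, flags = struct.unpack_from("!BBBBIHH", payload, 0)
    if payload[236:240] != MAGIC_COOKIE:
        raise ValueError("DHCP magic cookie missing")
    options = {}
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:  # End
            break
        if code == 0:  # Pad: single byte, no length
            i += 1
            continue
        length = payload[i + 1]
        options[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return {
        "op": op,
        "xid": xid,
        "flags": flags,
        "ciaddr": ipv4(payload[12:16]),
        "yiaddr": ipv4(payload[16:20]),
        "chaddr": payload[28:28 + hlen].hex(":"),
        "options": options,
    }
```

The decoded values match the Wireshark tree above: a Discover with flags 0x0000, the client MAC b8:27:eb:9d:c1:5a and a PXEClient vendor class.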
Is a permanent resolution for the "diskless PXE" issue possible at all on the RPi3, or is the root cause within the boot ROM itself, so that it will remain as is?
I think my TFTP server and network setup are fine, but I also could not make it work reliably. An occasional ping -b <broadcast_ip> solves it; without that it only works ~30% of the time.
A recent bootcode.bin from https://github.com/raspberrypi/firmware/tree/2669578d1449255edf23f38ed98d208ab73faed7 on the SD card is reliable too. It'd be so nice to spare a card, though.
@hlev The solution presented above does work reliably without an SD card now. The key to success is to make sure that the raspi actually receives the IP, which (for not entirely clear reasons) it didn't from my FritzBox.
@andig thanks, good to know it works; it seems the trick is dhcp-reply-delay.
TL;DR I originally experimented with dnsmasq 2.76, which did not have this option yet. I installed 2.78 and all is well.
My first "diskless" plan was, even though my router supports BOOTP/TFTP, to simply use it to relay BOOTP DHCPREQUEST packets to my laptop running dnsmasq (2.76). It worked up to the point where the relay did its job and sent the DHCPOFFER packet produced by dnsmasq back to the Pi. The options may not have been 100% right, though; the Pi did not take the IP.
Then I simplified this setup by putting the Pi and the laptop running dnsmasq behind an unmanaged switch downstream from the router, so the router was not involved at all. With 2.76 it worked intermittently, but with 2.78 and dhcp-reply-delay=1 it is stable.
Can any participants comment on the status of this issue?
This issue will be closed within 30 days unless further interactions are posted. If you wish this issue to remain open, please add a comment. A closed issue may be reopened if requested.
Is there a workaround for ISC DHCP 4.1.1?
Are you running the very latest Raspbian and bootcode? There have been changes.
I will check momentarily
Hello, I was able to boot: I suppose it successfully completed the DHCP handshake, retrieved the kernel and netbooted, then got stuck there: https://www.eskimo.com/~jibanes/pi/IMG-0852.JPG
I've checked the defconfig at https://github.com/raspberrypi/linux/blob/rpi-4.14.y/arch/arm/configs/bcm2709_defconfig and it seems to have CONFIG_ROOT_NFS set to y by default, so it should have been able to mount the root filesystem. I've also noticed the RPi 3 doesn't respond to ping. What could be the culprit?
I found the issue: cmdline.txt doesn't need to be at the root of the tftpboot path; it needs to be (in my case) in a directory called b4c1710b. Does anyone know what this represents? (It works, though; I found this path with tcpdump.)
@jibanes The bootloader will attempt to fetch the files from <tftproot>/<device_serial> first, then fall back to <tftproot> if the first path fails. This allows for customising the configuration of devices behind the same TFTP server. It should work with both paths, consistently.
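The lookup order described above can be sketched in Python as follows. This is an illustration of the described behaviour, not the bootloader's code. It assumes the directory name is the last 8 hex digits of the Serial line in /proc/cpuinfo (which would presumably explain the b4c1710b directory seen in tcpdump); all function names are hypothetical.

```python
import os
from typing import Callable, List, Optional


def serial_from_cpuinfo(cpuinfo_text: str) -> Optional[str]:
    """Return the last 8 hex digits of the 'Serial' line (assumed TFTP dir name)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Serial":
            return value.strip().lower()[-8:]
    return None


def tftp_candidates(tftp_root: str, serial: str, filename: str) -> List[str]:
    """Per-device path first, then the shared root, as described in the thread."""
    return [
        os.path.join(tftp_root, serial, filename),
        os.path.join(tftp_root, filename),
    ]


def resolve(tftp_root: str, serial: str, filename: str,
            exists: Callable[[str], bool] = os.path.exists) -> Optional[str]:
    """Return the first candidate path that exists, or None if neither does."""
    for path in tftp_candidates(tftp_root, serial, filename):
        if exists(path):
            return path
    return None
```

On the server this corresponds to a layout such as /tftpboot/b4c1710b/cmdline.txt for per-device files, with shared files directly in /tftpboot.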
thanks, works great!
After many tries I still get a significant number of failures: sometimes it boots, often it doesn't. A reboot command halts Linux but fails to reboot. It seems to make no difference whether I have HDMI/USB connectors plugged in; it's random. It could be that my Pi 3 is defective; has anyone else seen this behavior? I'm running the latest firmware according to rpi-update.
@jibanes If the device fails at the TFTP stage by not even accepting the IP from the DHCP/TFTP server, I recommend you read this thread https://github.com/raspberrypi/firmware/issues/894 which explores and explains why the Pi sometimes ignores the DHCPOFFER from the TFTP server, and also why it will likely not be fixed by further firmware releases: it is a bug in the first stage of the bootloader in ROM. It can be circumvented by creating random packets that will "nudge" the Pi to proceed with booting.
If boot fails at an arbitrary place while downloading assets over TFTP, I would suggest trying a different switch between the Pi and the server (if you use one).
If the device fails when booting into Linux, you can debug that further on the serial port.
If the device fails after a successful boot into Linux then it could be a lot of things. Let me know if you need pointers in debugging either scenario.
@jibanes If you want to test quickly and you are on a Linux host:
When you suspect the Pi is failing to boot, just send a broadcast or multicast packet from a host on the same physical LAN. The IP/port/content does not matter as long as it is a broadcast or a multicast packet. For example:
echo "hello" | socat - udp4-datagram:224.0.0.100:24000
If you see that this reliably nudges the Pi into booting, then just send this packet periodically and you're done.
Very useful, hlev, thank you; I'll explore this. It seems to be somewhat nondeterministic, but from my tcpdumps it's not even making the DHCP request. I will also send the multicast packet from a cron job and post results here. Many thanks.
Sure, welcome. I suggest higher recurrence rate than cron though, a simple loop with a 3 second sleep for example.
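A minimal Python equivalent of the socat one-liner, wrapped in the 3-second loop suggested above, might look like this. It is a sketch assuming a Linux host on the same LAN segment as the Pi; the group 224.0.0.100 and port 24000 are just the values from the socat example, and the function names are hypothetical.

```python
import socket
import time
from typing import Optional, Tuple

# Destination from the socat example; per the thread any broadcast or
# multicast destination should work, so these values are not significant.
DEFAULT_TARGET: Tuple[str, int] = ("224.0.0.100", 24000)


def send_nudge(target: Tuple[str, int] = DEFAULT_TARGET,
               payload: bytes = b"hello\n") -> int:
    """Send one UDP datagram and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # TTL 1: a multicast datagram never leaves the local segment.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        return sock.sendto(payload, target)


def nudge_loop(interval: float = 3.0, count: Optional[int] = None,
               target: Tuple[str, int] = DEFAULT_TARGET) -> int:
    """Send a nudge every `interval` seconds; loop forever if `count` is None."""
    sent = 0
    while count is None or sent < count:
        send_nudge(target)
        sent += 1
        if count is None or sent < count:
            time.sleep(interval)
    return sent
```

Calling nudge_loop() while the Pi boots keeps sending one packet every 3 seconds until interrupted with Ctrl+C.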
One reason for this being nondeterministic is that on busy networks there is a good chance that at any point in time an ARP broadcast, or DHCP traffic or other unrelated traffic will be present and those packets nudge the Pi out of the buggy loop mentioned in that other thread and it just works.
On less busy networks, according to my experience, this self-generated traffic reliably fixes the issue and if you have only a few other hosts on the network you can very clearly see in the traffic dump that the Pi makes the initial bootp request, fails to act on the DHCPOFFER but continues as soon as your packet reaches its interface.
Similar to https://github.com/raspberrypi/firmware/issues/764, originally reported in the forums before I found my way here. I've followed the tutorial; the main difference is that a FritzBox is running as the router in the local network and serving DHCP.
rpi-update has been run; I didn't try BRANCH=next yet.
tcpdump:
daemon log:
dnsmasq.conf:
The " " and the appended PXE server IP were added after looking for potential solutions.
Is there any way to get this working without the need for an SD card?
UPDATE: Playing with dhcp-reply-delay from 1 to 5 seconds didn't help.
UPDATE: I've also tried an SD card containing only a single FAT32 partition with just BOOTCODE.BIN. This does go into TFTP but fails with "Unable to mount root fs on unknown-block(2,0)", but that's a separate topic.