Open ali1234 opened 6 years ago
Also sometimes it does this:
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Initialise ethernet with MAC b8:27:eb:86:96:f5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Initialise ethernet with MAC b8:27:eb:86:96:f5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Initialise ethernet with MAC b8:27:eb:86:96:f5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Initialise ethernet with MAC b8:27:eb:86:96:f5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Initialise ethernet with MAC b8:27:eb:86:96:f5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Initialise ethernet with MAC b8:27:eb:86:96:f5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Initialise ethernet with MAC b8:27:eb:86:96:f5
Can you capture a tcpdump from the server? Specifically try to make sure you capture all packets using
sudo tcpdump -i eth0 ether host b8:27:eb:86:96:f5 -w boot.pcap
Just swap eth0 for your ethernet interface so we can see all the information... This needs to be run on the server that serves the DHCP response or you'll need a managed switch so you can access all traffic between the network and the Pi
Gordon
You are missing the DHCP replies from that, are you capturing it from the DHCP server?
Yes.
This is the server configuration: https://github.com/ali1234/rpi-ramdisk/blob/master/dnsmasq.conf.in
Also note that the Rpi booted after 12 DHCP requests. Is even the 12th response missing?
None of the responses are in there... If you download and run Wireshark on a PC you can load up the pcap file and have a look for yourself... Obviously there must have been a response otherwise the TFTP wouldn't have started in the first place.
Maybe it's a problem with the tcpdump... Maybe worth trying
sudo tcpdump -i eth0 -w dump.pcap
It'll dump a lot of data, but we only need to see the bit up to the start of TFTP booting
This produced a 1.9GB file so I filtered it in wireshark first.
There are two things I can see in that log which will cause problems booting Raspberry Pi 3 without the bootcode.bin:
1) The response has "Raspberry Pi Boot" in it which won't work, you need to add a few spaces at the end of the string, so make sure you set the string to "Raspberry Pi Boot " 2) The response occurs in less than 2 seconds from the request so will trigger the timeout lockup bug, the dhcp reply delay needs to be increased to 2 seconds
But both of these should work with bootcode.bin
The main issue is that your code doesn't offer an IP address to the Raspberry Pi... What I'm really surprised about is why it does boot occasionally, I assume you do have a second DHCP server on your network to reply to normal DHCP requests ?
I just tried putting a 4 port ethernet hub in between the RPi and the switch so that I can dump all the packets. However, this prevents the problem from happening completely. It always boots after the first DHCP request in this set up.
Yes, I have a normal DHCP server in addition to the PXE server.
So it sounds like the problem is between the Pi and the switch... In that the switch is either masking out packets for some reason or the Pi communication with the switch is not correct.
Let me try adding some extra debug and we'll see if we can see packets being received...
Gordon
Here is a diagram of my network topology for clarity.
I can tcpdump from my workstation and from the router itself. However, the router gives somewhat odd results: if I tcpdump eth2 I only see DHCP packets originating from the Pi. If I tcpdump eth3 I only see DHCP packets from the boot server. And if I tcpdump br0 I see both, except that the very first request packet is not captured - although the response is.
I added the hub in between the router and the RPi. It is a true hub and only supports 10BASE-T. I didn't have anything else connected to it so there could not have been any extra traffic and of course it does not support ARP. The only things it could affect are the network speed and that it could be keeping the link up on the router port even when the Pi's ethernet link goes down.
Also, there's a lot of other stuff on the network, as shown. Presence of Windows machines still seems to affect things, but it isn't consistent enough to be sure.
How about something like:
tcpdump -i br0 -vvv -s 1500 '(port 67 or port 68)'
13:07:58.307536 IP (tos 0xc0, ttl 64, id 12729, offset 0, flags [none], proto UDP (17), length 344)
al-desktop.lan.bootps > 255.255.255.255.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length 316, xid 0x26f30339, Flags [Broadcast] (0x8000)
Client-Ethernet-Address b8:27:eb:86:96:f5 (oui Unknown)
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Offer
Server-ID Option 54, length 4: al-desktop.lan
Vendor-Class Option 60, length 9: "PXEClient"
GUID Option 97, length 17: 0.245.150.134.141.245.150.134.141.245.150.134.141.245.150.134.141
Vendor-Option Option 43, length 34: 6.1.3.10.4.0.80.88.69.9.22.0.0.19.82.97.115.112.98.101.114.114.121.32.80.105.32.66.111.111.116.32.32.255
END Option 255, length 0
13:08:01.883867 IP (tos 0x0, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 350)
0.0.0.0.bootpc > 255.255.255.255.bootps: [no cksum] BOOTP/DHCP, Request from b8:27:eb:86:96:f5 (oui Unknown), length 322, xid 0x26f30339, Flags [none] (0x0000)
Client-Ethernet-Address b8:27:eb:86:96:f5 (oui Unknown)
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Discover
Parameter-Request Option 55, length 14:
Vendor-Option, Vendor-Class, BF, Option 128
Option 129, Option 130, Option 131, Option 132
Option 133, Option 134, Option 135, TFTP
Subnet-Mask, Default-Gateway
ARCH Option 93, length 2: 0
NDI Option 94, length 3: 1.2.1
GUID Option 97, length 17: 0.245.150.134.141.245.150.134.141.245.150.134.141.245.150.134.141
Vendor-Class Option 60, length 32: "PXEClient:Arch:00000:UNDI:002001"
END Option 255, length 0
13:08:03.887225 IP (tos 0xc0, ttl 64, id 13077, offset 0, flags [none], proto UDP (17), length 344)
al-desktop.lan.bootps > 255.255.255.255.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length 316, xid 0x26f30339, Flags [Broadcast] (0x8000)
Client-Ethernet-Address b8:27:eb:86:96:f5 (oui Unknown)
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Offer
Server-ID Option 54, length 4: al-desktop.lan
Vendor-Class Option 60, length 9: "PXEClient"
GUID Option 97, length 17: 0.245.150.134.141.245.150.134.141.245.150.134.141.245.150.134.141
Vendor-Option Option 43, length 34: 6.1.3.10.4.0.80.88.69.9.22.0.0.19.82.97.115.112.98.101.114.114.121.32.80.105.32.66.111.111.116.32.32.255
END Option 255, length 0
13:08:06.886415 IP (tos 0x0, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 350)
0.0.0.0.bootpc > 255.255.255.255.bootps: [no cksum] BOOTP/DHCP, Request from b8:27:eb:86:96:f5 (oui Unknown), length 322, xid 0x26f30339, Flags [none] (0x0000)
Client-Ethernet-Address b8:27:eb:86:96:f5 (oui Unknown)
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Discover
Parameter-Request Option 55, length 14:
Vendor-Option, Vendor-Class, BF, Option 128
Option 129, Option 130, Option 131, Option 132
Option 133, Option 134, Option 135, TFTP
Subnet-Mask, Default-Gateway
ARCH Option 93, length 2: 0
NDI Option 94, length 3: 1.2.1
GUID Option 97, length 17: 0.245.150.134.141.245.150.134.141.245.150.134.141.245.150.134.141
Vendor-Class Option 60, length 32: "PXEClient:Arch:00000:UNDI:002001"
END Option 255, length 0
13:08:08.890332 IP (tos 0xc0, ttl 64, id 13510, offset 0, flags [none], proto UDP (17), length 344)
al-desktop.lan.bootps > 255.255.255.255.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length 316, xid 0x26f30339, Flags [Broadcast] (0x8000)
Client-Ethernet-Address b8:27:eb:86:96:f5 (oui Unknown)
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Offer
Server-ID Option 54, length 4: al-desktop.lan
Vendor-Class Option 60, length 9: "PXEClient"
GUID Option 97, length 17: 0.245.150.134.141.245.150.134.141.245.150.134.141.245.150.134.141
Vendor-Option Option 43, length 34: 6.1.3.10.4.0.80.88.69.9.22.0.0.19.82.97.115.112.98.101.114.114.121.32.80.105.32.66.111.111.116.32.32.255
END Option 255, length 0
13:08:11.888958 IP (tos 0x0, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 350)
0.0.0.0.bootpc > 255.255.255.255.bootps: [no cksum] BOOTP/DHCP, Request from b8:27:eb:86:96:f5 (oui Unknown), length 322, xid 0x26f30339, Flags [none] (0x0000)
Client-Ethernet-Address b8:27:eb:86:96:f5 (oui Unknown)
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Discover
Parameter-Request Option 55, length 14:
Vendor-Option, Vendor-Class, BF, Option 128
Option 129, Option 130, Option 131, Option 132
Option 133, Option 134, Option 135, TFTP
Subnet-Mask, Default-Gateway
ARCH Option 93, length 2: 0
NDI Option 94, length 3: 1.2.1
GUID Option 97, length 17: 0.245.150.134.141.245.150.134.141.245.150.134.141.245.150.134.141
Vendor-Class Option 60, length 32: "PXEClient:Arch:00000:UNDI:002001"
END Option 255, length 0
13:08:13.892696 IP (tos 0xc0, ttl 64, id 14388, offset 0, flags [none], proto UDP (17), length 344)
al-desktop.lan.bootps > 255.255.255.255.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length 316, xid 0x26f30339, Flags [Broadcast] (0x8000)
Client-Ethernet-Address b8:27:eb:86:96:f5 (oui Unknown)
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Offer
Server-ID Option 54, length 4: al-desktop.lan
Vendor-Class Option 60, length 9: "PXEClient"
GUID Option 97, length 17: 0.245.150.134.141.245.150.134.141.245.150.134.141.245.150.134.141
Vendor-Option Option 43, length 34: 6.1.3.10.4.0.80.88.69.9.22.0.0.19.82.97.115.112.98.101.114.114.121.32.80.105.32.66.111.111.116.32.32.255
END Option 255, length 0
13:08:16.891562 IP (tos 0x0, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 350)
0.0.0.0.bootpc > 255.255.255.255.bootps: [no cksum] BOOTP/DHCP, Request from b8:27:eb:86:96:f5 (oui Unknown), length 322, xid 0x26f30339, Flags [none] (0x0000)
Client-Ethernet-Address b8:27:eb:86:96:f5 (oui Unknown)
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Discover
Parameter-Request Option 55, length 14:
Vendor-Option, Vendor-Class, BF, Option 128
Option 129, Option 130, Option 131, Option 132
Option 133, Option 134, Option 135, TFTP
Subnet-Mask, Default-Gateway
ARCH Option 93, length 2: 0
NDI Option 94, length 3: 1.2.1
GUID Option 97, length 17: 0.245.150.134.141.245.150.134.141.245.150.134.141.245.150.134.141
Vendor-Class Option 60, length 32: "PXEClient:Arch:00000:UNDI:002001"
END Option 255, length 0
13:08:19.664239 IP (tos 0xc0, ttl 64, id 15224, offset 0, flags [none], proto UDP (17), length 344)
al-desktop.lan.bootps > 255.255.255.255.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length 316, xid 0x26f30339, Flags [Broadcast] (0x8000)
Client-Ethernet-Address b8:27:eb:86:96:f5 (oui Unknown)
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Offer
Server-ID Option 54, length 4: al-desktop.lan
Vendor-Class Option 60, length 9: "PXEClient"
GUID Option 97, length 17: 0.245.150.134.141.245.150.134.141.245.150.134.141.245.150.134.141
Vendor-Option Option 43, length 34: 6.1.3.10.4.0.80.88.69.9.22.0.0.19.82.97.115.112.98.101.114.114.121.32.80.105.32.66.111.111.116.32.32.255
END Option 255, length 0
Same, 5 replies and only 4 requests, except that -vvv makes it much more verbose.
OK,
But from that it still doesn't seem to get offered an IP address... Are you sure your main dhcp server is set up to respond to the requests from the Pi on eth2?
I assume it works fine with an SD card in there? If so can you capture the DHCP request / reply using the same tcpdump?
Gordon
Oh, I didn't realise you wanted to see that. Here:
OK, it's difficult to understand what is / is not making it to the Pi... Will add some debug
Something that just occured to me. After the initial 5 DHCP attempts it goes into this loop of:
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Trying booting from Ethernet device addr 5
Initialise ethernet with MAC b8:27:eb:86:96:f5
What I noticed is that it is extremely common to see four "Trying" lines for every "Initialise" line, and usually three "Trying" lines are printed at once. Is it possible that after the first attempt, all four cores are running at once, trying to initialize the ethernet?
OK,
Can you run this version of bootcode.bin
I've added a bunch of further debug for the DHCP stuff... Should help understand whats wrong
Gordon
I am having trouble reproducing the problem with that bootcode.bin. I will keep trying.
However I did notice something interesting with the release bootcode.bin (b1a7f4aea). If I stop dnsmasq then it behaves differently. It consistently retries, without ever getting stuck in the "Trying to boot" loop. From this I conclude that some communication from dnsmasq must be getting through normally.
Sending DHCP request
Waiting for dhcp_reply
Sending DHCP request
Waiting for dhcp_reply
Sending DHCP request
Waiting for dhcp_reply
Sending DHCP request
Waiting for dhcp_reply
Sending DHCP request
Waiting for dhcp_reply
Trying booting from Ethernet device addr 5
Initialise ethernet with MAC b8:27:eb:86:96:f5
Wait for Link up
Link up
Sending DHCP request
Waiting for dhcp_reply
Sending DHCP request
Waiting for dhcp_reply
Sending DHCP request
I am still unable to reproduce with the debug firmware. It consistently boots after the second DHCP request.
Raspberry Pi Bootcode
Found SD card, config.txt = 0, start.elf = 0, recovery.elf = 0, timeout = 0
Trying USB
Hub device found at addr 4, enumerating HUB
Initialise hub
Found 5 ports, multi_tt = 1
Setting interface 0
Enabling PORT POWER on port 1
Enabling PORT POWER on port 2
Enabling PORT POWER on port 3
Enabling PORT POWER on port 4
Enabling PORT POWER on port 5
Waiting for devices to respond to reset
Found device on port 1
Found highspeed device
Device found: type = Ethernet adapter, addr = 5
Trying booting from Ethernet device addr 5
Initialise ethernet with MAC b8:27:eb:86:96:f5
Wait for Link up
Link up
Sending DHCP request
Waiting for dhcp_reply
Received packet: SRC=34:97:f6:27:97:4b DST=ff:ff:ff:ff:ff:ff
DHCP_OFFER: from 192.168.0.5
ip_addr = 0.0.0.0
Options:
Offer
Unknown extension 54
Unknown extension 60
Unknown extension 97
Vendor specific information = ??????PXE?????Raspberry Pi Boot ?
Sending DHCP request
Waiting for dhcp_reply
Received packet: SRC=20:4e:7f:16:57:a5 DST=ff:ff:ff:ff:ff:ff
Received packet: SRC=20:4e:7f:16:57:a5 DST=ff:ff:ff:ff:ff:ff
Received packet: SRC=34:97:f6:27:97:4b DST=ff:ff:ff:ff:ff:ff
DHCP_OFFER: from 192.168.0.5
ip_addr = 0.0.0.0
Options:
Offer
Unknown extension 54
Unknown extension 60
Unknown extension 97
Vendor specific information = ??????PXE?????Raspberry Pi Boot ?
Received packet: SRC=20:4e:7f:16:57:a5 DST=ff:ff:ff:ff:ff:ff
Received packet: SRC=20:4e:7f:16:57:a5 DST=b8:27:eb:86:96:f5
Received packet: SRC=20:4e:7f:16:57:a5 DST=b8:27:eb:86:96:f5
DHCP_OFFER: from 192.168.0.1
ip_addr = 192.168.0.113
Options:
Offer
Unknown extension 54
Unknown extension 51
Unknown extension 58
Unknown extension 59
Subnet 255.255.255.0
Unknown extension 28
Gateway 192.168.0.1
Unknown extension 211
Unknown extension 210
Unknown extension 209
Done ARP for 192.168.0.5 got 34:97:f6:27:97:4b
Read File: config.txt, 210 (bytes)
Raspberry Pi Bootcode
Read File: config.txt, 210
Read File: start_x.elf, 3904228 (bytes)
Read File: fixup_x.dat, 9694 (bytes)
Something I just noticed looking at this. Sometimes the replies come in a different order:
Raspberry Pi Bootcode
Found SD card, config.txt = 0, start.elf = 0, recovery.elf = 0, timeout = 0
Trying USB
Hub device found at addr 4, enumerating HUB
Initialise hub
Found 5 ports, multi_tt = 1
Setting interface 0
Enabling PORT POWER on port 1
Enabling PORT POWER on port 2
Enabling PORT POWER on port 3
Enabling PORT POWER on port 4
Enabling PORT POWER on port 5
Waiting for devices to respond to reset
Found device on port 1
Found highspeed device
Device found: type = Ethernet adapter, addr = 5
Trying booting from Ethernet device addr 5
Initialise ethernet with MAC b8:27:eb:86:96:f5
Wait for Link up
Link up
Sending DHCP request
Waiting for dhcp_reply
Received packet: SRC=34:97:f6:27:97:4b DST=ff:ff:ff:ff:ff:ff
Received packet: SRC=34:97:f6:27:97:4b DST=ff:ff:ff:ff:ff:ff
Received packet: SRC=34:97:f6:27:97:4b DST=ff:ff:ff:ff:ff:ff
Received packet: SRC=34:97:f6:27:97:4b DST=ff:ff:ff:ff:ff:ff
DHCP_OFFER: from 192.168.0.5
ip_addr = 0.0.0.0
Options:
Offer
Unknown extension 54
Unknown extension 60
Unknown extension 97
Vendor specific information = ??????PXE?????Raspberry Pi Boot ?
Received packet: SRC=34:97:f6:27:97:4b DST=ff:ff:ff:ff:ff:ff
Received packet: SRC=34:97:f6:27:97:4b DST=ff:ff:ff:ff:ff:ff
Sending DHCP request
Waiting for dhcp_reply
Received packet: SRC=20:4e:7f:16:57:a5 DST=b8:27:eb:86:96:f5
DHCP_OFFER: from 192.168.0.1
ip_addr = 192.168.0.113
Options:
Offer
Unknown extension 54
Unknown extension 51
Unknown extension 58
Unknown extension 59
Subnet 255.255.255.0
Unknown extension 28
Gateway 192.168.0.1
Unknown extension 211
Unknown extension 210
Unknown extension 209
Received packet: SRC=34:97:f6:27:97:4b DST=ff:ff:ff:ff:ff:ff
Received packet: SRC=34:97:f6:27:97:4b DST=ff:ff:ff:ff:ff:ff
DHCP_OFFER: from 192.168.0.5
Options:
Offer
Unknown extension 54
Unknown extension 60
Unknown extension 97
Vendor specific information = ??????PXE?????Raspberry Pi Boot ?
Done ARP for 192.168.0.5 got 34:97:f6:27:97:4b
Read File: config.txt, 210 (bytes)
Raspberry Pi Bootcode
Read File: config.txt, 210
Read File: start_x.elf, 3904228 (bytes)
Read File: fixup_x.dat, 9694 (bytes)
What is particularly interesting about this trace: notice that the second reply from 192.168.0.5 has a blank space where it should print the IP address. This looks more and more like a buffer overrun to me. With the debugging firmware it is only damaging the debug code so it still boots. With the regular firmware it's overwriting something important, so it fails to boot when this happens?
No, the reason it only prints it on the first time is because I only print the value of the offered IP address if there isn't an IP address set... The second time an ip address is already set so it doesn't print it out...
Will change the code to always print it, but that's not indicative of any change.
Do you know what the version of bootcode.bin was that you used before? I should be able to trace through the commits to see if anything else has changed...
I'll commit these changes anyway...
Opening a new issue because you closed my last one when someone else reported that it worked.
Placing bootcode.bin on the SD card still doesn't work reliably for me. It normally takes multiple DHCP requests before it will work. The new firmware seems to retry after a long delay if the first 5 requests fails. I'm seeing it send 5 requests, ignore every response, and then after the delay, it works after a further two DHCP requests. But even that doesn't work every time. Sometimes when it tries to retry it gets stuck in a loop printing the following on the UART:
It will continuously print this until I power cycle it.
What is particularly interesting is that the frequency of the repeated messages varies. Sometimes I see about two per second, other times it seems to print them as fast as the UART can handle.