Closed antonio-c-mariani closed 5 years ago
Yes, it is related to bootcode.bin; yes, it's still there in the B+; yes, I am also seeing it; and yes, it is possible to fix (if someone wants to do it). I'm going to return quite a lot of devices this week, so this will definitely be noticed somewhere. I am also having an issue after a kernel panic (I have the watchdog enabled, not sure if that makes a difference): the Pi stops at loading start.elf and hangs there forever. If you could confirm this with the newest bootcode, that would help establish that it is another issue in the netboot functionality.
My experiences can be found at the bottom of #859 if you are interested. Someone with an older bootcode doesn't have the start.elf issue there.
We are using the "bootcode.bin" that comes with Raspbian-2018-11-13. It seems to be the same as the one in the master branch on GitHub (from a month ago).
Same here, but we have to separate two issues: the TFTP server on a different subnet, which I am seeing with the bootcode.bin, and the "stuck at start.elf" issue after a kernel panic, which only I seem to have (it would be awesome if you could check that as well: provoke a kernel panic and see whether it comes up fine again or gets stuck after loading start.elf).
I would appreciate it if we could focus on the first issue here (the ARP request).
Do you think there is any other information needed to identify and fix the problem?
The problem is already identified (at least to me it's clear what's happening here), so all that's needed is someone with access to the sources to fix this.
Recently ghollingworth said (#859):
... the router problem was fixed with the 3B+
Well if the gateway is advertised in the DHCP reply then it should use it... Obviously it'll still do an ARP for the gateway itself but shouldn't ARP for the TFTP server, just go through the gateway.
In our case the DHCP server (dnsmasq) configuration includes:
dhcp-range=10.1.1.2,10.1.1.100,255.255.255.0
dhcp-boot="bootcode.bin","192.168.56.101",192.168.56.101
dhcp-option=3,10.1.1.1
Even so, the RPI 3B+ is looking for the TFTP server on the local network instead of just "going through the gateway", as shown in the pictures above. Are we doing something wrong? We would appreciate any answer about that.
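For readers following along, here is a minimal sketch of the next-hop decision a DHCP client is normally expected to make with the settings above. It is not taken from the bootcode sources; the function name `arp_target` and the leased address 10.1.1.2 are assumptions made for illustration.

```python
import ipaddress

def arp_target(client_ip, netmask, gateway, destination):
    """Return the IP address the client should ARP for to reach destination.

    Hypothetical helper for illustration only, not the bootcode's logic.
    """
    # Derive the local subnet from the leased address and the DHCP netmask.
    network = ipaddress.ip_network(f"{client_ip}/{netmask}", strict=False)
    if ipaddress.ip_address(destination) in network:
        # Destination is on-link: ARP for it directly.
        return destination
    # Destination is off-link: ARP for the gateway (DHCP option 3) instead.
    return gateway

# 10.1.1.2 is an assumed lease from the dhcp-range above.
print(arp_target("10.1.1.2", "255.255.255.0", "10.1.1.1", "192.168.56.101"))
print(arp_target("10.1.1.2", "255.255.255.0", "10.1.1.1", "10.1.1.50"))
```

With these values the first call returns the gateway (10.1.1.1), because 192.168.56.101 is outside 10.1.1.0/24. An ARP request for 192.168.56.101 itself is what would be expected only if the TFTP server were on the local subnet.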
@ghollingworth was talking about the bootrom; the bootcode itself isn't fixed yet. So no, you're doing nothing wrong (except maybe not having returned the devices yet). This is still a bug and still not fixed (and nobody has said that someone started working on it, so probably nobody is looking into it yet).
Maybe the issue is a bit more tricky. I've booted an RPI3 B+ with an SD card containing only the boot/bootcode.bin file. The image below shows that after loading the bootcode.bin the RPI3 B+ makes an ARP request looking for the sub-net router (10.1.1.1) and then correctly starts to load the remaining files from the remote TFTP server (192.168.56.101).
It seems the bootcode.bin is working fine. But then, who is messing things up?
I'd really appreciate hearing some comments from the Raspberry staff. We depend on this to decide the next step.
I've been trying to reproduce, but I'm having trouble setting up a suitable network... Now done for Christmas, so won't be able to get back to it until the new year
Great to hear that you are working on it for everyone who's still suffering from that. The network setup shouldn't be too complex, it's even possible with 2 raspberries (plus one to test the bootcode). Anyways, happy holidays and hear from you next year!
I set up a testing environment using a laptop configured as a DHCP server (dnsmasq - 10.1.1.1/24) and a virtual machine (VirtualBox) as a TFTP server (tftp-hpa - 192.168.56.101).
I wish you all the best. Thanks.
So after learning all about iptables, nf_nat_tftp and nf_conntrack_tftp I've finally got it reproduced on my desk!
Ten minutes later I've got a fix! Although it requires re-sending the DHCP request / reply which may make the process a little slower...
Can you check this and I'll push the change
Just checked it on 2 RPI 3B+ and it works. Unfortunately the start.elf issue is still there: the watchdog resets (not sure if this is the trigger; I did a reboot and it happened there as well), bootcode.bin is downloaded, bootsig.bin is NAKed, start.elf is NAKed, and the Pi gets stuck and doesn't request anything else.
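For anyone trying to confirm this sequence in a packet capture, the sketch below decodes the basic TFTP packet types from RFC 1350. It assumes that "NAKed" shows up in the capture as TFTP ERROR packets, which is an interpretation and not something confirmed in this thread; `describe_tftp` is a hypothetical helper name, and option-negotiation packets (OACK, opcode 6) are only reported as unknown.

```python
import struct

# Opcodes defined by RFC 1350.
OPCODES = {1: "RRQ", 2: "WRQ", 3: "DATA", 4: "ACK", 5: "ERROR"}

def describe_tftp(payload: bytes) -> str:
    """Summarise a TFTP packet given its UDP payload (hypothetical helper)."""
    (opcode,) = struct.unpack("!H", payload[:2])
    name = OPCODES.get(opcode, f"unknown({opcode})")
    if opcode in (1, 2):
        # Requests carry a NUL-terminated filename after the opcode.
        filename = payload[2:].split(b"\x00", 1)[0].decode("ascii", "replace")
        return f"{name} {filename}"
    if opcode in (3, 4):
        # DATA and ACK carry a 16-bit block number.
        (block,) = struct.unpack("!H", payload[2:4])
        return f"{name} block {block}"
    if opcode == 5:
        # ERROR carries a 16-bit error code and a NUL-terminated message.
        (code,) = struct.unpack("!H", payload[2:4])
        message = payload[4:].split(b"\x00", 1)[0].decode("ascii", "replace")
        return f"{name} code {code}: {message}"
    return name

print(describe_tftp(b"\x00\x01start.elf\x00octet\x00"))
print(describe_tftp(b"\x00\x05\x00\x01File not found\x00"))
```

Feeding it the UDP payloads around the start.elf request should show whether the last packet before the hang is a request from the Pi or an error from the server.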
OK, I've been able to reproduce the problem, am trying to understand it, can you create a separate issue for it? Then we can make sure people see this one just related to the booting issue.
I'll close this once I've submitted a patch and it's been pulled...
I've tested the new bootcode.bin and it seems ok. Thanks.
By the way: UART was still enabled, which made Kodi completely unusable (probably because the GPU debug output is slowing everything down). That should be disabled before this gets pushed.
Potential fix now in latest rpi-update firmware
Our campus network has many sub-nets. It has two centralized servers, one for DHCP and another one for PXE+TFTP+NFS. We have been using this architecture for many years to remotely load Linux clients. Now we intend to use the same infrastructure to load the kernel and Raspbian for the RPI3 B+. We successfully configured the servers and were able to load the kernel and Raspbian, but only because our Cisco router is configured to use ARP proxy.
The image below shows that the RPI3 B+ makes an ARP request looking for the sub-net router (10.1.1.1) and then correctly starts to load the "bootcode.bin" file from the remote TFTP server (192.168.56.101).
After the "bootcode.bin" file is loaded, another ARP request is made, but this time looking for the TFTP server (192.168.56.101), as if it were on the same network as the client (RPI3 B+).
Thanks to the ARP proxy on the Cisco router, the RPI3 B+ can load the remaining files and boot normally.
Issues #983 and #670 talk about the same problem on the RPI3 B. Does this problem remain in the B+ version? Is it related to the "bootcode.bin" file? Is it possible to fix it?