raspberrypi / firmware

This repository contains pre-compiled binaries of the current Raspberry Pi kernel and modules, userspace libraries, and bootloader/GPU firmware.

PXE RPi3 B+ - TFTP in different subnet (similar to #983 and #670) #1078

Closed antonio-c-mariani closed 5 years ago

antonio-c-mariani commented 5 years ago

Our campus network has many subnets. It has two centralized servers, one for DHCP and another for PXE+TFTP+NFS. We have been using this architecture for many years to remotely boot Linux clients. Now we intend to use the same infrastructure to load the kernel and Raspbian on the RPI3 B+. We successfully configured the servers and were able to load the kernel and Raspbian, but only because our Cisco router is configured to use proxy ARP.

The image below shows that the RPI3 B+ makes an ARP request looking for the subnet router (10.1.1.1) and then correctly starts to load the "bootcode.bin" file from the remote TFTP server (192.168.56.101).

[image: pxe1]

After the "bootcode.bin" file is loaded, another ARP request is made, but this time looking for the TFTP server (192.168.56.101) as if it were on the same network as the client (RPI3 B+).

[image: pxe2]

Thanks to proxy ARP on the Cisco router, the RPI3 B+ can load the remaining files and boot normally.

Issues #983 and #670 describe the same problem on the RPI3 B. Does this problem remain in the B+ version? Is it related to the "bootcode.bin" file? Is it possible to fix it?

Flole998 commented 5 years ago

Yes, it is related to bootcode.bin; yes, it's still there in the B+; yes, I am also having it; and yes, it is possible to fix it (if someone wants to do it). I'm going to return quite a lot of devices this week, so this will definitely be noticed somewhere. I am also having an issue after a kernel panic (I have the watchdog enabled, not sure if that makes a difference): the Pi stops at loading start.elf and hangs there forever. If you could confirm this with the newest bootcode, it would help establish that this is another issue in the netboot functionality.

My experiences can be found at the bottom of #859 if you are interested. Someone with an older bootcode doesn't have the start.elf issue there.

antonio-c-mariani commented 5 years ago

We are using the "bootcode.bin" that comes with Raspbian-2018-11-13. It seems to be the same as the one in the master branch on GitHub (from a month ago).

Flole998 commented 5 years ago

Same here, but we have to separate two issues here: the TFTP server on a different subnet, which I am having with the bootcode.bin, and the "stuck at start.elf" issue after a kernel panic, which only I seem to have (it would be awesome if you could check that as well: provoke a kernel panic and see if it comes up fine again or gets stuck after loading start.elf).

antonio-c-mariani commented 5 years ago

I would appreciate it if we could focus on the first issue here (the ARP request).

Do you think there is any other information needed to identify and fix the problem?

Flole998 commented 5 years ago

The problem is already identified (at least to me it's clear what's happening here), so all that's needed is someone with access to the sources to fix this.

antonio-c-mariani commented 5 years ago

Recently ghollingworth said (#859):

... the router problem was fixed with the 3B+

Well if the gateway is advertised in the DHCP reply then it should use it... Obviously it'll still do an ARP for the gateway itself but shouldn't ARP for the TFTP server, just go through the gateway.
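As a rough illustration of "the gateway is advertised in the DHCP reply": the router is carried in DHCP option 3 (the same option set by `dhcp-option=3` in dnsmasq), encoded as code, length, value. The sketch below parses an options field and pulls out the first router address. The function names and the sample bytes are hypothetical, and this is not the bootcode's actual implementation, which is not shown in this thread.

```python
import ipaddress

# DHCP option codes from RFC 2132.
OPT_PAD = 0
OPT_ROUTER = 3
OPT_END = 255


def parse_dhcp_options(data: bytes) -> dict:
    """Parse a DHCP options field (the bytes after the magic cookie)
    into a {code: raw_value} dictionary."""
    options = {}
    i = 0
    while i < len(data):
        code = data[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:
            i += 1  # pad is a single byte with no length field
            continue
        if i + 1 >= len(data):
            break  # truncated option, stop parsing
        length = data[i + 1]
        options[code] = data[i + 2 : i + 2 + length]
        i += 2 + length
    return options


def first_router(options: dict):
    """Return the first router address from option 3, or None if absent."""
    raw = options.get(OPT_ROUTER)
    if not raw or len(raw) < 4:
        return None
    return ipaddress.IPv4Address(raw[:4])


# Hypothetical options blob: subnet mask 255.255.255.0, router 10.1.1.1, end.
sample = bytes([1, 4, 255, 255, 255, 0, 3, 4, 10, 1, 1, 1, 255])
print(first_router(parse_dhcp_options(sample)))
```

A client that finds no option 3 in the reply has no gateway to route through, which is why the option has to be present in the DHCP server configuration.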

In our case the DHCP server (dnsmasq) configuration includes:

dhcp-range=10.1.1.2,10.1.1.100,255.255.255.0
dhcp-boot="bootcode.bin","192.168.56.101",192.168.56.101
dhcp-option=3,10.1.1.1

Even so, the RPI 3B+ is looking for the TFTP server on the local network instead of just going "through the gateway", as shown in the pictures above. Are we doing something wrong? We would appreciate any answer about that.
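For readers following along, the expected behaviour can be sketched as a standard next-hop decision: ARP for the destination only if it is on the client's own subnet, otherwise ARP for the gateway. This is a minimal sketch using the addresses from the configuration above; the function name `arp_target` and the client address 10.1.1.50 are assumptions for illustration (any lease from the configured range would do), and the code is not taken from bootcode.bin.

```python
import ipaddress


def arp_target(client_ip: str, netmask: str, dest_ip: str, gateway_ip=None):
    """Return the IP address the client should ARP for in order to
    reach dest_ip: the destination itself if it is on-link,
    otherwise the gateway."""
    network = ipaddress.IPv4Network(f"{client_ip}/{netmask}", strict=False)
    dest = ipaddress.IPv4Address(dest_ip)
    if dest in network:
        return dest  # same subnet: resolve the destination directly
    if gateway_ip is None:
        raise ValueError("destination is off-link and no gateway is known")
    return ipaddress.IPv4Address(gateway_ip)  # different subnet: use the router


# TFTP server on another subnet: the ARP request should be for the router.
print(arp_target("10.1.1.50", "255.255.255.0", "192.168.56.101", "10.1.1.1"))

# A host on the local subnet: the ARP request should be for the host itself.
print(arp_target("10.1.1.50", "255.255.255.0", "10.1.1.7", "10.1.1.1"))
```

The behaviour reported in this issue corresponds to the second stage skipping this check and ARPing for 192.168.56.101 directly, which only succeeds when the router answers on the server's behalf via proxy ARP.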

Flole998 commented 5 years ago

@ghollingworth was talking about the bootrom; the bootcode itself isn't fixed yet. So no, you're doing nothing wrong (except maybe not having returned the devices yet). This is still a bug and still not fixed (and nobody has said that someone started working on it, so probably nobody is looking into it yet).

antonio-c-mariani commented 5 years ago

Maybe the issue is a bit more tricky. I've booted an RPI3 B+ with an SD card containing only the boot/bootcode.bin file. The image below shows that after loading bootcode.bin, the RPI3 B+ makes an ARP request looking for the subnet router (10.1.1.1) and then correctly starts to load the remaining files from the remote TFTP server (192.168.56.101).

[image: rpi3]

It seems the bootcode.bin is working fine. But then, who is messing things up?

I'd really appreciate hearing some comments from the Raspberry staff. We depend on this to decide the next step.

ghollingworth commented 5 years ago

I've been trying to reproduce, but I'm having trouble setting up a suitable network... Now done for Christmas, so won't be able to get back to it until the new year

Flole998 commented 5 years ago

Great to hear that you are working on it for everyone who's still suffering from that. The network setup shouldn't be too complex, it's even possible with 2 raspberries (plus one to test the bootcode). Anyways, happy holidays and hear from you next year!

antonio-c-mariani commented 5 years ago

I set up a testing environment using a laptop configured as a DHCP server (dnsmasq - 10.1.1.1/24) and a virtual machine (VirtualBox) as a TFTP server (tftp-hpa - 192.168.56.101).

I wish you all the best. Thanks.

ghollingworth commented 5 years ago

So after learning all about iptables, nf_nat_tftp and nf_conntrack_tftp I've finally got it reproduced on my desk!

Ten minutes later I've got a fix! Although it requires re-sending the DHCP request / reply which may make the process a little slower...

Can you check this and I'll push the change

bootcode.zip

Flole998 commented 5 years ago

Just checked it on 2 RPI 3B+ and it works. Unfortunately the start.elf issue is still there: after a watchdog reset (not sure if this is the trigger, as I did a reboot and it happened there as well), bootcode.bin is downloaded, bootsig.bin is NAKed, start.elf is NAKed, and the Pi gets stuck and doesn't request anything else.

ghollingworth commented 5 years ago

OK, I've been able to reproduce the problem and am trying to understand it. Can you create a separate issue for it? Then we can make sure people see this one as related only to the booting issue.

I'll close this once I've submitted a patch and it's been pulled...

antonio-c-mariani commented 5 years ago

I've tested the new bootcode.bin and it seems OK. Thanks.

Flole998 commented 5 years ago

By the way: UART was still enabled, which stopped Kodi from working at all (probably because the GPU debug output is slowing everything down). That should be disabled before this gets pushed.

popcornmix commented 5 years ago

Potential fix now in latest rpi-update firmware