raspberrypi / rpi-eeprom

Installation scripts and binaries for the Raspberry Pi 4 and Raspberry Pi 5 bootloader EEPROMs
https://www.raspberrypi.com/documentation/computers/raspberry-pi.html#raspberry-pi-boot-eeprom
Other
1.26k stars 201 forks source link

Network boot taking nearly 2 minutes if STP is active on managed Ethernet switch #480

Open maxnet opened 1 year ago

maxnet commented 1 year ago

Describe the bug

On managed Ethernet switches that have the Spanning Tree Protocol enabled, every time the network link goes up it tests for network loops for +/- 30 seconds before it lets normal traffic through.

In an ideal world, everyone buying a managed Ethernet switch would know how to properly configure it, and only enable full STP on uplink switch ports that connect to other Ethernet switches, and disable it (or set it to portfast/rapid STP flavour) on Ethernet ports leading to host devices like computers and Pi. We could provide some education on how to do configuration, but some users may still struggle with the console cable and CLI commands necessary (not all switches offer a convenient option to set this in the management web interface). And therefore it is realistic to expect that some will end up using a switch that has STP enabled, and suffer from this problem.

In that case my expectancy would be that network boot is delayed by 30 seconds, but not much more. But reality is if STP is enabled it currently takes the boot firmware like 1 minutes 50 seconds instead:

20230618_143130

It waits 25 seconds for USB MSD to timeout, goes to network boot, does not succeed because switch port is not forwarding traffic yet, goes to other boot methods again having to wait 25 seconds for USB MSD again, does succeed to network boot the second time.

If I look at the events from the switch:

Switch#
*Mar  1 00:20:27.412:     pm_vp 2/14(1): during state not_present, got event 0(add)
*Mar  1 00:20:27.412: @@@ pm_vp 2/14(1): not_present -> present
*Mar  1 00:20:27.412: *** vp_statechange: single: added: 2/14(1)
*Mar  1 00:20:27.412:     pm_vp 2/14(1): during state present, got event 8(linkup)
*Mar  1 00:20:27.412: @@@ pm_vp 2/14(1): present -> authentication
*Mar  1 00:20:27.421:     pm_vp 2/14(1): idle during state authentication
*Mar  1 00:20:27.421: @@@ pm_vp 2/14(1): authentication -> authentication
*Mar  1 00:20:27.421:     pm_vp 2/14(1): during state authentication, got event 18(authen_disable)
*Mar  1 00:20:27.421: @@@ pm_vp 2/14(1): authentication -> notforwarding
*Mar  1 00:20:27.421: STP SW: Fa2/0/12 new blocking req for 1 vlans
*Mar  1 00:20:27.421: *** vp_linkchange: single: up: 2/14(1)
*Mar  1 00:20:29.417: %LINK-3-UPDOWN: Interface FastEthernet2/0/12, changed state to up
*Mar  1 00:20:30.424: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet2/0/12, changed state to up
*Mar  1 00:20:42.428: STP SW: Fa2/0/12 new learning req for 1 vlans
*Mar  1 00:20:51.991:     pm_vp 2/14(1): during state notforwarding, got event 4(remove)
*Mar  1 00:20:51.991: @@@ pm_vp 2/14(1): notforwarding -> present
*Mar  1 00:20:51.991: *** vp_linkchange: single: down: 2/14(1)
*Mar  1 00:20:51.991: @@@ pm_vp 2/14(1): present -> not_present
*Mar  1 00:20:51.991: *** vp_statechange: single: remove: 2/14(1)
*Mar  1 00:20:52.997: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet2/0/12, changed state to down
*Mar  1 00:20:54.004: %LINK-3-UPDOWN: Interface FastEthernet2/0/12, changed state to down
*Mar  1 00:20:54.759:     pm_vp 2/14(1): during state not_present, got event 0(add)
*Mar  1 00:20:54.759: @@@ pm_vp 2/14(1): not_present -> present
*Mar  1 00:20:54.759: *** vp_statechange: single: added: 2/14(1)
*Mar  1 00:20:54.759:     pm_vp 2/14(1): during state present, got event 8(linkup)
*Mar  1 00:20:54.767: @@@ pm_vp 2/14(1): present -> authentication
*Mar  1 00:20:54.767:     pm_vp 2/14(1): idle during state authentication
*Mar  1 00:20:54.767: @@@ pm_vp 2/14(1): authentication -> authentication
*Mar  1 00:20:54.767:     pm_vp 2/14(1): during state authentication, got event 18(authen_disable)
*Mar  1 00:20:54.767: @@@ pm_vp 2/14(1): authentication -> notforwarding
*Mar  1 00:20:54.767: STP SW: Fa2/0/12 new blocking req for 1 vlans
*Mar  1 00:20:54.767: *** vp_linkchange: single: up: 2/14(1)
*Mar  1 00:20:56.764: %LINK-3-UPDOWN: Interface FastEthernet2/0/12, changed state to up
*Mar  1 00:20:57.771: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet2/0/12, changed state to up
*Mar  1 00:21:09.775: STP SW: Fa2/0/12 new learning req for 1 vlans
*Mar  1 00:21:24.782: STP SW: Fa2/0/12 new forwarding req for 1 vlans
*Mar  1 00:21:24.782:     pm_vp 2/14(1): during state notforwarding, got event 14(forward_notnotify)
*Mar  1 00:21:24.782: @@@ pm_vp 2/14(1): notforwarding -> forwarding
*Mar  1 00:21:24.782: *** vp_list_fwdchange: forward: 2/14(1)

Would it be possible to change the boot firmware so that it only set the network link carrier to up at the very beginning, and no longer resets the network link when it goes from USB MSD to network boot mode? So that the 25 seconds spent waiting for USB MSD to timeout, would count towards the forwarding delay.

And it would be nice if it would try DHCP a little longer. The PXE standard demands DHCP discover is retried at 4, 8, 16 and 32 seconds for a reason.

Steps to reproduce the behaviour

Device (s)

Raspberry Pi 4 Mod. B, Raspberry Pi 400, Raspberry Pi CM4, Raspberry Pi CM4 Lite

pelwell commented 1 year ago

One could argue that the longer delay is harder to ignore, and therefore more likely to be detected and solved by turning off STP on that port.

maxnet commented 1 year ago

If detection is the goal: then I suggest the boot firmware prints a big fat warning message if it detects STP is enabled on the network port to alert the user. While the switch does not forward traffic, the Pi do can see the packets the switch sends to test for network loops, so can test for that.

In the Piserver code I am already displaying a warning on the server if it detects STP is active in the network. https://github.com/raspberrypi/piserver/blob/master/src/stpanalyzer.cpp Could do something similar in cmprovision (currently no tests there). However that detection is better done on the client. As server-side I can only test if STP is active on the server's port, while on most managed switches it is configurable on a per-port basis, and it is the client's port that counts. And the testing also does not work if the server OS is running under virtualization.

timg236 commented 1 year ago

The DHCP timeout is quite long already and can be overridden in the EEPROM config. Not resetting the PHY is likely to cause as many problems as it solves so I think the answer is that the network boot behaviour is not likely to change in the near future.

maxnet commented 1 year ago

The DHCP timeout is quite long already and can be overridden in the EEPROM config.

How long does it actually keep sending new DHCP discover packets, as opposed to just waiting for replies?

As you can see in my switch log. The network link carrier goes up at 00:20:57.771, and it allows packets to be forwarded at 00:21:24.782 So things do are functional in 27 seconds.

The fine documentation suggests the default timeout is 45 seconds: https://www.raspberrypi.com/documentation/computers/raspberry-pi.html#DHCP_TIMEOUT 45 seconds is more than 27. So how come it does not work the first round? And only boots after it goes to the second round (in which it does not seem to reset the PHY anymore).

Not resetting the PHY is likely to cause as many problems as it solves

And the resetting has to wait until it gets to network boot order? It cannot be done first thing when the firmware boots?

maxnet commented 1 year ago

How long does it actually keep sending new DHCP discover packets, as opposed to just waiting for replies?

To answer my own question. If for testing purposes I kill my DHCP server, and sniff how many packets your boot firmware sends, it seems to be only the grand total of 3 packets in a round.

$ sudo tcpdump -i eth0 port bootpc -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
17:57:06.143461 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from b8:27:eb:57:76:be, length 322
17:57:14.136637 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from b8:27:eb:57:76:be, length 322
17:57:30.136505 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from b8:27:eb:57:76:be, length 322

Last DHCP discover packet is 24 seconds after the first.

Your default timeout of 45 seconds may be long, but if it is not sending out any DHCP discovers anymore after 24 seconds, and everything in the first +/- 30 seconds is dropped if the user has a STP enabled switch, that is not going to work...

==

The fine documentation mentions that it is supposed to be retrying at 4 second intervals.

DHCP_REQ_TIMEOUT The timeout in milliseconds before retrying DHCP DISCOVER or DHCP REQ.

Minimum: 500 Default: 4000

While it seems more like an exponential back-off delay starting at 8 seconds in practice.