vmware / photon

Minimal Linux container host
https://vmware.github.io/photon

tdnf update with PH 5.0 aarch64 results in degraded networking #1485

Open marco-personal opened 1 year ago

marco-personal commented 1 year ago

Describe the bug

Raspberry Pi 4B, 8GB. Wrote photon-rpi-5.0-dde71ec57.aarch64.raw to a 32GB SD and booted without issue, resulting in: VMware Photon OS 5.0 PHOTON_BUILD_NUMBER=dde71ec57

Aside from changing the password, nothing else was changed. tdnf update finds updates, and after rebooting, the system can no longer get an IP address from DHCP. I also tried a static IP, but the system cannot reach anything that way either.

Working backwards, running tdnf update --exclude linux,systemd leaves the system working as expected (after a reboot). However, if a later update lets either package through (--exclude linux or --exclude systemd), networking stops working.

Just before the reboot that led to the last failure, I installed tcpdump. Running tcpdump -netli eth0 showed that BOOTP/DHCP packets were leaving and that the system could see incoming CDP, but the DHCP exchange never progressed further. I first suspected a switchport configuration issue (trunk, with the native VLAN set and allowed) and tried changing it to an access port on the VLAN, but that did not work either. Furthermore, x86_64 systems also running Photon 5.0 on the same switch updated without issue.

Reproduction steps

  1. install via image
  2. tdnf update --exclude linux,systemd
  3. reboot -> ok
  4. tdnf update --exclude linux (allows systemd and systemd-{udev,rpm-macros,pam,libs} to install)
  5. reboot -> does not work
  6. reimage
  7. tdnf update --exclude systemd
  8. reboot -> does not work
  9. reimage
  10. tdnf update --exclude linux
  11. reboot -> does not work
  12. reimage
  13. tdnf update --exclude linux,systemd
  14. reboot -> ok
  15. tdnf update
  16. reboot -> does not work
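To narrow down which packages a given step in the list above actually changed, the installed package set can be snapshotted before and after each update and compared. This is only a sketch: `changed_packages` and the snapshot file names are hypothetical, and it assumes `rpm -qa` is available on the image.

```sh
# changed_packages BEFORE AFTER
# Print the entries that appear in AFTER but not in BEFORE.
# Both files are expected to hold one package per line (e.g. from `rpm -qa`).
changed_packages() {
    before_sorted=$(mktemp) || return 1
    after_sorted=$(mktemp) || { rm -f "$before_sorted"; return 1; }
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort "$1" > "$before_sorted"
    LC_ALL=C sort "$2" > "$after_sorted"
    # -1 drops lines found only in BEFORE, -3 drops lines common to both.
    LC_ALL=C comm -13 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}
```

For example: `rpm -qa > before.txt`, run the `tdnf update` step, `rpm -qa > after.txt`, then `changed_packages before.txt after.txt`.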

Expected behavior

tdnf update should update packages and the system's networking should continue to work

Additional context

No response

oliverkurth commented 1 year ago

Can you please share the versions of the linux and systemd packages?

@ssahani might be able to help.

Btw, instead of using the exclude option you can lock packages you don't want to upgrade, see https://github.com/vmware/tdnf/wiki/Configuration-Options#package-locks

marco-personal commented 1 year ago

derp, sorry!

fresh image:

after tdnf update:

ssahani commented 1 year ago

Just before rebooting before the last failure, I installed tcpdump. -netli eth0 showed that bootp/dhcp packets were leaving, and the system was able to see incoming CDP, but DHCP packets never progress further.

You mean the DHCP DISCOVER goes out and no OFFER comes back from the server?

sudo SYSTEMD_LOG_LEVEL=debug /lib/systemd/systemd-networkd

Can you collect the logs from the terminal, or alternatively enable debug logging through a drop-in:

sudo systemctl edit systemd-networkd.service
# /etc/systemd/system/systemd-networkd.service.d/override.conf
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
systemctl daemon-reload
systemctl restart systemd-networkd

Then collect the logs from the journal and attach them.
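Once debug logging is on, a quick summary of the DHCP client messages can show at a glance whether anything beyond DISCOVER was logged. A minimal sketch, assuming systemd-networkd debug lines of the form `eth0: DHCPv4 client: DISCOVER`; the helper name `summarize_dhcp` is hypothetical.

```sh
# summarize_dhcp: read journal text on stdin and count the DHCPv4 client
# messages by the first word after "DHCPv4 client: ".
summarize_dhcp() {
    awk '
    /DHCPv4 client: / {
        # Strip everything up to and including the marker; $1 is then the message name.
        sub(/.*DHCPv4 client: /, "")
        count[$1]++
    }
    END {
        for (name in count) print name, count[name]
    }' | sort
}
```

Usage: `journalctl -u systemd-networkd -b --no-pager | summarize_dhcp`. If the only line printed is a DISCOVER count, the client never logged a reply from the server.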

marco-personal commented 1 year ago

Correct, no offers. When running tcpdump on the DHCP server, the DHCP lease is successfully set on a fresh image, but after tdnf update is applied, not even an offer is seen.

reimaged.log patched.log

ssahani commented 1 year ago

If the offer does not arrive, then we need to see why the DHCP server is dropping the request.

Jun 13 12:35:51 photon-machine systemd-networkd[387]: eth0: DHCPv4 client: DISCOVER
Jun 13 12:35:56 photon-machine systemd-networkd[387]: eth0: DHCPv4 client: DISCOVER
Jun 13 12:36:01 photon-machine systemd-networkd[387]: eth0: DHCPv4 client: DISCOVER
Jun 13 12:36:08 photon-machine systemd-networkd[387]: eth0: DHCPv4 client: DISCOVER
Jun 13 12:36:23 photon-machine systemd-networkd[387]: eth0: DHCPv4 client: DISCOVER

ssahani commented 1 year ago

Jun 13 12:28:14 photon-machine systemd-networkd[526]: Could not set hostname: Access denied
Jun 13 12:28:14 photon-machine systemd-networkd[526]: eth0: Received new foreign route (configur

marco-personal commented 1 year ago

Nothing ever makes it to the DHCP server for it to respond after patching.

Interestingly, on the switchport I see input errors (this is ~20 minutes after clearing the counters):


     Received 0 broadcasts (0 multicasts)
     0 runts, 0 giants, 0 throttles
     7 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     0 input packets with dribble condition detected
     989 packets output, 76928 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out

ssahani commented 1 year ago

What is an input error?

marco-personal commented 1 year ago

Frames the switch can't deal with: too many, malformed, etc. If I move the connection to another switch port, the errors follow. There are no errors on the switchport with a fresh image applied to the SD card.

Are changes/build logs available somewhere for the linux and systemd packages?

ssahani commented 1 year ago

What is the difference between the DHCP DISCOVER packets in the two cases? If we look at them in Wireshark, we can figure it out.

marco-personal commented 1 year ago

This update pulled in the since-updated linux-6.1.28-2 package. It did NOT fix the issue.


Fresh Image:

0000   ff ff ff ff ff ff dc a6 32 0b 65 52 08 00 45 c0  ........2.eR..E.
0010   01 4a 00 00 00 00 40 11 78 e4 00 00 00 00 ff ff  .J....@.x.......
0020   ff ff 00 44 00 43 01 36 18 e4 01 01 06 00 b3 5f  ...D.C.6......._
0030   c4 85 00 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
0040   00 00 00 00 00 00 dc a6 32 0b 65 52 00 00 00 00  ........2.eR....
0050   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0060   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0070   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0080   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0090   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00a0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00b0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00c0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00d0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00e0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00f0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0100   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0110   00 00 00 00 00 00 63 82 53 63 35 01 01 3d 13 ff  ......c.Sc5..=..
0120   f8 ce 1b a1 00 02 00 00 ab 11 89 90 5c d1 65 19  ............\.e.
0130   20 13 37 09 01 03 06 0c 0f 21 2a 78 79 39 02 05   .7......!*xy9..
0140   c0 32 04 0a 10 0a 04 0c 0e 70 68 6f 74 6f 6e 2d  .2.......photon-
0150   6d 61 63 68 69 6e 65 ff                          machine.

Patched:

0000   ff ff ff ff ff ff dc a6 32 0b 65 52 08 00 45 c0  ........2.eR..E.
0010   01 44 00 00 00 00 40 11 78 ea 00 00 00 00 ff ff  .D....@.x.......
0020   ff ff 00 44 00 43 01 30 15 28 01 01 06 00 12 f4  ...D.C.0.(......
0030   80 f7 00 09 00 00 00 00 00 00 00 00 00 00 00 00  ................
0040   00 00 00 00 00 00 dc a6 32 0b 65 52 00 00 00 00  ........2.eR....
0050   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0060   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0070   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0080   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0090   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00a0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00b0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00c0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00d0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00e0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00f0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0100   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0110   00 00 00 00 00 00 63 82 53 63 35 01 01 3d 13 ff  ......c.Sc5..=..
0120   f8 ce 1b a1 00 02 00 00 ab 11 89 90 5c d1 65 19  ............\.e.
0130   20 13 37 09 01 03 06 0c 0f 21 2a 78 79 39 02 05   .7......!*xy9..
0140   c0 0c 0e 70 68 6f 74 6f 6e 2d 6d 61 63 68 69 6e  ...photon-machin
0150   65 ff                                            e.

marco-personal commented 1 year ago

still an issue with linux-6.1.28-2.ph5.aarch64
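The two DISCOVER frames posted above can be compared option by option without Wireshark. The sketch below assumes the dump layout used in this thread (4-digit offset, hex bytes in columns 8 to 54, ASCII after that), an untagged Ethernet II frame, and an IPv4 header followed directly by UDP; `dhcp_options` is a hypothetical helper name. It prints one `code length` pair per DHCP option.

```sh
# dhcp_options: read only the hex rows of one frame on stdin and list the
# DHCP options as "code length" pairs, in the order they appear.
dhcp_options() {
    awk '
    function hex(s,    i, v) {
        v = 0
        for (i = 1; i <= length(s); i++)
            v = v * 16 + index("0123456789abcdef", tolower(substr(s, i, 1))) - 1
        return v
    }
    {
        # Take the hex column only, so the ASCII column is never parsed.
        k = split(substr($0, 8, 47), f, " ")
        for (i = 1; i <= k; i++) b[nbytes++] = hex(f[i])
    }
    END {
        ihl = (b[14] % 16) * 4        # IPv4 header length in bytes
        p = 14 + ihl + 8 + 240        # Ethernet + IPv4 + UDP + BOOTP header and magic cookie
        while (p < nbytes && b[p] != 255) {
            if (b[p] == 0) { p++; continue }   # pad option has no length byte
            print b[p], b[p + 1]
            p += 2 + b[p + 1]
        }
    }'
}
```

Saving each frame's hex rows to a file and running `dhcp_options < fresh.txt` and `dhcp_options < patched.txt` lists options 53, 61, 55, 57, 50 and 12 for the fresh image, and the same list without option 50 (Requested IP Address, here 10.16.10.4) for the patched system. Apart from that option, the transaction ID and the seconds field, the two frames carry the same content.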

oliverkurth commented 1 year ago

I have the same issue on a RPi4. I had an image with linux 6.1.10-11 which was working. I upgraded the kernel to 6.1.32-1 and network failed, no DHCP lease. systemd version is 253-5.

I also have the weird issue that I need to delete the file /boot/efi/uboot.env before every reboot, or it gets into a boot loop (not kernel related, it does not even get that far).

The same kernel works on an RPi3.

marco-personal commented 1 year ago

Still not working with linux-6.1.37-1

marco-personal commented 1 year ago

Still not working with linux-6.1.41-1 and systemd-253-5

marco-personal commented 1 year ago

Still not working with linux-6.1.45-2 and systemd-253-7

marco-personal commented 1 year ago

Still not working with linux-6.1.53-4 and systemd-253-9.

I've switched to a 1GB unit where it also does not work.

maru0123-2004 commented 3 months ago

I started using Photon 5.0 last month and ran into this issue. I searched the issues and found #1204. As mentioned in that issue, I tried this:

echo 'add_drivers+="bcm_phy_lib broadcom"' >> /etc/dracut.conf.d/broadcom.conf
mkinitrd -q /boot/initrd.img-$(uname -r) $(uname -r)
reboot

The system rebooted successfully, and the network failure was fixed.

Thanks.
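To confirm that a rebuilt initrd really contains the two PHY drivers named above, its file listing can be checked before rebooting. This is a sketch under assumptions: `initrd_has_phy_modules` is a hypothetical helper, the listing is expected one path per line (for example from dracut's `lsinitrd`, if it is installed), and the module files are assumed to be named `bcm-phy-lib.ko` and `broadcom.ko`, optionally with a compression suffix.

```sh
# initrd_has_phy_modules: read an initrd file listing on stdin and succeed
# only if both Broadcom PHY module files are present.
initrd_has_phy_modules() {
    listing=$(cat)
    # Module file names may use "-" where the module name uses "_".
    for pattern in 'bcm[-_]phy[-_]lib\.ko' 'broadcom\.ko'; do
        printf '%s\n' "$listing" | grep -q "/$pattern" || return 1
    done
    return 0
}
```

Usage: `lsinitrd /boot/initrd.img-$(uname -r) | initrd_has_phy_modules && echo "PHY modules present"`.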

marco-personal commented 3 months ago

@maru0123-2004 Wow, that's great. Thanks for searching. Not sure how I missed those! I've since divested myself of all my RPIs so I can no longer test though. Regardless, thanks again!