kernelkit / infix

Linux :yellow_heart: NETCONF = Infix
https://kernelkit.org
GNU General Public License v2.0

Port LED traffic indication fails when port is up when booting #670

Open jovatn opened 2 weeks ago

jovatn commented 2 weeks ago

Current Behavior

This bug was introduced somewhere between Infix 24.06.0 and 24.09.0.

Port LED 'GREEN BLINK' is used for traffic indication, while steady GREEN just indicates link up. After boot, traffic indication currently fails if the port was already up from the start: the port just shows steady GREEN.

If such a port is unplugged and plugged in again, traffic indication works.

Expected Behavior

Traffic indication (GREEN BLINK) should work for all ports that are up and exchanging traffic, including ports that are already up at boot time.

Steps To Reproduce

  1. Install Infix 24.09.0 on hardware
  2. Do a factory reset (with default configuration all ports are standalone interfaces with IPv6 auto-config addresses)
  3. Connect port 1 and 2, port 3 and 4, etc.
  4. Reboot
  5. When the unit comes up, all ports should get link up (steady GREEN LED)
  6. Initiate traffic, e.g., by running ping ff02::1%e1. This should lead to GREEN BLINK on ports 1 and 2, but it does not.
  7. Unplug e1 and plug it in again. Restart ping if necessary. Ports 1 and 2 should now indicate traffic (GREEN BLINK)

Additional information

To compare, do the same with Infix 24.06.0. There, traffic indication works directly after boot.
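
When comparing the two releases, it may help to record which LED trigger is active on each LED right after boot. The following is a minimal sketch, assuming the port LEDs are exposed through the standard Linux LED class under /sys/class/leds (the LED names used on this hardware are not given in this report). The helper names active_trigger and dump_led_triggers are made up for illustration.

```shell
#!/bin/sh
# Sketch only: list the active trigger of every LED class device.
# Assumes the standard sysfs layout, where <led>/trigger lists all
# available triggers and marks the active one in [brackets].

# Print the bracketed (active) trigger name from a "trigger" file.
active_trigger() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Print "<led>: <active trigger>" for each LED below the given directory
# (defaults to /sys/class/leds).
dump_led_triggers() {
    for t in "${1:-/sys/class/leds}"/*/trigger; do
        [ -r "$t" ] || continue
        led=${t%/trigger}
        printf '%s: %s\n' "${led##*/}" "$(active_trigger "$t")"
    done
}
```

Running dump_led_triggers once after boot and once after re-plugging a cable gives two listings that can be diffed.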

wkz commented 2 days ago

Here's a script that shows all the LED config registers, and exits with code 1 if any LED is in the wrong state:

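#!/bin/sh
# Dump the LED config register (0x16) of every port on every switch.

x=0     # exit status: 0 = all LEDs OK, 1 = at least one in the wrong state

for sw in 2 4 6; do
        for p in $(seq 8); do
                cfg=$(mdio f212a2* mvls $sw $p:0x16)
                printf ':%02x.%u: ' "$sw" "$p"
                echo "$cfg"

                # 0x0023 is the value expected for a correctly set up LED
                [ "$cfg" = "0x0023" ] || x=1
        done
done

exit $x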

Here is a ply script that can be used to trace the order of updates to the LED configuration:

k:led_blink_set_nosleep { ev[arg0, "led_sns", pid, comm] = time + 1; }
k:led_set_brightness { ev[arg0, "led_set", pid, comm] = time + 1; }
k:led_set_brightness_nopm { ev[arg0, "led_spm", pid, comm] = time + 1; }
k:mv88e6393x_led_brightness_set { ev[arg0 + 16, "mv6_set", pid, comm] = time + 1; }
k:mv88e6393x_led_hw_control_set { ev[arg0 + 16, "mv6_hcs", pid, comm] = time + 1; }
k:mv88e6393x_led_hw_control_get { ev[arg0 + 16, "mv6_hcg", pid, comm] = time + 1; }

Refined log data showing the difference between a port that ends up in the correct state, and one in the wrong state:

e1 (:02.2) OK
{ ffff000100e6c090, mv6_hcg,  2272, iitod           }: 8132086721
{ ffff000100e6c090, mv6_set,   217, kworker/3:1     }: 8133239361
{ ffff000100e6c090, led_set,  2272, iitod           }: 8134286001
{ ffff000100e6c090, mv6_hcs,  2272, iitod           }: 8134326561
{ ffff000100e6c090, mv6_hcs,  3938, 50-init.ip      }: 14146246201
{ ffff000100e6c090, mv6_hcs,   208, kworker/u9:4    }: 17609473441
{ ffff000100e6c090, mv6_hcg,  4979, iitod           }: 18884180641
{ ffff000100e6c090, mv6_set,   899, kworker/2:2     }: 18885689201
{ ffff000100e6c090, led_set,  4979, iitod           }: 18889565721
{ ffff000100e6c090, mv6_hcs,  4979, iitod           }: 18889627161

e2 (:02.1) ERROR
{ ffff00010417e490, mv6_hcg,  2272, iitod           }: 8096103001
{ ffff00010417e490, led_set,  2272, iitod           }: 8098372201
{ ffff00010417e490, mv6_hcs,  2272, iitod           }: 8098414721
{ ffff00010417e490, mv6_hcs,  4224, 50-init.ip      }: 14879725121
{ ffff00010417e490, mv6_hcs,   208, kworker/u9:4    }: 17632630721
{ ffff00010417e490, mv6_hcg,  4979, iitod           }: 18820225481
{ ffff00010417e490, mv6_set,   899, kworker/2:2     }: 18821875161
{ ffff00010417e490, led_set,  4979, iitod           }: 18825496681
{ ffff00010417e490, mv6_hcs,  4979, iitod           }: 18825914401
{ ffff00010417e490, mv6_set,   217, kworker/3:1     }: 18888517121
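
The comparison above can also be done mechanically. The following is a rough sketch, assuming the map dump has exactly the line format shown in the excerpt and that a good LED is one whose latest mv6_hcs is newer than its latest mv6_set. That rule is a heuristic read off these two traces, not something the driver guarantees, and the function name check_order is made up for illustration.

```shell
#!/bin/sh
# Sketch only: read a ply map dump on stdin and, per LED (first key),
# compare the newest "mv6_set" timestamp with the newest "mv6_hcs" one.
# Lines look like: { <led>, <event>, <pid>, <comm> }: <timestamp>

check_order() {
    awk '
    $1 == "{" && NF >= 7 {
        led = $2; sub(/,$/, "", led)
        evt = $3; sub(/,$/, "", evt)
        ts = $NF + 0
        # Remember LEDs in order of first appearance for stable output.
        if (!(led in seen)) { seen[led] = 1; order[++n] = led }
        if (evt == "mv6_set" && ts > set[led]) set[led] = ts
        if (evt == "mv6_hcs" && ts > hcs[led]) hcs[led] = ts
    }
    END {
        for (i = 1; i <= n; i++) {
            led = order[i]
            # A brightness write newer than the last hw-control update
            # is flagged as suspicious.
            verdict = (set[led] > hcs[led]) ? "LATE-SET" : "OK"
            print led, verdict
            if (verdict != "OK") bad = 1
        }
        exit bad
    }'
}
```

Fed the excerpt above (check_order < trace.txt), this prints OK for ffff000100e6c090 and LATE-SET for ffff00010417e490, and returns a non-zero status.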

The traces seem to indicate that changing the LED trigger, and thereby going from an "unoffloadable" to an "offloadable" config, may execute out of order: commands coming in via led_set_brightness() are executed asynchronously via the system workqueue, while hardware offload updates are executed synchronously from the calling process. For e2, the last recorded event is an mv6_set from kworker/3:1, which lands after the final mv6_hcs from iitod.
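
The suspected ordering problem can be illustrated with a toy model. This is not driver code: a temporary file stands in for the LED config register, a background subshell stands in for the workqueue, and the function names are invented for the example.

```shell
#!/bin/sh
# Toy model only: shows how a deferred write can overwrite a later
# synchronous one. Nothing here talks to real hardware.

reg=$(mktemp)   # stand-in for the LED config register

# Deferred path: like a brightness update queued on a workqueue, the
# write happens later, in another context.
set_brightness_deferred() {
    ( sleep 1; echo "software-controlled" > "$reg" ) &
}

# Synchronous path: like a hardware offload update, the write is done
# before the caller continues.
hw_control_set() {
    echo "hw-offload" > "$reg"
}

set_brightness_deferred   # issued first ...
hw_control_set            # ... issued second, but applied first
wait                      # let the deferred write complete

final=$(cat "$reg")
rm -f "$reg"
echo "final register state: $final"
```

Although the offload request is issued last, the deferred write wins and the register ends up software-controlled, which would match a port stuck on steady GREEN.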

troglobit commented 1 day ago

Core team decision today: work around the issue on Styx boards by disabling iitod, or at least port LED control, for now.