siemens / meta-iot2050

SIMATIC IOT2050 Isar/Debian Board Support Package
MIT License

Zombie process with latest master on IOT2050 FS03 #428

Closed rFond closed 1 year ago

rFond commented 1 year ago

Hi,

I built an image from the latest git, which gave me an image V01.03.01. When I install and boot this image on an IOT2050 with reference 6ES7 647-0BA00-1YA2 FS05, everything works. But when I use the same image on an IOT2050 with reference 6ES7 647-0BA00-1YA2 FS03 (firmware updated with: iot2050-firmware-update IOT2050-PG1-FW-Update-PKG-V01.02.01.tar.xz), the image runs, but one CPU core stays at 100% for as long as an RJ45 port has no cable connected (if neither of the two RJ45 ports is connected, two cores are at 100%). There is one "kworker" process per unconnected RJ45 port.

Strangely, when I connect an RJ45 cable, the kworker disappears until the next reboot.

Thanks.

jan-kiszka commented 1 year ago

Last git (master) != 1.3.0 (you likely meant V01.03.01) - what exactly were you building?

FWIW, I've just checked again with a PG2 FS5 device and an image built from today's master: Even if no Ethernet cable is plugged into either port, all CPUs eventually become idle, once Node-RED has completed its startup.

rFond commented 1 year ago

I built the latest version of V01.03.01. The problem occurs with V01.03.01 on a PG FS3 device, for example. I have not yet tested the FS01 / FS02 / FS04 versions, but I think the behavior is the same.

jan-kiszka commented 1 year ago

Could you check if the issue was fixed for you with the version of current master?

rFond commented 1 year ago

Was the last change on master 3 weeks ago? I took master 2 weeks ago.

I have no problem with the PG2 FS5 device I tested.

jan-kiszka commented 1 year ago

You just wrote you took V01.03.01, and that is not master and was not master even 2 weeks ago.

rFond commented 1 year ago

I did this:

git clone https://github.com/siemens/meta-iot2050.git
cd meta-iot2050
./kas-docker --isar build kas-iot2050-example.yml

So I get the master version, no?

I think I have the latest version.

Sorry for the confusion.

rFond commented 1 year ago

For information, I have tested with the newest firmware, but the result is the same:

siemens@iot2050-debian:~$ sudo fw_printenv fw_version
fw_version=2022.01-V01.03.01.01-0-gffc3caf
siemens@iot2050-debian:~$ sudo top
top - 14:58:37 up 5 min,  2 users,  load average: 2.47, 2.02, 0.98
Tasks: 218 total,   3 running, 214 sleeping,   0 stopped,   1 zombie
%Cpu(s):  3.6 us, 38.2 sy,  0.0 ni, 52.9 id,  0.1 wa,  4.4 hi,  0.8 si,  0.0 st
MiB Mem :   1933.9 total,    428.0 free,    699.9 used,    805.9 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1152.7 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     33 root      20   0       0      0      0 R 100.0   0.0   4:49.11 kworker/1:1+events_long
   2990 root      20   0  109732    604    508 S  23.8   0.0   1:06.70 lora_pkt_fwd
    100 root     -51   0       0      0      0 S  10.9   0.0   0:31.00 irq/17-40b00000
   5216 siemens   20   0  251152  28364  22788 R   6.9   1.4   0:00.21 node
   3532 70        20   0  171604  14576  11476 S   1.3   0.7   0:02.28 postgres
   4212 root      20   0   10416   3380   2624 R   1.3   0.2   0:01.68 top
    377 root      20   0 1566004  57372  27368 S   1.0   2.9   0:02.09 containerd
    523 root      20   0 3163640  89956  43240 S   1.0   4.5   0:07.93 dockerd
   3184 nobody    20   0 1087596  24316  17588 S   1.0   1.2   0:02.04 chirpstack
    192 root      20   0       0      0      0 I   0.7   0.0   0:01.88 kworker/0:2-events
    401 siemens   20   0   16504   8704   6948 S   0.7   0.4   0:02.19 systemd
   1726 systemd+  20   0   37416  12524   4480 S   0.7   0.6   0:01.87 redis-server
   1775 nobody    20   0  715328  19580  10276 S   0.7   1.0   0:01.01 chirpstack-gate
   2184 root      20   0  719928  10088   7432 S   0.7   0.5   0:00.30 containerd-shim
   3042 siemens   20   0  387252 126936  33604 S   0.7   6.4   0:27.95 node-red
   3819 siemens   20   0    6860   2912   2644 S   0.7   0.1   0:01.07 bash

My Linux version: Linux iot2050-debian 5.10.162-cip24 #1 SMP PREEMPT Thu, 01 Jan 1970 01:00:00 +0000 aarch64

Debug (not working):

siemens@iot2050-debian:~$ dmesg | grep pru
[ 0.000000] Kernel command line: root=PARTUUID=8884f2fe-d9ee-4bd1-93ad-99b6abb9fcf2 console=ttyS3,115200n8 earlycon=ns16550a,mmio32,0x02810000 mtdparts=47040000.spi.0:512k(ospi.tiboot3),2m(ospi.tispl),4m(ospi.u-boot),128k(ospi.env),128k(ospi.env.backup),1m(ospi.sysfw),64k(pru0-fw),64k(pru1-fw),64k(rtu0-fw),64k(rtu1-fw),-@8m(ospi.rootfs) rootwait
[ 2.925164] 0x0000007c0000-0x0000007d0000 : "pru0-fw"
[ 2.931214] 0x0000007d0000-0x0000007e0000 : "pru1-fw"
[ 7.607837] remoteproc remoteproc2: b034000.pru is available
[ 7.667188] remoteproc remoteproc4: b038000.pru is available
[ 7.704292] remoteproc remoteproc6: b134000.pru is available
[ 7.749982] remoteproc remoteproc8: b138000.pru is available
[ 7.790562] remoteproc remoteproc10: b234000.pru is available
[ 7.816184] remoteproc remoteproc12: b238000.pru is available
[ 8.601577] icssg-prueth icssg0-eth: TI PRU ethernet driver initialized: dual EMAC mode
[ 9.240180] icssg-prueth icssg0-eth eno1: renamed from eth1
[ 9.273962] icssg-prueth icssg0-eth eno2: renamed from eth0
[ 9.369520] remoteproc remoteproc4: powering up b038000.pru
[ 9.380783] remoteproc remoteproc4: Booting fw image ti-pruss/am65x-pru1-prueth-fw.elf, size 17008
[ 9.390046] remoteproc remoteproc4: remote processor b038000.pru is now up
[ 9.403506] remoteproc remoteproc5: Booting fw image ti-pruss/am65x-rtu1-prueth-fw.elf, size 15588
[ 9.465168] remoteproc remoteproc2: powering up b034000.pru
[ 9.476225] remoteproc remoteproc2: Booting fw image ti-pruss/am65x-pru0-prueth-fw.elf, size 16992
[ 9.485873] remoteproc remoteproc2: remote processor b034000.pru is now up
[ 9.505019] remoteproc remoteproc3: Booting fw image ti-pruss/am65x-rtu0-prueth-fw.elf, size 15588
[ 10.430368] icssg-prueth icssg0-eth eno1: Link is Up - 100Mbps/Full - flow control off

Debug (working):

siemens@iot2050-debian:~$ dmesg | grep pru
[ 6.764369] remoteproc remoteproc2: b034000.pru is available
[ 6.815068] remoteproc remoteproc4: b038000.pru is available
[ 6.839242] remoteproc remoteproc6: b134000.pru is available
[ 6.993316] remoteproc remoteproc8: b138000.pru is available
[ 7.009502] remoteproc remoteproc10: b234000.pru is available
[ 7.109100] remoteproc remoteproc12: b238000.pru is available
[ 7.506159] icssg-prueth icssg0-eth: TI PRU ethernet driver initialized: dual EMAC mode
[ 7.703518] icssg-prueth icssg0-eth eno2: renamed from eth0
[ 7.737291] icssg-prueth icssg0-eth eno1: renamed from eth1
[ 7.828016] remoteproc remoteproc2: powering up b034000.pru
[ 7.838803] remoteproc remoteproc2: Booting fw image ti-pruss/am65x-pru0-prueth-fw.elf, size 16992
[ 7.848003] remoteproc remoteproc2: remote processor b034000.pru is now up
[ 7.861207] remoteproc remoteproc3: Booting fw image ti-pruss/am65x-rtu0-prueth-fw.elf, size 15588
[ 7.926520] remoteproc remoteproc4: powering up b038000.pru
[ 7.935292] remoteproc remoteproc4: Booting fw image ti-pruss/am65x-pru1-prueth-fw.elf, size 17008
[ 7.947088] remoteproc remoteproc4: remote processor b038000.pru is now up
[ 7.960696] remoteproc remoteproc5: Booting fw image ti-pruss/am65x-rtu1-prueth-fw.elf, size 15588
[ 12.096944] icssg-prueth icssg0-eth eno1: Link is Up - 1Gbps/Full - flow control off

jan-kiszka commented 1 year ago

We were able to reproduce. It's a PG1-only issue. Currently bisecting.

jan-kiszka commented 1 year ago

Follow-up question: Are you aware of a version which did not have this issue? Maybe we simply missed that so far.

jan-kiszka commented 1 year ago

Nope, regression. Something in 13985e35450004e1b6396176ca92f1c52174c246 is apparently causing this. Current stable/V01.03 (a91e78e5cc41ea49df4d2b2a9b71171115de3af4) is fine, older versions then very likely as well.

jan-kiszka commented 1 year ago

The issue is caused by https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/commit/drivers/net/ethernet/ti/icssg_ethtool.c?h=ti-linux-5.10.y&id=24f3ef66a2c69b945db46a97ecea8a7111251b38 which we carry in our queue as well: On SR1.0 (PG1), emac->speed is 0, and queue_delayed_work() is always started with a 0 ms delay (don't ask me why we don't see a division-by-zero bug instead). Discussing with TI how best to proceed.
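
To illustrate the failure mode described above, here is a minimal user-space sketch in C. It is not the driver code: the struct emac_model type, the poll_delay_ms() and runs_in_window() helpers, and the delay formula are all hypothetical, since the thread does not show the real calculation. It only models a work item that re-queues itself after a delay derived from the link speed. One plausible reason no division-by-zero fault shows up is that AArch64 integer division by zero returns 0 rather than trapping; the sketch models that result explicitly, because dividing by zero is undefined behaviour in C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-port state. Only the
 * field mentioned in the thread (speed) is modelled. */
struct emac_model {
    unsigned int speed; /* link speed in Mbps; 0 on SR1.0 (PG1) per the thread */
};

/* ASSUMPTION: the real formula is not shown in the thread. This one is
 * invented so that the delay shrinks as the speed grows. */
unsigned int poll_delay_ms(const struct emac_model *emac)
{
    const unsigned int budget = 100000; /* hypothetical constant */

    assert(emac != NULL);
    if (emac->speed == 0)
        return 0; /* models the AArch64 result of x / 0, which is 0 */
    return budget / emac->speed;
}

/* Counts how often a self-rearming work item would run within
 * window_ms milliseconds, capped at max_runs so the model terminates
 * even when the delay is 0. */
unsigned int runs_in_window(const struct emac_model *emac,
                            unsigned int window_ms,
                            unsigned int max_runs)
{
    unsigned int delay = poll_delay_ms(emac);
    unsigned int elapsed = 0;
    unsigned int runs = 0;

    while (elapsed <= window_ms && runs < max_runs) {
        runs++;           /* the work function executes */
        elapsed += delay; /* and re-queues itself after delay ms */
    }
    return runs;
}
```

With speed set to 1000 and a 1000 ms window, the model runs the work item 11 times. With speed set to 0, the delay is 0, so the item re-queues itself immediately and the loop only stops at the max_runs cap. That matches the symptom of a kworker thread pinned at 100% CPU.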