linux-test-project / ltp

Linux Test Project (mailing list: https://lists.linux.it/listinfo/ltp)
https://linux-test-project.readthedocs.io/
GNU General Public License v2.0

irqbalance01 reports broken - RHEL9 #1118

Closed mpw5421 closed 8 months ago

mpw5421 commented 8 months ago

I opened a Bugzilla report with Red Hat for the irqbalance01 testcase on RHEL9.3, but since the testcase is reported as broken I was instructed to open an issue against LTP instead. Here is the testcase output from a RHEL9.3 system:

./irqbalance01
tst_test.c:1709: TINFO: LTP version: 20230929-260-g3ffbf543a
tst_test.c:1593: TINFO: Timeout per run is 0h 00m 30s
irqbalance01.c:129: TINFO: Found 16 CPUS, 12 IRQs
  IRQ       CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      CPU12      CPU13      CPU14      CPU15
    0:         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+
    0:         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+
    0:         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+
    3:       128+       138+       448+       313+        46+        64+       290+       426+       228+       183+        83+        57+       100+       100+       779+        52+
    4:         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         2+
    5:         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         1+         1+
    6:         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         1+         1+
    7:         0+         0+         0+         0+         0+         0+         0+         1+         0+         1+         0+         0+         0+         0+         1+         1+
    8:         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         1+         1+
    9:        34+        38+       139+       102+         9+        16+        83+       116+        75+        67+        34+        23+        40+        24+       282+        15+
   10:         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         1+         1+
   11:         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         0+         1+         1+
Total:       162        176        587        415         55         80        373        543        303        251        117         80        140        124       1067         75
irqbalance01.c:129: TINFO: Found 16 CPUS, 12 IRQs
realloc(): invalid next size
tst_test.c:1653: TBROK: Test killed by SIGIOT/SIGABRT!

pevik commented 8 months ago

@wangli5665 any chance you could have a look into it?

metan-ucw commented 8 months ago

@mpw5421 Any chance of running this under gdb and getting a backtrace? See: https://github.com/linux-test-project/ltp/#debugging-with-gdb

mpw5421 commented 8 months ago

@metan-ucw I read the link you provided, but wasn't exactly sure how to execute that.

I ran the test manually under strace, since I knew how to do that from troubleshooting other LTP testcases I had opened bugs for with Red Hat. The output is attached.

strace.ltp.irqbalance.txt

jstancek commented 8 months ago

Doesn't reproduce for me on RHEL9.3. Could you try running it under valgrind? (e.g. valgrind ./irqbalance01)

jstancek commented 8 months ago

Also, is this x86_64 or ppc64le?

mpw5421 commented 8 months ago

> Also, is this x86_64 or ppc64le?

It's actually s390x on LPAR. Sorry for not including that in the original comment.

Sure, valgrind output attached:

valgrind.ltp.irqbalance.txt

jstancek commented 8 months ago

@mpw5421 Thanks, it looks like we have a bug in the test, which fails to parse /proc/interrupts. Could you please also attach the output of "cat /proc/interrupts"?

jstancek commented 8 months ago

It appears to be enough to move the named interrupts to the beginning of the file to trigger it on x86 as well. The "row" variable keeps incrementing and eventually there's an out-of-bounds write:

irqbalance01.c:129: TINFO: Found 8 CPUS, 45 IRQs
realloc(): invalid next size
tst_test.c:1653: TBROK: Test killed by SIGIOT/SIGABRT!
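
A rough illustration of the mismatch described above, not the actual irqbalance01.c code: if the arrays are sized from the number of numeric IRQ lines while a row index advances on every newline, named lines such as NMI: or LOC: push the index past the end. The sample text and both helper names below are hypothetical and only loosely shaped like /proc/interrupts.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sample: two named interrupts placed before two numeric IRQs. */
static const char sample[] =
	"           CPU0       CPU1\n"
	"NMI:          5          7\n"
	"LOC:          3          2\n"
	"  3:        128        138\n"
	"  4:          0          2\n";

/* Count data lines the way a per-newline row counter would (header excluded). */
static size_t rows_per_newline(const char *s)
{
	size_t n = 0;
	int seen_header = 0;

	for (; *s; s++) {
		if (*s != '\n')
			continue;
		if (!seen_header)
			seen_header = 1;
		else
			n++;
	}
	return n;
}

/* Count only lines whose label before ':' is purely numeric. */
static size_t rows_with_numeric_id(const char *s)
{
	size_t n = 0;

	while (*s) {
		const char *p = s;

		while (*p == ' ')
			p++;
		const char *digits = p;
		while (isdigit((unsigned char)*p))
			p++;
		if (p > digits && *p == ':')
			n++;

		/* Advance to the start of the next line. */
		while (*s && *s != '\n')
			s++;
		if (*s == '\n')
			s++;
	}
	return n;
}
```

Calling rows_per_newline(sample) and rows_with_numeric_id(sample) gives different counts for this input, and that gap is how far a per-newline index would run past an array sized for numeric IRQs only.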

mpw5421 commented 8 months ago

Output from cat /proc/interrupts:

interrupts.txt

jstancek commented 8 months ago

Thanks, that confirms my suspicion. It's a bit too late today, so this could be wrong, but I was thinking:

@@ -154,7 +154,6 @@ static void collect_irq_info(void)
                        if (acc != -1)
                                tst_brk(TBROK, "Unexpected EOL");
                        col = 0;
-                       row++;
                        break;
                case '0' ... '9':
                        if (acc == -1)
@@ -167,6 +166,7 @@ static void collect_irq_info(void)
                        if (acc == -1 || col != 0)
                                tst_brk(TBROK, "Unexpected ':'");
                        irq_ids[row] = acc;
+                       row++;
                        acc = -1;
                        break;
                default:
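
For readers without the test source at hand, the idea behind the diff can be sketched as a small standalone scanner. This is a simplified assumption of how such a parser might look, not the LTP implementation: collect_irq_ids and its cap parameter are hypothetical, and only acc, col and row echo names from the diff. The row index advances only where a numeric id is stored, and a bounds check guards the write.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: store the numeric IRQ ids found in buf into ids[]
 * (at most cap entries) and return how many were stored. Named lines such
 * as "NMI:" never consume a slot because row only moves on "<digits>:".
 */
static size_t collect_irq_ids(const char *buf, int *ids, size_t cap)
{
	size_t row = 0;
	int col = 0;	/* fields completed on the current line */
	int acc = -1;	/* number being accumulated, -1 when none */

	for (const char *p = buf; *p; p++) {
		if (isdigit((unsigned char)*p)) {
			acc = (acc == -1 ? 0 : acc * 10) + (*p - '0');
		} else if (*p == ':') {
			/* Advance row here, not at end of line. */
			if (acc != -1 && col == 0 && row < cap)
				ids[row++] = acc;
			acc = -1;
			col++;
		} else if (*p == ' ' || *p == '\t') {
			if (acc != -1) {
				col++;
				acc = -1;
			}
		} else if (*p == '\n') {
			col = 0;
			acc = -1;
		} else {
			/* Non-numeric text: skip the rest of this line. */
			while (p[1] && p[1] != '\n')
				p++;
			acc = -1;
		}
	}
	return row;
}
```

With an input where named interrupts precede the numeric ones, only the numeric ids end up in ids[], and the return value never exceeds cap regardless of how many lines the input has.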

CC @richiejp

richiejp commented 8 months ago

To be honest, I can't remember how it works and I'm not working on LTP at the moment. However, it sounds plausible because the parser made a lot of assumptions about the file format.

jstancek commented 8 months ago

v1 patch posted to ML: https://lists.linux.it/pipermail/ltp/2024-January/036666.html