quic / toolchain_for_hexagon

Other
18 stars 11 forks source link

qemu 'signal' testcase fails w/sigsegv #6

Closed androm3da closed 1 year ago

androm3da commented 2 years ago

Reported by Richard Henderson on qemu-devel list https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg01102.html

Context is that QEMU tried to switch to a new Hexagon toolchain and found a regression. The previous/known good hexagon toolchain was built using LLVM ~12.0.0 sources and a toolchain build recipe defined inside of qemu repo.

This toolchain build recipe was altered in some ways (for example, we switched to -O2 for libc artifact build) when we produced the new toolchain - https://github.com/quic/toolchain_for_hexagon/releases/tag/v13.0.0

I've had a closer look at the signals failure, and it really could be a toolchain problem.

The sigsegv is at

#0  0x00005555557a387f in stb_p (ptr=0x10000, v=0 '\000')
    at /home/richard.henderson/qemu/src/include/qemu/bswap.h:326
#1  0x00005555557a4bc5 in cpu_stb_mmu (env=0x555555e4eb50, addr=0,
    val=0 '\000', oi=0, ra=93824992935986) at ../src/accel/tcg/user-exec.c:359
#2  0x00005555557a5396 in cpu_stb_mmuidx_ra (env=0x555555e4eb50, addr=0,
    val=0, mmu_idx=0, ra=93824992935986)
    at ../src/accel/tcg/ldst_common.c.inc:83
#3  0x00005555557a57e6 in cpu_stb_data_ra (env=0x555555e4eb50, addr=0, val=0,
    ra=93824992935986) at ../src/accel/tcg/ldst_common.c.inc:183
#4  0x00005555555ff6f0 in helper_commit_store (env=0x555555e4eb50, slot_num=1)
    at ../src/target/hexagon/op_helper.c:151
#5  0x0000555555600032 in check_noshuf (env=0x555555e4eb50, slot=0)
    at ../src/target/hexagon/op_helper.c:407
#6  0x00005555556000e4 in mem_load4 (env=0x555555e4eb50, slot=0, vaddr=305000)
    at ../src/target/hexagon/op_helper.c:431
#7  0x00005555556063c0 in helper_L2_loadri_io (env=0x555555e4eb50, RsV=305000,
    siV=0, slot=0) at target/hexagon/helper_funcs_generated.c.inc:1013
#8  0x00007fffe8034f5a in code_gen_buffer ()

which is a store to address 0, which obviously should fail.

This comes from

IN: nontrivial_free
0x000224c4:  0x78004003 {       R3 = #0x0
0x000224c8:  0xf204d001         P1 = cmp.eq(R4,R16) }
0x000224cc:  0x5c00413e {       if (P1) jump:nt PC+124
0x000224d0:  0x38034000         if (P0) memb(R3+#0x0) = #0x0
0x000224d4:  0x9180c002         R2 = memw(R0+#0x0) }

which is part of the new toolchain's libc. This is quite obviously a store to address 0 if P0 is true. Which looks pretty questionable. Presumably P0 is not always set, which is why the program does not always crash. But there doesn't appear to be anything wrong with the qemu translation.

I'm suspicious of the new compiler. This looks like some sort of code scheduling bug, where R3=0 got moved ahead of the final use of the previous value in R3.

In the short term, I recommend dropping the hexagon toolchain update and that Taylor generate a new HVX pull request with the new tests present but disabled in the makefile.

mukilan-quic commented 1 year ago

The fix for this issue has been posted to upstream https://mail.gnu.org/archive/html/qemu-devel/2022-12/msg04584.html

Once it is merged, it should be safe to switch the libc build back to -O2.