llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.51k stars 11.3k forks source link

[Subregliveness] Bad machine code: Defining instruction does not modify register #40181

Open JonPsson opened 5 years ago

JonPsson commented 5 years ago
Bugzilla Link 40835
Version trunk
OS Linux
Attachments reduced testcase
CC @kparzysz-quic,@JonPsson,@qcolombet,@uweigand

Extended Description

llc -mcpu=z13 -O3 tc_subregliv_defnotmod.ll -o out.s -systemz-subreg-liveness -verify-machineinstrs

After Simple Register Coalescing

** INTERVALS ** %0 [144r,384r:0) 0@144r L00000008 [144r,384r:0) 0@144r L00000001 [144r,384r:0) 0@144r weight:0.000000e+00 %1 [16r,32r:0) 0@16r weight:0.000000e+00 %2 [48r,64r:0) 0@48r weight:0.000000e+00 %4 [176r,192r:0)[192r,224r:1)[224r,240r:2) 0@176r 1@192r 2@224r L00000008 [176r,192r:0)[192r,224r:1)[224r,240r:2) 0@176r 1@192r 2@224r L00000001 [176r,176d:0)[192r,192d:1) 0@176r 1@192r weight:0.000000e+00 %6 [240r,256r:0) 0@240r weight:0.000000e+00 %7 [256r,272r:0) 0@256r weight:0.000000e+00 %10 [288r,304r:0)[304r,384r:1)[384r,432r:2) 0@288r 1@304r 2@384r L00000008 [288r,384r:0)[384r,432r:1) 0@288r 1@384r 2@x L00000001 [288r,384r:0)[384r,384d:1) 0@288r 1@384r 2@x L00000006 [304r,384r:0)[384r,384d:1) 0@304r 1@384r weight:0.000000e+00 RegMasks: ** MACHINEINSTRS **

Machine code for function main: NoPHIs, TracksLiveness

0B bb.0 (%ir-block.0): successors: %bb.2(0x7fffffff), %bb.1(0x00000001); %bb.2(100.00%), %bb.1(0.00%)

16B %1:addr64bit = LARL @​g_48 32B MVI %1:addr64bit, 0, 43 :: (store 1 into @​g_48, align 2, !tbaa !​1) 48B %2:grx32bit = LHIMux 0 64B CHIMux %2:grx32bit, 0, implicit-def $cc 80B BRC 14, 6, %bb.2, implicit killed $cc 96B J %bb.1

112B bb.1 (%ir-block.3): ; predecessors: %bb.0

128B bb.2 (%ir-block.4): ; predecessors: %bb.0 successors: %bb.4(0x7fffffff), %bb.3(0x00000001); %bb.4(100.00%), %bb.3(0.00%)

144B %0:gr64bit = LGHI 43 176B %4:gr64bit = LGHI 43 192B %4.subreg_l32:gr64bit = MSR %4.subreg_l32:gr64bit(tied-def 0), %4.subreg_l32:gr64bit 224B %4.subreg_l32:gr64bit = AHIMux %4.subreg_l32:gr64bit(tied-def 0), 9, implicit-def dead $cc 240B %6:gr32bit = LLCRMux %4.subreg_l32:gr64bit 256B %7:gr32bit = nsw LCR %6:gr32bit, implicit-def dead $cc 272B STRL %7:gr32bit, @​g_80 :: (store 4 into @​g_80, !tbaa !​4) 288B undef %10.subreg_l64:gr128bit = LGFI -245143785 304B %10.subreg_h64:gr128bit = LLILL 0 384B %10:gr128bit = DLGR %10:gr128bit(tied-def 0), %0:gr64bit 432B CHIMux %10.subreg_l32:gr128bit, 0, implicit-def $cc 448B BRC 14, 8, %bb.4, implicit killed $cc 464B J %bb.3

480B bb.3 (%ir-block.16): ; predecessors: %bb.2

496B bb.4 (%ir-block.17): ; predecessors: %bb.2

512B Return

End machine code for function main.

Bad machine code: Defining instruction does not modify register

JonPsson commented 4 years ago

reduced testcase I saw this assert again this week in testing - attaching a reduction of it.

llc -mcpu=z13 -O3 tc_does_not_mod_reg.ll -systemz-subreg-liveness -verify-misched

Bad machine code: Defining instruction does not modify register

JonPsson commented 5 years ago

unreduced test case, llc input (new)

JonPsson commented 5 years ago

reduced testcase (new) This assert triggered again this week...

llc ./tc_defins_nomod.ll -mtriple=s390x-linux-gnu -mcpu=z13 -verify-machineinstrs -systemz-subreg-liveness

qcolombet commented 5 years ago

Landed in r357032

qcolombet commented 5 years ago

I've updated https://reviews.llvm.org/D59731 with the fix for the phis and the new test case.

Thanks Jonas!

qcolombet commented 5 years ago

Created attachment 21660 [details] reduced testcase, for seg fault

It seems that the patch solves the reduced test case, but when I tried to compile the original unreduced input (which without the patch still triggers the assert), a segmentation fault occurs.

bin/llc -mcpu=z13 -O3 tc_subreg_segflt.ll -o out.s -systemz-subreg-liveness Stack dump:

  1. Program arguments: bin/llc -mcpu=z13 -O3 tc_subreg_segflt.ll -o out.s -systemz-subreg-liveness
  2. Running pass 'Function Pass Manager' on module 'tc_subreg_segflt.ll'.
  3. Running pass 'Simple Register Coalescing' on function '@main' ... Segmentation fault (core dumped)

That's a silly mistake. When looking for the MI for the definition of the current VNI, I forgot to check that the VNI is not a PHI. When that happens, the MI is nullptr, since PHIs have been eliminated at this point.

Fixing.

qcolombet commented 5 years ago

Created attachment 21660 [details] reduced testcase, for seg fault

It seems that the patch solves the reduced test case, but when I tried to compile the original unreduced input (which without the patch still triggers the assert), a segmentation fault occurs.

bin/llc -mcpu=z13 -O3 tc_subreg_segflt.ll -o out.s -systemz-subreg-liveness Stack dump:

  1. Program arguments: bin/llc -mcpu=z13 -O3 tc_subreg_segflt.ll -o out.s -systemz-subreg-liveness
  2. Running pass 'Function Pass Manager' on module 'tc_subreg_segflt.ll'.
  3. Running pass 'Simple Register Coalescing' on function '@main' ... Segmentation fault (core dumped)

Thanks Jonas for trying. Looking.

JonPsson commented 5 years ago

unreduced test case, llc input bin/llc -mcpu=z13 -O3 subregliv_0223.ll -o out.s -systemz-subreg-liveness

JonPsson commented 5 years ago

reduced testcase, for seg fault It seems that the patch solves the reduced test case, but when I tried to compile the original unreduced input (which without the patch still triggers the assert), a segmentation fault occurs.

bin/llc -mcpu=z13 -O3 tc_subreg_segflt.ll -o out.s -systemz-subreg-liveness Stack dump:

  1. Program arguments: bin/llc -mcpu=z13 -O3 tc_subreg_segflt.ll -o out.s -systemz-subreg-liveness
  2. Running pass 'Function Pass Manager' on module 'tc_subreg_segflt.ll'.
  3. Running pass 'Simple Register Coalescing' on function '@main' ... Segmentation fault (core dumped)
qcolombet commented 5 years ago

Note: Aside from the verifier complaining, this bug would not create any incorrect code generation. It basically keeps more values around for no good reason, potential overconstraining the allocation.

qcolombet commented 5 years ago

Posted a patch in https://reviews.llvm.org/D59731.

Jonas, please give it a try to make sure it solves your initial problem.

qcolombet commented 5 years ago

The problem comes from the way we update the live-ranges.

Essentially, we have a non-subreg live range let us call it L1. L1 gets merged with a bigger register and it gets updated into: L1 with LaneMask 9 (8 for the low part and 1 for the high part). So far so good. Next we merge L1 with another live range that extend the low part (mask 8). As a result, we break the live-range of L1 into the low and high part.

The problem is when doing that, we keep all the vals from the originallive-range in both subrange whereas the high part is defined only by one val.

Graphically this looks like this:

A: L1(val0, low, high) = fulldef B: L1(val1, low) = lowdef val0.low C: L1(val2, low) = copy val1.low

The expected sublive-range for L1 are: low-lane: val0 [A,B); val1 [B,C); val2 [C,...) high-lane: val0 [A,A+)

Whereas what we get: low-lane: val0 [A,B); val1 [B,C); val2 [C,...) high-lane: val0 [A,A+), val1 [B,C) <-- val1 shouldn't be here

We end up in this situation with though steps: high and low-lanes: val0 [A,B); val1 [B,C) Then we merge val2 and the problem arise.

As for the fix, either we don't try to be smart when updating the subrange (i.e., putting everything in high and low lanes is correct) or we need to make sure we only leave things defined for a given lane when "splitting" the subranges.

To keep the benefits of subreg liveness, we need to do the latter.

qcolombet commented 5 years ago

MIR reduced test case Attaching a smaller test case that reproduces the problem. llc -mtriple s390x-ibm-linux -mcpu=z13 subreg_liveness.mir -systemz-subreg-liveness -verify-machineinstrs -start-before simple-register-coalescing

qcolombet commented 5 years ago

Haven't looked into the details yet, but the IR seems sensible so either the verifier is broken or we somewhat did set the lane mask properly when updating the live-ranges.

qcolombet commented 5 years ago

Looking.

kparzysz-quic commented 5 years ago

I'm swamped with other stuff for the next couple of weeks. If nobody else takes this, I should be able to take a look after I get some time.

JonPsson commented 5 years ago

ping!

Still coming up in testing. Original test case still applies.

JonPsson commented 5 years ago

ping!

Saw this again in csmith testing.

JonPsson commented 5 years ago

assigned to @qcolombet

arsenm commented 1 year ago

Latest posted testcase still fails:

; llc -mtriple=systemz-- -mcpu=z13 -O3 -systemz-subreg-liveness -verify-machineinstrs < %s

@g_19 = external dso_local global [1 x i32], align 4
@g_90 = external dso_local global i32*, align 8
@g_57 = external dso_local global [8 x [4 x [4 x i32*]]], align 8

define void @fun() {
bb:
  br label %bb1

bb1:                                              ; preds = %bb
  store i16 28549, i16* undef, align 2
  %i = xor i16 0, 1
  br i1 icmp eq (i32** getelementptr inbounds ([8 x [4 x [4 x i32*]]], [8 x [4 x [4 x i32*]]]* @g_57, i64 0, i64 5, i64 1, i64 3), i32** @g_90), label %bb3, label %bb2

bb2:                                              ; preds = %bb1
  unreachable

bb3:                                              ; preds = %bb1
  br label %bb4

bb4:                                              ; preds = %bb3
  %i5 = sext i16 %i to i64
  %i6 = or i64 %i5, 4966182774046565288
  %i7 = trunc i64 %i6 to i32
  br i1 undef, label %bb8, label %bb9

bb8:                                              ; preds = %bb4
  store i32 %i7, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @g_19, i64 0, i64 0), align 4
  unreachable

bb9:                                              ; preds = %bb4
  %i10 = sdiv i64 %i6, %i5
  %i11 = trunc i64 %i10 to i32
  store i32 %i11, i32* getelementptr inbounds ([1 x i32], [1 x i32]* @g_19, i64 0, i64 0), align 4
  unreachable
}