llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.05k stars 11.98k forks source link

[Statepoint-VReg] Cornercase bug with multiple uses of a single value #46261

Closed preames closed 4 years ago

preames commented 4 years ago
Bugzilla Link 46917
Resolution FIXED
Resolved on Oct 14, 2020 07:45
Version trunk
OS Linux
CC @dantrushin

Extended Description

There is a hard to handle cornercase in our experimental vreg statepoint lowering. I had briefly described the problem in https://reviews.llvm.org/D81648?vs=on&id=277132#2146051 as such: "Second, there appears to be a semantic problem around the handling of base vs derived slots unless we always spill the base. We can't tie both uses to a single def. This may warrant some offline discussion."

(I decided to land that patch with the issue unresolved to unblock progress since review discussion was fragmented and hard to follow.)

Let me expand a bit on the issue. Say we have an example like the following: target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-pc-linux-gnu"

declare void @​foo() declare void @​consume(i8 addrspace(1)*)

define void @​test(i8 addrspace(1) %a) gc "statepoint-example" { entry: %a_gep = getelementptr i8, i8 addrspace(1) %a, i64 8 %safepoint_token = tail call token (i64, i32, void (), i32, i32, ...) @​llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () @​foo, i32 0, i32 0, i32 0, i32 0) ["gc-live" (i8 addrspace(1) %a, i8 addrspace(1) %a_gep)] %a1 = call i8 addrspace(1) @​llvm.experimental.gc.relocate.p1i8(token %safepoint_token, i32 0, i32 1) call void @​consume(i8 addrspace(1) %a1) ret void }

declare token @​llvm.experimental.gc.statepoint.p0f_i1f(i64, i32, i1 (), i32, i32, ...) declare token @​llvm.experimental.gc.statepoint.p0f_isVoidf(i64, i32, void (), i32, i32, ...) declare i8 addrspace(1)* @​llvm.experimental.gc.relocate.p1i8(token, i32, i32)

In this example, we generate a statepoint MI node which looks like this: %6:gr64 = STATEPOINT 0, 0, 0, @​foo, 2, 0, 2, 0, 2, 0, 1, 8, %stack.0, 0, killed %7(tied-def 0), csr_64, implicit-def $rsp, implicit-def $ssp :: (volatile load store 8 on %stack.0)

(Where %7 is the GEP, and %stack.0 is the spill for the base)

Imagine we had instead generated: %6:gr64 = STATEPOINT 0, 0, 0, @​foo, 2, 0, 2, 0, 2, 0, 1, 0, %7, 0, killed %7(tied-def 0), csr_64, implicit-def $rsp, implicit-def $ssp :: (volatile load store 8 on %stack.0) (That is, replaced stack with another usage of %7.)

This would be incorrect. Why? Because the second use of %7 can not be tied to the %6 def. As a result, the fact that the GC may need to update the base - remember objects may move many times during a call - is lost. The register allocator is free to assign different locations to the the two uses of %0, and then assume one of them is read only. That would be a miscompile.

To say all that different, we tie operands, not registers.

Unfortunately, the very slightly tweaked example produces exactly this effect: define void @​test(i8 addrspace(1) %a) gc "statepoint-example" { entry: %a_gep = getelementptr i8, i8 addrspace(1) %a, i64 8 %safepoint_token = tail call token (i64, i32, void (), i32, i32, ...) @​llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () @​foo, i32 0, i32 0, i32 0, i32 0) ["gc-live" (i8 addrspace(1) %a, i8 addrspace(1) %a_gep)] %a1 = call i8 addrspace(1) @​llvm.experimental.gc.relocate.p1i8(token %safepoint_token, i32 0, i32 0) call void @​consume(i8 addrspace(1) %a1) ret void }

This produces: %1:gr64 = STATEPOINT 0, 0, 0, @​foo, 2, 0, 2, 0, 2, 0, %0, %0(tied-def 0), csr_64, implicit-def $rsp, implicit-def $ssp

Which is WRONG.

One possible correct alternative would be: %1:gr64, %2:gr64 = STATEPOINT 0, 0, 0, @​foo, 2, 0, 2, 0, 2, 0, %0(tied-def 0), %0(tied-def 1), csr_64, implicit-def $rsp, implicit-def $ssp

Note that while I've given the example in terms of a single base/derived pair, you can also construct the same problematic pattern by relocating a single object twice (or otherwise having multiple gc operand uses of the same value).

The two major families of fixes I see are: 1) Produce a separate tied def for each use copy.
2) Produce at most one use of each value - thus, eagerly spilling further uses.

The second seems like a hard invariant to preserve during optimization, so I'd tend towards the first.

The first does result in slightly poor codegen since we can end up spilling the same value twice. I don't have a simple fix for that. We could potentially change the statepoint MI format to remove the dual use, but that's a bit involved.

dantrushin commented 4 years ago

D87154 fixes this problem by listing every pointer operand only once. base/derived relation is enconded by following integer arguments (indices into that pointer list)

preames commented 4 years ago

I also think that all above means that from code generator point of view (correctness), in every single (base, derived) pair, base pointer is irrelevant to codegen. It only is needed for (easier) generation of stack maps later. This is the key mistake. This statement is false.

The base pointer must be available to the GC during the call. Think about a dynamic trace which looks something like this:

call foo w/stackmap {reg, stackslot} // i.e. single base/derived safepoint safepoint return from foo

At the safepoint, the GC will need to have available and update the base pointer value. Without it, it can't correctly relocate the derived pointer.

dantrushin commented 4 years ago

I've been assuming the following holds:

  1. statepoint kills all OOPs; all pointers live across statepoint must be relocated
  2. gc.relocate relocate single pointer; in pair (base, derived) it relocates second (derived) one.

From that I conclude that if base pointer is live across statepoint, it MUST appear as (base, base) in gc args list.

I also think that all above means that from code generator point of view (correctness), in every single (base, derived) pair, base pointer is irrelevant to codegen. It only is needed for (easier) generation of stack maps later.

All above means IMHO that

%1:gr64 = STATEPOINT 0, 0, 0, @​foo, 2, 0, 2, 0, 2, 0, %0, %0(tied-def 0), csr_64, implicit-def $rsp, implicit-def $ssp

is valid construct as long as tied operands are assigned to the same register and that register is used past statepoint for %0. Even though I've never seen this in practice, I think it is OK for regalloc to assign untied %0 to something else (another register or spill slot) and possibly treat it read only, as you say.

Note that after SSA 'flattening' this will become

%0:gr64 = STATEPOINT 0, 0, 0, @​foo, 2, 0, 2, 0, 2, 0, %0, %0(tied-def 0), csr_64, implicit-def $rsp, implicit-def $ssp

I don't think regalloc is allowed to perform this assignment:

$r12 = STATEPOINT 0, 0, 0, @​foo, 2, 0, 2, 0, 2, 0, $r12, $r13(tied-def 0), csr_64, implicit-def $rsp, implicit-def $ssp

and this would be OK:

$r12 = STATEPOINT 0, 0, 0, @​foo, 2, 0, 2, 0, 2, 0, $r13, $r12(tied-def 0), csr_64, implicit-def $rsp, implicit-def $ssp

$r13 will (must) be dead after statepoint. In practice, both %0 arguments are allocated to the same register.

As for

%1:gr64, %2:gr64 = STATEPOINT 0, 0, 0, @​foo, 2, 0, 2, 0, 2, 0, %0(tied-def 0), %0(tied-def 1), csr_64, implicit-def $rsp, implicit-def $ssp

how would it look like after regalloc? I don't think the following is possible:

$r12, $r12 = STATEPOINT 0, 0, 0, @​foo, 2, 0, 2, 0, 2, 0, $r12(tied-def 0), $r12(tied-def 1), csr_64, implicit-def $rsp, implicit-def $ssp

preames commented 4 years ago

assigned to @dantrushin