llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.25k stars 12.08k forks source link

LDS lowering should survive absolute addresses #81491

Open JonChesterfield opened 9 months ago

JonChesterfield commented 9 months ago

Splitting from https://github.com/llvm/llvm-project/pull/75333

Created a task to track a suspicion that superAlignLDSGlobals is miscompiling absolute address variables at present.

Idea is to accept absolute addressed LDS on the input if and only if all variables have absolute addresses. I think this will unblock @Pierre-vh's thinlto project

Currently an absolute symbol is detected at L350 and fatal_error'ed, probably after setting alignment metadata on it which could cause trouble later.

JonChesterfield commented 9 months ago

A few dubious things in the code - report_fatal_error("Anonymous kernels cannot use LDS variables"); <- that probably isn't true any more - I think I got diverted to some other task partway through changing the lowering scheme to be composable with promote-alloca and there's some cleanup outstanding

llvmbot commented 9 months ago

@llvm/issue-subscribers-backend-amdgpu

Author: Jon Chesterfield (JonChesterfield)

Splitting from https://github.com/llvm/llvm-project/pull/75333 Created a task to track a suspicion that superAlignLDSGlobals is miscompiling absolute address variables at present. Idea is to accept absolute addressed LDS on the input if and only if all variables have absolute addresses. I think this will unblock @Pierre-vh's thinlto project
arsenm commented 9 months ago

I'm also still quite interested in getting AMDGPUPromoteAlloca to use a pre-allocated block of LDS instead of requiring running first

JonChesterfield commented 9 months ago

Oh, you'd like to run promotealloca before lowerlds? That'll cost an extra callgraph walk I think. Currently lowerlds runs first so promotealloca knows how much LDS is allocated by a given kernel (cheaply).

What does running promotealloca first win? Interesting one is if it puts extra LDS allocations in non-kernel functions, the lowering pass will fix that up. That'll work really nicely for functions that are only reachable from one kernel and somewhat well for others.

Regarding this patch,

       if (GV.isAbsoluteSymbolRef()) {
-        report_fatal_error(
-            "LDS variables with absolute addresses are unimplemented.");
+        // Address already assigned.
+        continue;
       }

That's probably equivalent to cleverer solutions. The idea is to treat variables with an absolute address as already allocated (and then ignore them). It'll work for the thinlto case but will tend to alias unrelated variables when run outside of it.

I'm considering changing the allocator to be bin packing, that would be inherently incremental.

(Promotealloca can be simpler than it is at the moment in terms of allocating - all it needs to do is increment the kernel attribute and give the new LDS block an absolute symbol and the backend will do the right thing - the current setup is essentially a bump allocator with sparse documentation)

arsenm commented 9 months ago

Oh, you'd like to run promotealloca before lowerlds? That'll cost an extra callgraph walk I think. Currently lowerlds runs first so promotealloca knows how much LDS is allocated by a given kernel (cheaply).

Not necessarily (although being able to freely reorder them would be a nice property). I'm first interested in fixing up the cruft of the current scheme. The module lowering pass precisely determines the LDS usage and merges everything into the one global. The promotion pass then separately figures out the LDS usage by inspecting the GlobalVariable uses (and then sorts conservatively). We then create another global variable on the side, separate from the one mega global.

I'm also thinking about trying to extend the promotion to variables outside of kernels.