Open JonChesterfield opened 9 months ago
A few dubious things in the code - report_fatal_error("Anonymous kernels cannot use LDS variables");
<- that probably isn't true any more - I think I got diverted to some other task partway through changing the lowering scheme to be composable with promote-alloca and there's some cleanup outstanding
@llvm/issue-subscribers-backend-amdgpu
Author: Jon Chesterfield (JonChesterfield)
I'm also still quite interested in getting AMDGPUPromoteAlloca to use a pre-allocated block of LDS instead of requiring running first
Oh, you'd like to run promotealloca before lowerlds? That'll cost an extra callgraph walk I think. Currently lowerlds runs first so promotealloca knows how much LDS is allocated by a given kernel (cheaply).
What does running promotealloca first win? Interesting one is if it puts extra LDS allocations in non-kernel functions, the lowering pass will fix that up. That'll work really nicely for functions that are only reachable from one kernel and somewhat well for others.
Regarding this patch,
if (GV.isAbsoluteSymbolRef()) {
- report_fatal_error(
- "LDS variables with absolute addresses are unimplemented.");
+ // Address already assigned.
+ continue;
}
That's probably equivalent to cleverer solutions. The idea is to treat variables with an absolute address as already allocated (and then ignore them). It'll work for the thinlto case but will tend to alias unrelated variables when run outside of it.
I'm considering changing the allocator to be bin packing, that would be inherently incremental.
(Promotealloca can be simpler than it is at the moment in terms of allocating - all it needs to do is increment the kernel attribute and give the new LDS block an absolute symbol and the backend will do the right thing - the current setup is essentially a bump allocator with sparse documentation)
Oh, you'd like to run promotealloca before lowerlds? That'll cost an extra callgraph walk I think. Currently lowerlds runs first so promotealloca knows how much LDS is allocated by a given kernel (cheaply).
Not necessarily (although being able to freely reorder them would be a nice property). I'm first interested in fixing up the cruft of the current scheme. The module lowering pass precisely determines the LDS usage and merges everything into the one global. The promotion pass then separately figures out the LDS usage by inspecting the GlobalVariable uses (and then sorts conservatively). We then create another global variable on the side, separate from the one mega global.
I'm also thinking about trying to extend the promotion to variables outside of kernels.
Splitting from https://github.com/llvm/llvm-project/pull/75333
Created a task to track a suspicion that superAlignLDSGlobals is miscompiling absolute address variables at present.
Idea is to accept absolute addressed LDS on the input if and only if all variables have absolute addresses. I think this will unblock @Pierre-vh's thinlto project
Currently an absolute symbol is detected at L350 and fatal_error'ed, probably after setting alignment metadata on it which could cause trouble later.