Closed Pierre-vh closed 5 months ago
@llvm/pr-subscribers-lto
Author: Pierre van Houtryve (Pierre-vh)
@llvm/pr-subscribers-backend-amdgpu
Author: Pierre van Houtryve (Pierre-vh)
Can you add some tests with aliases? I'm not sure they work correctly in the call graph
I found a bug during the review of #89683. I will attach the test files here. These tests should pass when they are run individually and fail when they are tested as a group. These nondeterministic results show up when AMDGPUSplitModule
or SplitModule
are used for partitioning. I suspect it may have something to do with tie breaks during load balancing.
clone-lds-function.ll.txt clone-lds-function-ancestor-kernels.ll.txt clone-lds-function-successor.ll.txt
These nondeterministic results show up when AMDGPUSplitModule or SplitModule are used for partitioning
I ran into https://llvm.org/docs/CodingStandards.html#beware-of-non-determinism-due-to-ordering-of-pointers a while ago, and thought I'd point it out since I see this work uses sets/maps with pointer keys. Not sure if the iteration on these effects test output, but something to consider..
I found a bug during the review of #89683. I will attach the test files here. These tests should pass when they are run individually and fail when they are tested as a group. These nondeterministic results show up when
AMDGPUSplitModule
orSplitModule
are used for partitioning. I suspect it may have something to do with tie breaks during load balancing.clone-lds-function.ll.txt clone-lds-function-ancestor-kernels.ll.txt clone-lds-function-successor.ll.txt
I'm not able to reproduce it, can you create a testcase that only depends on this pass?
I had to modify the tests a lot to make them work so it possibly lost what makes them non-deterministic. Also I see you're using %u
but I don't know what that's supposed to do, I replaced it with %t to make it work.
Note: I added the isAddressTaken
check for the indirect calls dependency collection. I think that's a trivial addition to narrow down the set of indirectly callable functions.
I found a bug during the review of #89683. I will attach the test files here. These tests should pass when they are run individually and fail when they are tested as a group. These nondeterministic results show up when
AMDGPUSplitModule
orSplitModule
are used for partitioning. I suspect it may have something to do with tie breaks during load balancing.clone-lds-function.ll.txt clone-lds-function-ancestor-kernels.ll.txt clone-lds-function-successor.ll.txt
Never mind, I had some errors in the RUN lines in my tests. This issue does not exist.
:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.
My understanding is that
--lto-partitions
doesn't work because the AMDGPU backend needs to consume the SCC in order. I'm assuming this patch maintains that logic but splits the independent kernel calls up to they can be done in parallel?
Exactly. If a function is called directly or indirectly by a kernel, it stays in that kernel's module.
I've thought that the true solution to resolving this would just be emitting the kernel resource usage inside of
ld.lld
. Presumably that would require emitting resource usage per-function in parallel and then the callgraph information (There's some small support for this already). Thenld.lld
would need to traverse the callgraph to get the diameter of said graph. However I'm assuming that'd take more effort to define an actual linking ABI.
Indeed, and this is where I started looking as well (so we could just enable thinLTO), but it's hard to get right and this solution was by far the easiest and the most realistic one to do in the short term.
Can this be fixed or reverted https://lab.llvm.org/buildbot/#/builders/85/builds/24181 ?
(See #83128 to review first commit)
This enables the --lto-partitions option to work more consistently.
This module splitting logic is fully aware of AMDGPU modules and their specificities and takes advantage of them to split modules in a way that avoids compilation issue (such as resource usage being incorrectly represented).
This also includes a logging system that's more elaborate than just LLVM_DEBUG which allows printing logs to uniquely named files, and optionally with all value names hidden so they can be safely shared without leaking informatiton about the source. Logs can also be enabled through an environment variable, which avoids the sometimes complicated process of passing a -mllvm option all the way from clang driver to the offload linker that handles full LTO codegen.