Open lfmeadow opened 5 months ago
@llvm/issue-subscribers-backend-amdgpu
Author: Larry Meadows (lfmeadow)
cc @nikic
For SimplifyCFG the problem is basically that, when considering sinking for each instruction individually, it would create many phi nodes. However, if we sink multiple instructions, it turns out that most of the phi nodes are actually not needed (because the phi operands are also sunk instructions). It should be possible to teach SimplifyCFG about that.
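To make the phi-counting argument concrete, here is a minimal toy model in Python. It is only a sketch: the `rows` layout and the `phis_needed` helper are hypothetical names invented for illustration, not SimplifyCFG's actual data structures, and the operand names come from a small two-GEP example.

```python
# Toy model, not LLVM code. Each "row" is one instruction position that
# exists in every predecessor block; it maps the per-block instruction
# name to that instruction's operands.
rows = {
    "gep1": {"gep1.a": ("p", "a"), "gep1.b": ("p", "a")},
    "gep2": {"gep2.a": ("gep1.a", "b"), "gep2.b": ("gep1.b", "b")},
}


def phis_needed(rows, sunk):
    """Count the phi nodes needed in the join block if `sunk` rows are sunk."""
    # Map each per-block instruction name back to its row, so we can tell
    # when differing operands are copies of one instruction that is also sunk.
    row_of = {name: row for row, insts in rows.items() for name in insts}
    count = 0
    for row in sunk:
        per_block_operands = list(rows[row].values())
        # Compare operand 0 across blocks, then operand 1, and so on.
        for operands in zip(*per_block_operands):
            if len(set(operands)) == 1:
                continue  # same value on every path: no phi needed
            op_rows = {row_of.get(op) for op in operands}
            if len(op_rows) == 1 and None not in op_rows and op_rows <= set(sunk):
                continue  # operands are themselves sunk: no phi needed
            count += 1
    return count


alone = phis_needed(rows, {"gep2"})
together = phis_needed(rows, {"gep1", "gep2"})
print(alone, together)
```

In this model, sinking `gep2` by itself costs one phi (for its differing base pointer), while sinking `gep1` and `gep2` together costs none, which is the effect described above.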
Hm, actually it looks like there's already a check for this here: https://github.com/llvm/llvm-project/blob/86bb5c8427346aafaafa42fbf96e405ae4ca07bf/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L2342 So the issue must be something else...
I think the actual problem may be the one-use check in: https://github.com/llvm/llvm-project/blob/86bb5c8427346aafaafa42fbf96e405ae4ca07bf/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L1957-L1961 Some of the instructions also have an extra use in a phi node in the sink block, so they'll not be considered sinking candidates. Not immediately obvious to me why we need the one-use restriction.
Reduced test case to illustrate the problem: https://llvm.godbolt.org/z/q6Eqdj8K7
define ptr @test(i1 %c, ptr %p, i64 %a, i64 %b) {
  br i1 %c, label %if, label %else
if:
  call void @dummy()
  %gep1.a = getelementptr i8, ptr %p, i64 %a
  %gep2.a = getelementptr i8, ptr %gep1.a, i64 %b
  br label %join
else:
  %gep1.b = getelementptr i8, ptr %p, i64 %a
  %gep2.b = getelementptr i8, ptr %gep1.b, i64 %b
  br label %join
join:
  %phi1 = phi ptr [ %gep1.a, %if ], [ %gep1.b, %else ]
  %phi2 = phi ptr [ %gep2.a, %if ], [ %gep2.b, %else ]
  call void @use(ptr %phi1)
  ret ptr %phi2
}
declare void @dummy()
declare void @use(ptr)
Produces:
define ptr @test(i1 %c, ptr %p, i64 %a, i64 %b) {
  br i1 %c, label %if, label %else
if: ; preds = %0
  call void @dummy()
  %gep1.a = getelementptr i8, ptr %p, i64 %a
  br label %join
else: ; preds = %0
  %gep1.b = getelementptr i8, ptr %p, i64 %a
  br label %join
join: ; preds = %else, %if
  %gep1.b.sink = phi ptr [ %gep1.b, %else ], [ %gep1.a, %if ]
  %phi1 = phi ptr [ %gep1.a, %if ], [ %gep1.b, %else ]
  %gep2.b = getelementptr i8, ptr %gep1.b.sink, i64 %b
  call void @use(ptr %phi1)
  ret ptr %gep2.b
}
This one is even extra bad because we end up with two identical phis that block the sinking that would remove those phis...
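The effect of the one-use restriction on this test case can be sketched with a small Python model. This is a deliberate simplification under stated assumptions: `sinkable_rows` and its `require_one_use` flag are hypothetical names, the real check in SimplifyCFG has further conditions, and the use lists below are read off the input IR by hand.

```python
# Toy model, not LLVM code. Users of each GEP in the input IR of the
# reduced test case (before the transform).
uses = {
    "gep1.a": ["gep2.a", "phi1"],
    "gep1.b": ["gep2.b", "phi1"],
    "gep2.a": ["phi2"],
    "gep2.b": ["phi2"],
}

# Instructions paired across the two predecessors, last instruction first,
# since sinking proceeds from the end of the blocks upwards.
rows = [("gep2.a", "gep2.b"), ("gep1.a", "gep1.b")]


def sinkable_rows(rows, uses, require_one_use):
    """Walk rows bottom-up and stop at the first row that fails the check."""
    sunk = []
    for row in rows:
        if require_one_use and any(len(uses[inst]) != 1 for inst in row):
            break  # an instruction with extra uses blocks further sinking
        sunk.append(row)
    return sunk


with_check = sinkable_rows(rows, uses, require_one_use=True)
without_check = sinkable_rows(rows, uses, require_one_use=False)
print(with_check)
print(without_check)
```

With the restriction, only the `gep2` pair qualifies, because each `gep1` has a second use in `%phi1`; that matches the output above, where `gep2` is sunk and a new phi is created for its base pointer. Without the restriction, both pairs would be candidates in this model.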
Preparatory PR: https://github.com/llvm/llvm-project/pull/94462 Once that lands, I have another patch to enable sinking of instructions with multiple uses.
FYI: the preparatory PR did not fix the problem in the real code; it still complains about 'too many PHI'. I'd be glad to test your second patch when I can get it.
@lfmeadow The second patch is https://github.com/nikic/llvm-project/commit/baeab1f0db6b0c18dfa87a7abf87d8a675b308be. I haven't tried it on your test case though.
Thank you. The two patches do fix the performance problem in two of our three fwd/bwd FFT pairs. The third pair still performs 8% worse; that problem is not related to SimplifyCFG.
The SimplifyCFG issue should be fixed by https://github.com/llvm/llvm-project/pull/95521. The other problem looks like something for AMDGPU backend developers to look at. From the description it sounds like rematerialization heuristics may need to be improved, but I know little about that.
I wonder if using gvnsink would have addressed this?
AMD have traced some significant (8-20%) slowdowns on several of AMD's rocFFT kernels to the commit mentioned in https://github.com/llvm/llvm-project/issues/78214. The LLVM commit is https://github.com/llvm/llvm-project/commit/e13bed4c5f3544c076ce57e36d9a11eefa5a7815. Briefly, we have identified two problems:
Ideally the GEP rewrite commit would be withdrawn. I don't see any way to fix this otherwise. I do not understand why SimplifyCFG is failing to restructure this code.
Thanks for your consideration