rust-lang / rust

Empowering everyone to build reliable and efficient software.

https://www.rust-lang.org

Other

96.66k stars 12.49k forks source link

Match on two-variant enum optimizes poorly #122734

Open clubby789 opened 5 months ago

clubby789 commented 5 months ago

https://godbolt.org/z/f3TMe3rcW

good gets optimal codegen since the LLVM 18 bump, but bad has had poor codegen for some time. Codegen also becomes optimal if a and b are references rather than passed by value.

clubby789 commented 5 months ago

~~Not sure if this is necessarily A-LLVM actually; it looks like we don't give enough information to codegen to improve this~~

We actually do:

%0 = alloca ...
%1 = load i8, ptr %0, align 1, !range <range info>

Somewhere in SROA that range information seems to get lost

erikdesjardins commented 5 months ago

Codegen also becomes optimal if a and b are references

Looks like this is because of the range metadata on the load: https://godbolt.org/z/bPfezGhW6.

So this may be fixed after LLVM 19, since we'll be able to put range metadata on the by-value arguments.

(range on arguments has been merged upstream, but optimization support is being implemented right now, so the following isn't optimized yet, but it may be soon: https://godbolt.org/z/17s6jTfMv)

DianQK commented 5 months ago

122726 should be a similar issue.

DianQK commented 5 months ago

So this may be fixed after LLVM 19, since we'll be able to put range metadata on the by-value arguments.

(range on arguments has been merged upstream, but optimization support is being implemented right now, so the following isn't optimized yet, but it may be soon: https://godbolt.org/z/17s6jTfMv)

@erikdesjardins ~~Are you adding relevant support? Or someone has started.~~

I'm curious why we don't put the range metadata on the non-load instruction.

DianQK commented 5 months ago

Reference: https://github.com/llvm/llvm-project/pull/83171.

erikdesjardins commented 5 months ago

Yeah, that's the original PR that added support, and there have been a few follow ups so far to have different passes check for range attributes. Since there's still quite a while until LLVM 19, I imagine that support will be added to every pass or analysis that currently uses range metadata by that time.

I'm curious why we don't put the range metadata on the non-load instruction.

LLVM generally only checks for such metadata on loads and calls (e.g.), so I think adding it to other instructions won't have much of an effect.

^{The only usage that allows any kind of instruction appears to be here in InstSimplify. It's used only to optimize comparisons that are always true or always false.}

DianQK commented 5 months ago

I'm curious why we don't put the range metadata on the non-load instruction.

LLVM generally only checks for such metadata on loads and calls (e.g.), so I think adding it to other instructions won't have much of an effect.

IIUC, we can add range metadata to other instructions. We just don't do that now. We can quickly get the range of arbitrary values when the source value provides range metadata. Adding range metadata for other instructions doesn't seem necessary if that's correct.

scottmcm commented 5 months ago

Somewhere in SROA that range information seems to get lost

I'm pretty sure that's because we store to the alloca then load from it again. So a very early pass replaces the load-store with just using the argument, but in doing so doesn't preserve the !range metadata anywhere.

It'd need to insert an assume of some sort to keep it, and that has enough other issues that LLVM usually doesn't do that, AFAIK.

DianQK commented 5 months ago

Somewhere in SROA that range information seems to get lost

I'm pretty sure that's because we store to the alloca then load from it again. So a very early pass replaces the load-store with just using the argument, but in doing so doesn't preserve the !range metadata anywhere.

It'd need to insert an assume of some sort to keep it, and that has enough other issues that LLVM usually doesn't do that, AFAIK.

I've tried it on https://github.com/rust-lang/rust/pull/122728. I got a huge change in the perf result. 🫠

scottmcm commented 5 months ago

I got a huge change in the perf result. 🫠

I'd been thinking about the possibility of LLVM doing the !range → assume translation as part of removing the load. But yeah, that may well have the same perf result problem.