llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.8k stars 11.91k forks source link

[InstCombine] Dropping pointless masking before left shift #41908

Open LebedevRI opened 5 years ago

LebedevRI commented 5 years ago
Bugzilla Link 42563
Version trunk
OS Linux
Blocks llvm/llvm-project#41801
CC @davidbolvansky,@huihzhang,@RKSimon,@nikic,@rotateright
Fixed by commit(s) 366540,366539,366538,366537,366536,366535,372629,372630

Extended Description

Splitting off from llvm/llvm-project#41801 just so i can track them separately.

There's a bunch of patterns, even if we only look at 'masking'-like pattern. I guess i'm stumbling into a deep untouched woods that no one had in hotpaths..

a,b: https://rise4fun.com/Alive/4zsf c,d,e,f: https://rise4fun.com/Alive/RC49

LebedevRI commented 5 years ago

And finally, https://reviews.llvm.org/D69125

LebedevRI commented 5 years ago

And there's obviously a case where mask will not be needed: https://rise4fun.com/Alive/UCv https://rise4fun.com/Alive/rVA

Name: a, normal+mask Pre: C1+C2 u>= 16 %C1_wide = zext i16 C1 to i32 %onebit = shl i32 -1, %C1_wide %mask = xor i32 %onebit, -1 %masked = and i32 %mask, %x %truncated = trunc i32 %masked to i16 %r = shl i16 %truncated, C2 => %x_trunc = trunc i32 %x to i16 %r = shl i16 %x_trunc, C2

Name: c, normal+mask Pre: (C2-C1) >= -16 %C1_wide = zext i16 C1 to i32 %t0 = lshr i32 -1, %C1_wide %t1 = and i32 %t0, %x %t2 = trunc i32 %t1 to i16 %r = shl i16 %t2, C2 => %x_trunc = trunc i32 %x to i16 %r = shl i16 %x_trunc, C2

LebedevRI commented 5 years ago

One last thing i will need to check here is whether it's meaningful to handle cases where's trunc in the pattern (before last shift).

Ha, that's nice, it kinda just works. https://rise4fun.com/Alive/elH https://rise4fun.com/Alive/icy

Name: a, normal+mask %onebit = shl i32 -1, C1 %mask = xor i32 %onebit, -1 %masked = and i32 %mask, %x %truncated = trunc i32 %masked to i16 %r = shl i16 %truncated, C2 => %C2_wide = zext i16 C2 to i32 %n0 = shl i32 %x, %C2_wide %n1 = add i32 C1, %C2_wide %n2 = zext i32 %n1 to i64 %n3 = shl i64 -1, %n2 %n4 = xor i64 %n3, -1 %n5 = trunc i64 %n4 to i16 %n6 = trunc i32 %n0 to i16 %r = and i16 %n6, %n5

Name: c, normal+mask %t0 = lshr i32 -1, C1 %t1 = and i32 %t0, %x %t2 = trunc i32 %t1 to i16 %r = shl i16 %t2, C2 => %C2_wide = zext i16 C2 to i32 %n0 = shl i32 %x, %C2_wide %n1_0 = sub i32 %C2_wide, C1 %n1_1 = sub i32 0, %n1_0 %n1 = add i32 %n1_1, 32 %n2 = zext i32 %n1 to i64 %n3 = lshr i64 -1, %n2 %n4 = trunc i64 %n3 to i16 %n5 = trunc i32 %n0 to i16 %r = and i16 %n4, %n5

LebedevRI commented 5 years ago

One last thing i will need to check here is whether it's meaningful to handle cases where's trunc in the pattern (before last shift).

LebedevRI commented 5 years ago

And posted

https://rise4fun.com/Alive/x2Cq Not unexpectedly, same is true for patterns a,b https://rise4fun.com/Alive/Mazd

... and to have nice handling of non-splats, while for patterns c-f this will just work https://rise4fun.com/Alive/Lqj Right, of course it won't needs zext/trunc too https://rise4fun.com/Alive/gslRa

, for a,b

some messing with zext/trunc will be needed https://rise4fun.com/Alive/plgI Hoping to finally look into that now..

And finally, https://reviews.llvm.org/D67677 (patterns a,b only)

And for c,d,e - https://reviews.llvm.org/D67725

LebedevRI commented 5 years ago

https://rise4fun.com/Alive/x2Cq Not unexpectedly, same is true for patterns a,b https://rise4fun.com/Alive/Mazd

... and to have nice handling of non-splats, while for patterns c-f this will just work https://rise4fun.com/Alive/Lqj Right, of course it won't needs zext/trunc too https://rise4fun.com/Alive/gslRa

, for a,b

some messing with zext/trunc will be needed https://rise4fun.com/Alive/plgI Hoping to finally look into that now..

And finally, https://reviews.llvm.org/D67677 (patterns a,b only)

LebedevRI commented 5 years ago

https://rise4fun.com/Alive/x2Cq Not unexpectedly, same is true for patterns a,b https://rise4fun.com/Alive/Mazd

... and to have nice handling of non-splats, while for patterns c-f this will just work https://rise4fun.com/Alive/Lqj, for a,b some messing with zext/trunc will be needed https://rise4fun.com/Alive/plgI Hoping to finally look into that now..

And finally, https://reviews.llvm.org/D67677 (patterns a,b only)

LebedevRI commented 5 years ago

https://rise4fun.com/Alive/x2Cq Not unexpectedly, same is true for patterns a,b https://rise4fun.com/Alive/Mazd

... and to have nice handling of non-splats, while for patterns c-f this will just work https://rise4fun.com/Alive/Lqj, for a,b some messing with zext/trunc will be needed https://rise4fun.com/Alive/plgI Hoping to finally look into that now..

LebedevRI commented 5 years ago

https://rise4fun.com/Alive/x2Cq Not unexpectedly, same is true for patterns a,b https://rise4fun.com/Alive/Mazd

LebedevRI commented 5 years ago

Actually, i'm missing the point here. While indeed if the mask does not touch any bits that will remain after the shift means we can drop the mask, we still can trivially do the fold if it does touch some bits - that legality check also tells us how many high bits to clear.

https://rise4fun.com/Alive/x2Cq Obviously, that won't work if we had ashr.

The mask is constant, so we only need one extra instruction. I guess that means the input to %r needs to be one-use (or %x a constant.)

LebedevRI commented 5 years ago

Not seeing any obvious follow-up improvements for this at the moment. I'm going to consider this resolved. Can be reopened if needed.

LebedevRI commented 5 years ago

Okay, all 6 patterns/patches are up. I've tried to make those as easy for review as possible.

LebedevRI commented 5 years ago

Starting to post patches, top-of-stack: https://reviews.llvm.org/D64512

LebedevRI commented 5 years ago

assigned to @LebedevRI