Open llvmbot opened 12 years ago
mentioned in issue llvm/llvm-bugzilla-archive#40155
A related problem occurs when you do something like this:
pre:
  %mask_pre = fcmp olt <4 x float> %a, %b
head:
  %phi = phi <4 x i1> [%mask_pre, %pre], [%mask_body, %body]
  ...
  br ..., %body, %out
body:
  %sel = select <4 x i1> %phi, ..., ...
  ...
  %mask_body = ...
out:
  ...
Now, LLVM will emit code like this for the phi node:
  pslld $31, %xmm9
  psrad $31, %xmm9
This is completely superfluous.
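To illustrate why the shift pair is redundant, here is a minimal Python sketch that models the two instructions on 32-bit lanes. It assumes the value reaching the phi is already a canonical comparison mask (every lane all-ones or all-zeros); the helper names are hypothetical and this is only a model, not code from LLVM.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit lane

def pslld(lanes, count):
    """Model of PSLLD: logical left shift of each 32-bit lane."""
    return [(x << count) & MASK32 for x in lanes]

def psrad(lanes, count):
    """Model of PSRAD: arithmetic right shift of each 32-bit lane."""
    out = []
    for x in lanes:
        # Reinterpret the lane as a signed 32-bit integer before shifting.
        signed = x - (1 << 32) if x & 0x80000000 else x
        out.append((signed >> count) & MASK32)
    return out

def sext_low_bit(lanes):
    """pslld $31 followed by psrad $31: sign-extend bit 0 of each lane."""
    return psrad(pslld(lanes, 31), 31)

# Assumption: a vector compare already produced all-ones / all-zeros lanes.
canonical_mask = [MASK32, 0, 0, MASK32]

# On a canonical mask the shift pair changes nothing.
assert sext_low_bit(canonical_mask) == canonical_mask
```

The shift pair only matters when the upper bits of a lane are not already a copy of bit 0, which is not the case for the result of a vector compare.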
This might be best handled in CodeGenPrepare.
Given a type T, the type it is legalized to depends only on T. This is fundamental to the design of the type legalizer. You can't make the decision depend on what kind of operation produced this type. I hope that's not what you are suggesting!
Not at all. I don't plan to change it.
If I understand right, the <8 x i1> is being passed between basic blocks as <8 x i32>, while within the blocks <8 x i16> is expected.
Actually, it's the other way around. v8i1 is promoted to v8i16, while it should have been promoted to v8i32.
So... why is it being passed between blocks as an <8 x i32>? Maybe the code that made that decision is simply old and not aware of how types are legalized nowadays.
We also have the same problem with this code:
define <4 x double> @foo(<4 x double> %x, <4 x double> %y) {
  %min_is_x = fcmp ult <4 x double> %x, %y
  %min = select <4 x i1> %min_is_x, <4 x double> %x, <4 x double> %y
  ret <4 x double> %min
}
v4i1 is legalized to v4i32 while it should have been legalized to v4i64.
From what I can tell, this is a type-legalization problem. The type <8 x i1> is passed in a register (XMM!) between basic blocks. How should we legalize this type? Well... it depends. If this type is the result of a comparison of v8i16s, then it should be saved in a 128-bit register, but if the comparison is of v8i32s, then a 256-bit register should be used.
We can't solve this problem completely using the current design of the type legalizer because it only looks at one basic block at a time. However, I think that we can make things better by changing some of the type-legalizer policies.
Currently, vector elements are promoted until a legal type is found. In our case, v8i1 was promoted to v8i16, which fits into an XMM register. Alternatively, we could have kept promoting it to v8i32, which would have been placed into a YMM register.
I will play around and try to see if promoting to the largest vector size is a better strategy.
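The two promotion policies discussed above can be sketched roughly as follows. This is a simplified Python model, not LLVM's legalizer: the set of legal types is an assumed, cut-down list for an AVX target, and the function names are hypothetical. Both policies take only the type as input, so the result still depends only on the type being legalized.

```python
# Assumption: simplified set of legal integer vector types on an AVX target,
# written as (number of lanes, element width in bits).
LEGAL_AVX = {
    (16, 8), (8, 16), (4, 32), (2, 64),    # 128-bit (XMM)
    (32, 8), (16, 16), (8, 32), (4, 64),   # 256-bit (YMM)
}

# Candidate element widths when promoting an i1 element.
ELEMENT_WIDTHS = (8, 16, 32, 64)

def promote_first_legal(lanes, legal=LEGAL_AVX):
    """Current policy (as described): stop at the first legal element width."""
    for bits in ELEMENT_WIDTHS:
        if (lanes, bits) in legal:
            return (lanes, bits)
    return None

def promote_largest_legal(lanes, legal=LEGAL_AVX):
    """Alternative policy: keep promoting to the widest legal element width."""
    for bits in reversed(ELEMENT_WIDTHS):
        if (lanes, bits) in legal:
            return (lanes, bits)
    return None

def type_name(vec):
    """Format (lanes, bits) as an MVT-style name such as v8i16."""
    return f"v{vec[0]}i{vec[1]}"

for lanes in (4, 8):
    print(f"v{lanes}i1:",
          "first legal =", type_name(promote_first_legal(lanes)),
          "| largest legal =", type_name(promote_largest_legal(lanes)))
```

Under these assumptions, the first policy reproduces the v8i16 and v4i32 results reported in this thread, while the second yields v8i32 and v4i64.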
Extended Description
The attached two test cases demonstrate an interesting situation that leads to much worse code than usual for vector computation on SSE and AVX (at least).
As context, when doing vector comparisons with those targets, it's usually important to immediately sext the result of the comparison to a vector of full-width integer elements, which is what the instructions actually return. The x86 code generator picks up on this pattern and then emits just the desired vector comparison instruction.
In the attached test case, the originally generated code had the sexts right after the vector compares, but a later optimization pass noticed that the two values feeding into a phi node were both followed by a sext, so it removed the two original sexts and added a single new one after the phi node. As a result, the x86 code generator doesn't pick up the pattern and generates very inefficient code that first painfully does a zext conversion and then later painfully converts this back to the originally desired sext value.
The two attachments are identical except that one has the sexts placed back after the vector compares and the other has the single sext after the phi node. When run through llc -mattr=+avx, the second one has big sequences of the following as a result: