Closed RKSimon closed 8 months ago
@llvm/issue-subscribers-backend-x86
CC @elhewaty
assign me, please.
@RKSimon Is there any source I can use to understand DAG internals?
I'd start by seeing what the difference is between the IR being fed to DAG from masked_select vs masked_select_const - you will probably need to remove a lot of unnecessary bitcasts. Then step through the DAGCombine stages of running llc in a debugger - add breakpoints to the start of visitADD/visitSUB/visitVSELECT and see what's happening.
You can also use "llc --debug" (using a debug assertion build) to dump out everything llc has done: https://rust.godbolt.org/z/szYv5G8n9
Hello @RKSimon.
// select X, sub(X, C), m --> sub (X, and(C, m))
if (N1.getOpcode() == ISD::SUB && N1.getOperand(0) == N0 && N1.hasOneUse()) {
  if (dyn_cast<ConstantSDNode>(N1.getOperand(1)))
    return DAG.getNode(ISD::SUB, DL, N0.getValueType(), N0,
                       DAG.getNode(ISD::AND, DL, N2.getValueType(),
                                   N1.getOperand(1), N2));
}
Here's what I've reached so far: I tried to match a pattern in the visitSELECT function.
Is this logic correct?
Yes, that looks about right - you should use isConstantIntBuildVectorOrConstantInt instead of dyn_cast<ConstantSDNode> so it can match vector constants as well.
Also, you need to sort out the argument order (sorry, when I reported this I was thinking of _mm_blendv_epi8 order, not select IR order).
@RKSimon, I used the following test case:
define <2 x i64> @masked_select_const(<2 x i64> %a, <2 x i64> %x, <2 x i64> %y) {
  %bit_a = bitcast <2 x i64> %a to <4 x i32>
  %sub.i = add <4 x i32> %bit_a, <i32 -24, i32 -24, i32 -24, i32 -24>
  %bit_x = bitcast <2 x i64> %x to <4 x i32>
  %bit_y = bitcast <2 x i64> %y to <4 x i32>
  %cmp.i = icmp sgt <4 x i32> %bit_x, %bit_y
  %sel = select <4 x i1> %cmp.i, <4 x i32> %sub.i, <4 x i32> %bit_a
  %bit_sel = bitcast <4 x i32> %sel to <2 x i64>
  ret <2 x i64> %bit_sel
}
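For anyone following along, here is a rough C++ reference model of what the test case above computes per 32-bit lane. It is only a sketch: the names `V4` and `maskedSelectConstRef` are made up for illustration and are not part of LLVM, and the bitcasts are left out on the assumption that they only reinterpret bits.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper type: the four i32 lanes of the <4 x i32> values.
using V4 = std::array<int32_t, 4>;

// Reference model of @masked_select_const on the <4 x i32> lanes.
// The bitcasts in the IR only reinterpret bits, so they are omitted here.
V4 maskedSelectConstRef(const V4 &a, const V4 &x, const V4 &y) {
  V4 out{};
  for (int i = 0; i < 4; ++i) {
    // add <4 x i32> %bit_a, <i32 -24, ...> with wrapping semantics.
    int32_t sub = static_cast<int32_t>(static_cast<uint32_t>(a[i]) +
                                       static_cast<uint32_t>(-24));
    // icmp sgt <4 x i32> %bit_x, %bit_y
    bool cmp = x[i] > y[i];
    // select <4 x i1> %cmp.i, %sub.i, %bit_a
    out[i] = cmp ? sub : a[i];
  }
  return out;
}
```

Lanes where x > y get a - 24, all other lanes keep a. Note the constant already shows up as an add of -24 in the IR, not as a sub.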
The following code can't match the select:
// select m, sub(X, C), X --> sub (X, and(C, m))
if (N1.getOpcode() == ISD::SUB && N1.getOperand(0) == N2 && N1->hasOneUse() &&
    DAG.isConstantIntBuildVectorOrConstantInt(N1.getOperand(1))) {
  return DAG.getNode(ISD::SUB, DL, N1.getValueType(), N2,
                     DAG.getNode(ISD::AND, DL, N0.getValueType(), N1.getOperand(1),
                                 N0));
}
Any hint?
@RKSimon ping
Sorry I missed your ping.
In many cases DAG will try to fold (sub x, c) -> (add x, -c), so you will need to do this in terms of ADD:
// select (sext m), (add X, C), X --> (add X, (and C, (sext m)))
if (N1.getOpcode() == ISD::ADD && N1.getOperand(0) == N2 && N1->hasOneUse() &&
    DAG.isConstantIntBuildVectorOrConstantInt(N1.getOperand(1)) &&
    N0.getScalarValueSizeInBits() == N1.getScalarValueSizeInBits()) {
  return DAG.getNode(ISD::ADD, DL, N1.getValueType(), N2,
                     DAG.getNode(ISD::AND, DL, N0.getValueType(), N1.getOperand(1),
                                 N0));
}
Note you need to ensure the N0 condition is the same width as the True/False operands, otherwise you might affect targets with predicate mask types (AVX512 etc).
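To sanity-check the identity behind this fold outside of LLVM, a minimal scalar sketch follows. The helper names (`sextBool`, `selectLane`, `selectOfAdd`, `addOfAnd`) are hypothetical and only model one 32-bit lane. It assumes the condition has already been sign-extended to an all-ones or all-zeros lane, which is one way to read the width check above.

```cpp
#include <cstdint>

// Sign-extend an i1 condition to a full-width lane mask (all ones or all zeros).
uint32_t sextBool(bool b) { return b ? 0xFFFFFFFFu : 0u; }

// Bitwise select for one lane: picks t where the mask bits are set, f elsewhere.
uint32_t selectLane(uint32_t mask, uint32_t t, uint32_t f) {
  return (t & mask) | (f & ~mask);
}

// Original form: select (sext m), (add X, C), X
uint32_t selectOfAdd(uint32_t mask, uint32_t x, uint32_t c) {
  return selectLane(mask, x + c, x);
}

// Folded form: add X, (and C, (sext m))
uint32_t addOfAnd(uint32_t mask, uint32_t x, uint32_t c) {
  return x + (c & mask);
}
```

With a full-width mask both forms agree, e.g. for x = 100 and C = -24 they give 76 when the condition is true and 100 when it is false. If the mask were a plain 0/1 value instead of all-ones, `c & mask` would keep only the low bit of C and the folded form would no longer match, so the sketch only holds under the same-width assumption.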
@elhewaty Do you have a PR (draft or active) anywhere with your work so far?
https://godbolt.org/z/a1PczEM8a
If we're selecting a subtraction with a non-constant, we fold the select into an AND:
But for constants this fails, which on x86 can result in a BLENDV instruction, which is never faster than an AND.