I think we found a correctness issue in the loop vectorizer related to the incorrect propagation of the ‘nuw’ flag for instructions guarded by a condition. The following pseudo-code illustrates the problem:
for (i = 0; i<4; ++i) {
if(i > 0) {
… = a[i-1]
}
}
In this example, the computation of i-1 could be flagged with ‘nuw’ because the guarding condition i > 0 guarantees that the unsigned result will always be within the range [0 - 2] and, therefore, will never wrap around. However, when the loop is vectorized and predication flattens the control flow, the previous property is no longer true:
for (i = 0; i<4; i+=4) {
mask = {1, 1, 1, 0};
… = masked_load(&a[i-1], mask);
}
In this case, the computation of i-1 is no longer guarded by the original condition and the unsigned result will wrap-around for i=0. The vectorizer currently preserves the original ‘nuw’ value for i-1, which leads to a poison value that is later optimized away, resulting in an incorrect vector code.
Please, find attached a test case. The problematic code before vectorization is:
concat_index_from_operand_id1: ; preds = %concatenate.pivot.1.4
%20 = add nsw i64 %fusion.indvar.dim.32, -1
%21 = mul nuw nsw i64 %fusion.indvar.dim.24, 3
%22 = add nuw nsw i64 %20, %21
%23 = getelementptr inbounds [1 x [6 x float]], [1 x [6 x float]]* %11, i64 0, i64 0, i64 %22
vector.body: ; preds = %vector.ph
%14 = mul nuw nsw i64 %fusion.indvar.dim.24, 3
%15 = add nuw nsw i64 -1, %14 // nuw & -1 -> Poison!
%16 = getelementptr inbounds [1 x [6 x float]], [1 x [6 x float]]* %11, i64 0, i64 0, i64 %15
I haven’t thought too much about the solution. Maybe we could drop the ‘nuw’/’nsw’ flags of uniform operations that are not immediately nested in the loop body. Perhaps, during predication?
Extended Description
Hello,
I think we found a correctness issue in the loop vectorizer related to the incorrect propagation of the ‘nuw’ flag for instructions guarded by a condition. The following pseudo-code illustrates the problem:
for (i = 0; i<4; ++i) { if(i > 0) { … = a[i-1] } }
In this example, the computation of
i-1
could be flagged with ‘nuw’ because the guarding conditioni > 0
guarantees that the unsigned result will always be within the range [0 - 2] and, therefore, will never wrap around. However, when the loop is vectorized and predication flattens the control flow, the previous property is no longer true:for (i = 0; i<4; i+=4) { mask = {1, 1, 1, 0}; … = masked_load(&a[i-1], mask); }
In this case, the computation of
i-1
is no longer guarded by the original condition and the unsigned result will wrap-around for i=0. The vectorizer currently preserves the original ‘nuw’ value fori-1
, which leads to a poison value that is later optimized away, resulting in an incorrect vector code.Please, find attached a test case. The problematic code before vectorization is:
concat_index_from_operand_id1: ; preds = %concatenate.pivot.1.4 %20 = add nsw i64 %fusion.indvar.dim.32, -1 %21 = mul nuw nsw i64 %fusion.indvar.dim.24, 3 %22 = add nuw nsw i64 %20, %21 %23 = getelementptr inbounds [1 x [6 x float]], [1 x [6 x float]]* %11, i64 0, i64 0, i64 %22
After vectorization (opt before_lv.ll -scoped-noalias-aa -loop-vectorize -ipsccp -mcpu=skx -S -o after_lv.ll):
vector.body: ; preds = %vector.ph %14 = mul nuw nsw i64 %fusion.indvar.dim.24, 3 %15 = add nuw nsw i64 -1, %14 // nuw & -1 -> Poison! %16 = getelementptr inbounds [1 x [6 x float]], [1 x [6 x float]]* %11, i64 0, i64 0, i64 %15
I haven’t thought too much about the solution. Maybe we could drop the ‘nuw’/’nsw’ flags of uniform operations that are not immediately nested in the loop body. Perhaps, during predication?
Thanks, Diego