[LSV] Enhance LoadStoreVectorizer to Handle Disjoint Flag in OR Instructions and Restore Vectorization Opportunities

[LSV] Enhance SCEV to Handle Disjoint Flag in OR Instructions and Restore Vectorization Opportunities

Summary:

This patch enhances the LoadStoreVectorizer (LSV) pass to handle the disjoint flag for OR instructions at the Scalar Evolution (SCEV) analysis level. By doing so, we address a regression that caused a loss in vectorization opportunities when processing OR operations that are actually disjoint additive operations.

Background:

A previous change (PR#74467) introduced the disjoint flag as a representation change for OR instructions with no common bits set, as identified by haveNoCommonBitsSet(). This was intended to streamline the identification process and make it easier to revert such inferences. However, this led LSV to miss converting disjoint ORs into ADDs, causing a performance regression, particularly on AMD and NVIDIA architectures where such vectorizations are crucial.

Modifications:

The regression is addressed by enhancing the SCEV analysis to detect the disjoint flag on OR instructions and consider them as candidates for addition operation transformations. The change occurs within the MatchBinaryOp function, where we now set the disjoint flag on OR instructions after checking operands for common bits using haveNoCommonBitsSet(). By shifting the focus from LSV to SCEV, we better leverage LLVM's existing infrastructure for handling such special cases.

SCEV's role in analyzing and simplifying expressions throughout the compiler optimization phases makes it a more strategic point for detecting disjoint OR instructions.

SCEV's inherent design to model and optimize scalar expressions allows us to integrate or disjoint detection seamlessly. By integrating into SCEV’s MatchBinaryOp function, we empower LLVM to naturally apply this transformation during the expression matching phase. This allows the subsequent vectorization passes, such as LSV, to benefit from the already identified and transformed instructions, ensuring that vectorization opportunities involving or disjoint are not missed.

The detection logic checks for disjointness of OR operands similar to what was previously done, but now the flagged instructions become part of SCEV's canonical form. This not only fixes the regression introduced by the disjoint flag changes but also bolsters LLVM's ability to optimize such patterns consistently across different optimization passes.

Original IR:

%linear_index3 = or i32 %linear_index_plus_base.fr, 3 %linear_index2 = or i32 %linear_index_plus_base.fr, 2 %linear_index1 = or i32 %linear_index_plus_base.fr, 1

Transformed IR with disjoint flag:

%linear_index3 = or disjoint i32 %linear_index_plus_base.fr, 3 %linear_index2 = or disjoint i32 %linear_index_plus_base.fr, 2 %linear_index1 = or disjoint i32 %linear_index_plus_base.fr, 1

With the above transformation in place, LSV can identify these disjoint OR operations as additive, restoring its previously lost vectorization capabilities.

Test Plan:

A new phase-ordering test has been added to demonstrate that without this patch, OR instructions lacking the disjoint flag were not vectorized under -O1, -O2, and -O3 optimization levels. With the patch applied, SCEV successfully flags the OR instructions as disjoint, allowing LSV to vectorize them, as confirmed by the test.

Signed-off-by: lihao217 lihao217@kuaishou.com, Compiler Optimization Engineer at Kuaishou Technology

llvm / llvm-project