Cleans up and simplifies the GPU pipeline in preparation for unified vector-based lowering.
The main goal is to retire old experimental paths and prepare for a more common, vendor-agnostic lowering infrastructure.
It is another step toward GPU codegen through vectorization.
Summary of changes:
moves to tiling-based kernel outlining - retires naive outlining based on Linalg to parallel loops conversion
retires packed GEMM GPU kernels - currently irrelevant for GPU kernel creation
retires custom Linalg to WMMA lowering - to be replaced with a generic vectorization scheme in the future
cleans up tests and adjusts existing ones to the pipeline changes
allows overriding the default GPU tiling sizes and using the tile setting provided by DLTI (for now uses the CPU tile size)
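
For illustration only, the tile size selection described in the last item could be sketched roughly as follows. This is not the code from this change: the function name, the parameter names, the default value, and the precedence order (explicit override first, then the DLTI setting, then the built-in default) are all assumptions, and the DLTI setting is modeled as a single integer applied to every tiled dimension.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical default; the actual default GPU tile sizes are not stated here.
DEFAULT_GPU_TILES: Tuple[int, ...] = (32, 32)


def resolve_gpu_tile_sizes(
    override: Optional[Sequence[int]] = None,
    dlti_tile_size: Optional[int] = None,
    rank: int = 2,
) -> Tuple[int, ...]:
    """Pick tile sizes for kernel outlining.

    Assumed precedence: explicit override, then DLTI setting, then default.
    """
    if override:
        # A user-provided option wins over everything else.
        tiles = tuple(override)
    elif dlti_tile_size is not None:
        # Assumption: one DLTI tile size is reused for every tiled dimension.
        tiles = (dlti_tile_size,) * rank
    else:
        tiles = DEFAULT_GPU_TILES

    if any(t <= 0 for t in tiles):
        raise ValueError(f"tile sizes must be positive, got {tiles}")
    return tiles


if __name__ == "__main__":
    print(resolve_gpu_tile_sizes())                    # built-in default
    print(resolve_gpu_tile_sizes(dlti_tile_size=8))    # DLTI setting
    print(resolve_gpu_tile_sizes(override=[64, 16]))   # explicit override
```

Keeping the lookup in one place like this would make it easy to swap the CPU tile size for a GPU-specific DLTI entry later without touching the callers.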