plaidml / tpp-mlir

TPP experimentation on MLIR for linear algebra
https://arxiv.org/abs/2404.15204

Graviton 3 packing not working #891

Open rengolin opened 7 months ago

rengolin commented 7 months ago

Tests and benchmarks all work fine, except the ones using compiler packing (both FP32 and BF16).

Benchmark: prepacked_targets
gemm_fp32_dnn_target        :    79.273 gflops
gemm_bf16_dnn_target        :   256.180 gflops
mlp_fp32_dnn_target         :    78.956 gflops
mlp_bf16_dnn_target         :   254.930 gflops
gemm_fp32_mlir              :    78.429 gflops
gemm_bf16_dp4_mlir          :   253.889 gflops
mlp_fp32_mlir               :    78.576 gflops
mlp_bf16_dp4_mlir           :   250.948 gflops

Benchmark: gemm_models
fp32_3x1024_const_mlir      :     0.050 gflops
fp32_3x1024_args_mlir       :     0.002 gflops
bf16_3x1024_const_mlir      :     0.050 gflops
bf16_3x1024_args_mlir       :     0.002 gflops

Benchmark: mlp_models
fp32_3x1024_const_mlir      :     0.050 gflops
fp32_3x1024_args_mlir       :     0.002 gflops
bf16_3x1024_const_mlir      :     0.050 gflops
bf16_3x1024_args_mlir       :     0.002 gflops

Benchmark: torch_dynamo
gemm_fp32_torch             :     0.050 gflops
gemm_bf16_torch             :     0.050 gflops
mlp_fp32_torch              :     0.050 gflops
mlp_bf16_torch              :     0.050 gflops

This used to work around early January, so it's a recent regression. I won't have time to bisect until CGO, so I'll leave this here and just not report packing numbers on Arm.

We need a Graviton builder that runs at least once a day. But we also need a benchmark that fails on certain conditions (for example, when throughput collapses like in the logs above), which we don't have either. 😭
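As a rough sketch of what a "benchmark that fails" could look like: the snippet below parses output in the format pasted above and reports any entry below a minimum throughput. This is not part of tpp-mlir. The function names (`parse_results`, `find_regressions`, `check`) and the 1.0 gflops floor are assumptions made for illustration, and a real check would probably want per-benchmark baselines rather than a single floor.

```python
import re

# Matches result lines such as "gemm_fp32_mlir              :    78.429 gflops".
RESULT_RE = re.compile(r"^(\S+)\s*:\s*([0-9.]+)\s+gflops$")


def parse_results(text):
    """Return {"group/name": gflops} parsed from benchmark output.

    Names repeat across groups (e.g. fp32_3x1024_const_mlir), so each key is
    prefixed with the most recent "Benchmark: <group>" header.
    """
    results = {}
    group = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Benchmark:"):
            group = line.split(":", 1)[1].strip()
            continue
        match = RESULT_RE.match(line)
        if match:
            results[f"{group}/{match.group(1)}"] = float(match.group(2))
    return results


def find_regressions(results, min_gflops):
    """Return the sorted keys whose throughput is below min_gflops."""
    return sorted(name for name, value in results.items() if value < min_gflops)


def check(text, min_gflops=1.0):
    """Print failing entries and return a process exit code (0 = pass).

    The default floor of 1.0 gflops is an assumed value, chosen only because
    the healthy runs above are around 78-256 gflops and the broken ones are
    at most 0.050.
    """
    failures = find_regressions(parse_results(text), min_gflops)
    for name in failures:
        print(f"FAIL {name}: below {min_gflops} gflops")
    return 1 if failures else 0
```

A daily builder could then pipe the benchmark log into something like `sys.exit(check(sys.stdin.read()))` so that a collapse like the one above turns the job red instead of going unnoticed.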