Use `-arith-emulate-unsupported-floats` in the pipeline

Use the patterns from the `-arith-emulate-unsupported-floats` pass in the pipeline to transform:

%2 = arith.addf %0, %1 : tensor<512xbf16, #blocked>

into:

%ext0 = arith.extf %0 : tensor<512xbf16, #blocked> to tensor<512xf32, #blocked>
%ext1 = arith.extf %1 : tensor<512xbf16, #blocked> to tensor<512xf32, #blocked>
%ext2 = arith.addf %ext0, %ext1 : tensor<512xf32, #blocked>
%2 = arith.truncf %ext2 : tensor<512xf32, #blocked> to tensor<512xbf16, #blocked>
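
For readers who want to check the numerical effect of this rewrite, here is a minimal Python sketch of one scalar lane of the extend / add / truncate sequence. It is not the pass itself: the helper names (`bf16_to_f32`, `f32_to_bf16`, `emulated_addf`) are hypothetical, bf16 values are represented as raw 16-bit integers, and the truncation is assumed to round to nearest even, with NaN and overflow handling left out.

```python
import struct


def bf16_to_f32(bits: int) -> float:
    """Analogue of arith.extf: bf16 is the upper 16 bits of an f32, so widening is exact."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def f32_to_bf16(value: float) -> int:
    """Analogue of arith.truncf, assuming round-to-nearest-even (NaN handling omitted)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Add 0x7FFF plus the lowest kept bit so that exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def emulated_addf(a_bits: int, b_bits: int) -> int:
    """Add two bf16 values by widening to f32, adding, and truncating back."""
    ext0 = bf16_to_f32(a_bits)
    ext1 = bf16_to_f32(b_bits)
    # Python floats are doubles; round the sum to f32 to mimic an f32 addf.
    # struct raises OverflowError if the sum exceeds the f32 range (not handled here).
    ext2 = struct.unpack("<f", struct.pack("<f", ext0 + ext1))[0]
    return f32_to_bf16(ext2)


if __name__ == "__main__":
    # 0x3F80 is 1.0 and 0x4000 is 2.0 in bf16.
    print(hex(emulated_addf(0x3F80, 0x4000)))
```

The three steps in `emulated_addf` correspond one-to-one to `%ext0`/`%ext1`, `%ext2`, and the final `arith.truncf` in the IR above.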

Currently, custom arith-to-LLVM patterns handle this. Using the upstream pass instead would simplify our code and let us rely on upstream work.

intel / intel-xpu-backend-for-triton

Use `-arith-emulate-unsupported-floats` in the pipeline #1285

#1506 modifies the patterns to handle unsupported operations for some sizes. We may want to handle that similarly in this issue.