intel / graph-compiler

MLIR-based toolkit targeting Intel heterogeneous hardware
Apache License 2.0

`memref.alloc()` on dynamic shape tensor cannot be successfully lowered #377

Open yifeizh2 opened 2 weeks ago

yifeizh2 commented 2 weeks ago

Encountered with the following matmul config:

```mlir
module attributes {dlti.target_system_spec = #dlti.target_system_spec<"CPU" : #dlti.target_device_spec<#dlti.dl_entry<"L1_cache_size_in_bytes", 49152 : ui32>, #dlti.dl_entry<"L2_cache_size_in_bytes", 2097152 : ui64>, #dlti.dl_entry<"L3_cache_size_in_bytes", 110100480 : ui64>, #dlti.dl_entry<"num_threads", 56 : i32>, #dlti.dl_entry<"max_vector_width", 512 : i64>>>} {
  func.func @entry(%arg0: tensor<128x512xbf16>, %arg1: tensor<512x1024xbf16>) -> tensor<128x1024xbf16> attributes {llvm.emit_c_interface} {
    %cst = arith.constant 0.000000e+00 : bf16
    %0 = tensor.empty() : tensor<128x1024xbf16>
    %1 = linalg.fill ins(%cst : bf16) outs(%0 : tensor<128x1024xbf16>) -> tensor<128x1024xbf16>
    %2 = linalg.matmul {KBlock = 32 : i32, KThreads = 1 : i32, MBlock = 32 : i32, MThreads = 4 : i32, NBlock = 128 : i32, NThreads = 14 : i32, cast = #linalg.type_fn<cast_signed>, innermostKBlock = 32 : i32, innermostMBlock = 32 : i32, innermostNBlock = 32 : i32} ins(%arg0, %arg1 : tensor<128x512xbf16>, tensor<512x1024xbf16>) outs(%1 : tensor<128x1024xbf16>) -> tensor<128x1024xbf16>
    return %2 : tensor<128x1024xbf16>
  }
}
```
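One plausible account of where the dynamic shape comes from (a hypothetical sketch, not the project's actual tiling code): in the config above, `NThreads = 14` does not evenly divide N = 1024, so splitting that dimension across threads yields unequal per-thread chunks, and the tile extent can no longer be a static constant.

```python
# Hypothetical sketch: per-thread chunk sizes when a dimension is split
# across a thread count that does not divide it evenly. Unequal chunks
# mean the tile extent is dynamic ("?") after bufferization.
def tile_sizes(dim, num_threads):
    """Chunk sizes distributing `dim` elements over `num_threads` threads."""
    base = dim // num_threads
    rem = dim % num_threads
    return [base + (1 if t < rem else 0) for t in range(num_threads)]

n_tiles = tile_sizes(1024, 14)  # NThreads = 14 from the config above
# The chunks are not all equal, so a single static tile size cannot cover
# every thread; with NThreads = 16 instead, 1024 splits evenly.
all_static = len(set(n_tiles)) == 1
```

With an even split (e.g. `NThreads = 16`), every chunk is 64 and the tile shape stays static.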

After one-shot bufferization, we encounter the following:

```mlir
%alloc_3 = memref.alloc(%6) {alignment = 64 : i64} : memref<32x?xf32>
```

which is further lowered to an un-eliminable `builtin.unrealized_conversion_cast`.

yifeizh2 commented 2 weeks ago

Synced offline with @zhczhong: the issue is caused by brgemm encountering a dynamic stride. We already have logic in `prepareConfigCandidates` to filter out such invalid configs; the same logic is now moved into `validateConfig` so that tuner-generated configs are validated as well.
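The shape of such a check can be sketched as follows (a hypothetical illustration with made-up names and criteria, not the repository's `validateConfig` implementation): reject any candidate config whose thread split leaves a remainder on a dimension, since the resulting tiles have dynamic extents and imply the dynamic strides brgemm cannot handle.

```python
# Hypothetical validateConfig-style check (illustrative only): a config is
# valid only if each dimension divides evenly across its thread count, so
# every tile has a static extent and therefore static strides.
def validate_config(m, n, k, cfg):
    checks = [
        (m, cfg["MThreads"]),
        (n, cfg["NThreads"]),
        (k, cfg["KThreads"]),
    ]
    return all(dim % threads == 0 for dim, threads in checks)

cfg = {"MThreads": 4, "NThreads": 14, "KThreads": 1}
ok = validate_config(128, 1024, 512, cfg)  # rejected: 1024 % 14 != 0
```

Running the same check both when preparing candidates and when validating tuner-supplied configs keeps the two paths from disagreeing about which configs are legal.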