iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

[Codegen] Do not consider parallel regions in bufferization analysis #17757

Closed Max191 closed 2 months ago

Max191 commented 3 months ago

When there is a buffer used inside of an scf.forall op that is defined outside of the scf.forall, bufferization will unconditionally bufferize out of place by default in order to avoid race conditions. However, handling parallel accesses to a buffer should generally be the responsibility of the source program, and if there is a race condition, then it should be handled outside of bufferization. This PR disables the parallel region check in IREE to simplify the bufferization analysis and enable more buffer reuse.

This PR could expose race conditions in cases where data races are not handled properly and we have been relying on bufferization to be conservative. Turning this option off could be a good early step in diagnosing data races on GPU.
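To make the rule in the description concrete, here is a minimal toy model in Python. It is not MLIR or IREE code: the `Write` class, the region names, and the `check_parallel_regions` flag are hypothetical names chosen for illustration, and the model ignores every other kind of conflict that the real analysis considers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """A tensor write, described only by where things live (hypothetical model)."""
    buffer_defined_in: str        # region that defines the destination buffer
    write_happens_in: str         # region that performs the write
    parallel_regions: frozenset   # regions that run in parallel, e.g. scf.forall bodies


def bufferizes_out_of_place(write: Write, check_parallel_regions: bool) -> bool:
    """Return True if this toy analysis would force a private copy.

    Only the parallel-region rule is modeled: a write inside a parallel
    region to a buffer defined outside that region is treated as a
    potential race and therefore gets its own buffer.
    """
    if not check_parallel_regions:
        # With the check disabled, avoiding races is left to the source program.
        return False
    crosses_boundary = write.buffer_defined_in != write.write_happens_in
    return crosses_boundary and write.write_happens_in in write.parallel_regions


forall = frozenset({"forall_body"})
shared = Write("outside_forall", "forall_body", forall)
local = Write("forall_body", "forall_body", forall)

print(bufferizes_out_of_place(shared, check_parallel_regions=True))   # copy forced
print(bufferizes_out_of_place(shared, check_parallel_regions=False))  # buffer reused
print(bufferizes_out_of_place(local, check_parallel_regions=True))    # no boundary crossed
```

In this sketch, disabling the check is what allows the buffer defined outside the parallel region to be reused in place.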

Max191 commented 3 months ago

After looking into the failing test I don't think we are ready for this flip yet. We need a better way of handling shared memory.

The test failure is caused by two tensor.empty() ops getting tiled to the same size and then CSEd into a single empty. However, the empty ops are sort of a hacky way to represent the shared memory buffers when running GPUPromoteMatmulOperandsPass. We need a better representation of the shared memory buffers at tensor level so we don't CSE them into a single buffer. The parallelism check in bufferization has been saving us by recreating a new buffer, but it is not a good way to handle this issue.

A couple of ideas would be to create alloc_tensor ops or implement some multi-buffering with both shared memory allocs being split across a single tensor.

CC @MaheshRavishankar @qedawkins @antiagainst

MaheshRavishankar commented 3 months ago

Wherever the tensor.empty is created, we could create a bufferization.alloc_tensor (or whatever) op instead. IIUC this op does not CSE. This was explicitly the reason it was split out from tensor.empty.
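The difference between the two ops under CSE can be sketched with a toy pass in Python. This is an illustration only, not MLIR's CSE implementation: the `Op` class and the `cse_able` flag are hypothetical, and treating alloc_tensor as exempt from CSE follows the "IIUC" statement above rather than anything verified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A result-producing op in a toy IR (hypothetical model)."""
    name: str
    shape: tuple
    cse_able: bool  # assumption: False stands for an op that CSE must not merge


def cse(ops):
    """Map each op index to the index of the op that survives CSE.

    Two ops are merged when both are CSE-able and structurally identical
    (same name and shape); everything else keeps its own identity.
    """
    first_seen = {}
    survivors = []
    for index, op in enumerate(ops):
        key = (op.name, op.shape)
        if op.cse_able and key in first_seen:
            survivors.append(first_seen[key])  # reuse the earlier identical op
        else:
            if op.cse_able:
                first_seen[key] = index
            survivors.append(index)
    return survivors


# Two promoted matmul operands that happen to be tiled to the same size.
empties = [Op("tensor.empty", (64, 64), True), Op("tensor.empty", (64, 64), True)]
allocs = [Op("bufferization.alloc_tensor", (64, 64), False),
          Op("bufferization.alloc_tensor", (64, 64), False)]

print(cse(empties))  # both operands collapse onto one value
print(cse(allocs))   # each operand keeps a distinct value
```

In the first case both shared memory operands end up backed by the same value, which is the failure mode described in the earlier comment; in the second case they stay distinct.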

Max191 commented 2 months ago

https://github.com/iree-org/iree/pull/17940 fixes the test failure. Rebasing this PR on top of it for now.

github-actions[bot] commented 2 months ago

Abbreviated Benchmark Summary

@ commit e38fae5421272fe594b7f0e1c09fa09543b50836 (no previous benchmark results to compare)

Data-Tiling Comparison Table

| Name | No-DT (baseline) | DT-Only | DT-UK |
| --- | --- | --- | --- |
| BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [788.727 (1.0X)](https://perf.iree.dev/serie?IREE?cbf78199705b14cb7332489c7415445f1b0e4189ec7a91b28232ff0e037fd7e2) | N/A | [221.751 (3.6X)](https://perf.iree.dev/serie?IREE?4c7b547fdbbf99b5d399e31f743a40249f0cd2aec451b9c0e6b2222fbf87bf4f) |
| DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [6.973 (1.0X)](https://perf.iree.dev/serie?IREE?882a01b5adfe6cf932e3cacf39a21659d97c6d680a7c3aacbef5298958c13078) | N/A | [8.491 (0.8X)](https://perf.iree.dev/serie?IREE?60ebe003ad32386572a7515583e00883b11209d13c62d6907be645492557aa71) |
| EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [36.029 (1.0X)](https://perf.iree.dev/serie?IREE?e7eb7934128cdfa74ffd4b1a5435fb595b313cfb7057fd458caccf04037346ac) | N/A | [34.606 (1.0X)](https://perf.iree.dev/serie?IREE?d38b4a4e1e86311faf6d3a7dcd6a8b8ce8ec305456e4a79e599104dd31e97909) |
| EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [5.816 (1.0X)](https://perf.iree.dev/serie?IREE?e94f7cad9035a9a3f3f6dc8ca0fb4ecc25339cf0f4a153c842b95ec00dc66f7f) | N/A | [5.033 (1.2X)](https://perf.iree.dev/serie?IREE?579b8550840595f0dc5a89acbb574ebf022c1581132b82e56139df142953c820) |
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [9.144 (1.0X)](https://perf.iree.dev/serie?IREE?b04574805bfe322d9ce4e3c40a974d1429196fb3d08ede92ba8f45a74c81a773) | N/A | [8.496 (1.1X)](https://perf.iree.dev/serie?IREE?9c569e155e55577bd706c41591db729c6ee388ecd7a466a21d3716dde38575a9) |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [11.037 (1.0X)](https://perf.iree.dev/serie?IREE?230baee287330f520a0576d6bcdd8df7a714059bdab8d1308b6655269aea2e13) | N/A | [9.005 (1.2X)](https://perf.iree.dev/serie?IREE?dd29ae6a7fad89ad7309c00d1b60ed6314eccd88402f49433e41f857d415a428) |
| MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [11.998 (1.0X)](https://perf.iree.dev/serie?IREE?254aa396e6ccfbf529973e678cf3d88722dacec4e44b58aa1fcc65993e875f0d) | N/A | [13.714 (0.9X)](https://perf.iree.dev/serie?IREE?52c4c346a22d0b8ff2fec9701d9bb1aa75423140f5db580ad6da29213aba0d59) |
| MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [33.706 (1.0X)](https://perf.iree.dev/serie?IREE?2ef61bde12ad45388014562af6d14a98a83069ff322eaf91293186d8d5ea4bb9) | N/A | [61.553 (0.5X)](https://perf.iree.dev/serie?IREE?6c820fd574f08948bddbf14fa5075d1dce2a0191d677cbfedf58fa6b9ddbf9a3) |
| MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [34.471 (1.0X)](https://perf.iree.dev/serie?IREE?6037970c8a3f46a533e6d0c2db581a2cda6d827709bb23562085b36cf30d5921) | N/A | [61.947 (0.6X)](https://perf.iree.dev/serie?IREE?d7d25a8c838db8d5859a25187d8fedc23de97e0280b1a85e12e9348f411c0c8e) |
| MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [68.923 (1.0X)](https://perf.iree.dev/serie?IREE?a2bd1c8e875ac8dcd218641e73102249a16c011c38d3775d52d9dd8a9ba324f4) | N/A | [65.935 (1.0X)](https://perf.iree.dev/serie?IREE?6c3eebd478ce05568e03b90fffbaabf0ae95774046d9f492ee53b8e34a6b692d) |
| MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [4.724 (1.0X)](https://perf.iree.dev/serie?IREE?002cd64f66606ef48d9568103412f709d494fbea040a6879b069436ccc106733) | N/A | [4.584 (1.0X)](https://perf.iree.dev/serie?IREE?14e8174454310c9b24812dca661319c7b8e78a1175003f56abe8cfa7e7bb9cb9) |
| MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [3.748 (1.0X)](https://perf.iree.dev/serie?IREE?9accf20747a0a52c6c6b7da7433c9e9cdf68a813ec6589b781ecb7791a836e34) | N/A | [4.873 (0.8X)](https://perf.iree.dev/serie?IREE?ce780c2ab7c9b837611b5e1dcdbce18e7563fb9d9137e68b5a50bd917a54f83d) |
| MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [5.850 (1.0X)](https://perf.iree.dev/serie?IREE?c196cfd95d87ddeb4cb008e055ec417dd805617dd204295c17856ca0f9e0863c) | N/A | [5.392 (1.1X)](https://perf.iree.dev/serie?IREE?5b41fd88f5fa3c217d024908b57237037d8851b0cba869fb142270cb2fd17ff1) |
| MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [2.940 (1.0X)](https://perf.iree.dev/serie?IREE?069f6917e401e63c9e50c548c70cc699385e6f6908517eb6c79c96e597bf96d7) | N/A | [2.814 (1.0X)](https://perf.iree.dev/serie?IREE?c27738e97498c969076d1a2a693322821dd104dbcf7ba6e129ba893584bb0dfd) |
| MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [8.446 (1.0X)](https://perf.iree.dev/serie?IREE?4af168ed94d96166f35b8264e160ca1e85a3c6ef3faa08284f447a5613f6ce39) | N/A | [9.832 (0.9X)](https://perf.iree.dev/serie?IREE?ec20addfc5f284c92b739d0eaf245af0027627de593635539a86709332ae5acf) |
| PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [0.777 (1.0X)](https://perf.iree.dev/serie?IREE?77dd6dcff77b2053dbc4cbafc7ca36f8ee5aabdc138b5808830908b037014cc3) | N/A | [0.610 (1.3X)](https://perf.iree.dev/serie?IREE?8d8fd2fbd7901ece93ffa5e47c460dd793c4489b5751a15bb0c3e1b8d82073db) |
| PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [4.152 (1.0X)](https://perf.iree.dev/serie?IREE?a2ebf5883d38f358868199609143debdbb2947b6e0ab6c5b03802cb813022f9f) | N/A | [5.182 (0.8X)](https://perf.iree.dev/serie?IREE?1e0197113e1bab228898b4e76067c7c8dcd0faf2b0cf5af9dbb227491de894e4) |
| matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | [7.581 (1.0X)](https://perf.iree.dev/serie?IREE?641b82f32c47ecd4d02c8c82926118acfce0f530e8728e04a1d593a2876847d2) | N/A | [7.573 (1.0X)](https://perf.iree.dev/serie?IREE?c3a0b8c64c6406c9e4a46d537f2acd4ed2b9f6c191387830c5fcb215cd91d9d0) |
| matmul\_256x256x2048\_i8\_i8\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | [8.377 (1.0X)](https://perf.iree.dev/serie?IREE?cae1371c12e17c10858e572a14d6d17ecbbf844fd437affd4d679e651a323e54) | N/A | [1.807 (4.6X)](https://perf.iree.dev/serie?IREE?2ece87896c8e494c3a35b681c746c566d57bd4f6d996ae1449c7d652e31664bb) |
| BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [218.638 (1.0X)](https://perf.iree.dev/serie?IREE?cb3631222b94571a286e32c3aa1e56c021aba1b7f3d82ffd2400ea07d9dfcc3f) | N/A | [108.259 (2.0X)](https://perf.iree.dev/serie?IREE?9b3354efe105e56bf9f9ae18ede492ce77a01e5cfcec56eed66b21795d8d8944) |
| DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [32.447 (1.0X)](https://perf.iree.dev/serie?IREE?015af8c7c74743569726f8fecf3c5af66eb516b1e4c27b9c53444e5eb68254f9) | N/A | [29.835 (1.1X)](https://perf.iree.dev/serie?IREE?7237c7cbf5353280472161050ccb803bd6237ac656eab0604d5cc610d73ef778) |
| EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [275.887 (1.0X)](https://perf.iree.dev/serie?IREE?d14bc72f848279de26aba8bd86bb530767acc4ca769356ab548258db49c44555) | N/A | [229.400 (1.2X)](https://perf.iree.dev/serie?IREE?ce7eec0c36a5fda73313a06da87ff315e0307cd6d2962d167e7e641eea50604c) |
| EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [26.974 (1.0X)](https://perf.iree.dev/serie?IREE?480a2fe9ab9bd9ade098ff3c5fa0fd61a93c787c99329a1cdcecac6e5d708558) | N/A | [13.031 (2.1X)](https://perf.iree.dev/serie?IREE?423824abc1ed6574ed1315b6c6432366edefbec9704c4b524d6daa9c7f18bf0a) |
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [70.679 (1.0X)](https://perf.iree.dev/serie?IREE?0be99f368751e55d1ce96e0d44819c3ba3a69c12c040048a67344f516f69873e) | N/A | [37.560 (1.9X)](https://perf.iree.dev/serie?IREE?ce26c2ff64d5511aea1d19f13a17363995cdcf8c88d01097da455525abaf9efe) |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [88.576 (1.0X)](https://perf.iree.dev/serie?IREE?212726872c6a041363a7346217805fde6a21e1953d006a279cb748ca865a95aa) | N/A | [39.623 (2.2X)](https://perf.iree.dev/serie?IREE?b56af01b0b5512c28b180552134e3e2701a068586e8a1a08bb307e0a1e42d656) |
| MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [80.567 (1.0X)](https://perf.iree.dev/serie?IREE?e076babcf92c08d76f05c53bec9bcf823f3855b6280c2c74465ed25bb2bb2bd7) | N/A | [55.983 (1.4X)](https://perf.iree.dev/serie?IREE?3da49d74eed3cd740c69a6a2a97f3ff7e54710ea66c083670042256b2648ddcf) |
| MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [182.101 (1.0X)](https://perf.iree.dev/serie?IREE?746443fef718b98d7449c0b2d1733195479afa32e50ae726e8f695cc48611f57) | N/A | [186.203 (1.0X)](https://perf.iree.dev/serie?IREE?b528e469bfd43258750e70a724bf02eeb157173782b5a5a8912ae036e3ffce58) |
| MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [181.315 (1.0X)](https://perf.iree.dev/serie?IREE?51473638a07429e21bf4b4fdfdb47201bbdff46edc0134cab2d589abc65a4ed6) | N/A | [191.411 (0.9X)](https://perf.iree.dev/serie?IREE?4d92c9901b7c73d8e02e63adfdcdf63ef0fb529360a908f93b888dee1c3f9c31) |
| MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [516.233 (1.0X)](https://perf.iree.dev/serie?IREE?5b81ba0c3d0db49f11e4c7e51f4138a723c72445c4d1b7d6d441d5a02bbf700a) | N/A | [240.753 (2.1X)](https://perf.iree.dev/serie?IREE?7001a4f2a5e52aa034f802096f625e278fc10b92cd85653335c3a7c5110492c7) |
| MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [25.036 (1.0X)](https://perf.iree.dev/serie?IREE?1622e274d5ac570e18826aaec62f223c538583eb2f76e771d24eb2f7785954aa) | N/A | [17.616 (1.4X)](https://perf.iree.dev/serie?IREE?6600e5c77f343f3727788ac55712340db67660453f0d5b2a78f8a2f00bffa9f2) |
| MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [11.681 (1.0X)](https://perf.iree.dev/serie?IREE?48cac7cf7dea690dd7d8e8669fd5d6f65d1f20c0de1710dc381cf15533354bed) | N/A | [11.369 (1.0X)](https://perf.iree.dev/serie?IREE?6272e089c33b7c5333b6188b6f61fbb15e7b6a0e9fcd9d54b3b7271cd730e0da) |
| MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [21.497 (1.0X)](https://perf.iree.dev/serie?IREE?23e7ffd476616a14cc5b0cabe27332ff71fec9cdc22801b675f8e6349c498814) | N/A | [11.784 (1.8X)](https://perf.iree.dev/serie?IREE?10f2428bc7da79d6d0f23d87caa4cb20ba55d968736b64c6a47c3041be10f641) |
| MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | [2.791 (1.0X)](https://perf.iree.dev/serie?IREE?fd46a78e4032c5fa09644bcda90d0d8b73e9196fb89e2458db2838ddf5fd4c16) | N/A | [2.702 (1.0X)](https://perf.iree.dev/serie?IREE?485da7a706b6c0940ef45626ec12ab149da295cc6a3c0a2c63e5a15a952580b4) |
| MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [33.799 (1.0X)](https://perf.iree.dev/serie?IREE?0aac8a2a5c45ed0ed35dcd65338a5a414c6beefcdbb0fbb4f299b42d41b639e1) | N/A | [30.969 (1.1X)](https://perf.iree.dev/serie?IREE?d6bfea70085e57a372f18983ddd9f7598b084dc4aac07754c80e4f4f5c4fb407) |
| PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | [0.709 (1.0X)](https://perf.iree.dev/serie?IREE?da589d3a658ddcc4dacaab64c8c7253bab3b0b90fbd35158ba58ed883266d5dc) | N/A | [0.548 (1.3X)](https://perf.iree.dev/serie?IREE?3283ddd7c21e5db8eea573c2f94ae318c5baa6bf3d9340ba157573937e7b6632) |
| PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [17.478 (1.0X)](https://perf.iree.dev/serie?IREE?0d4e114d66ae2e078076cc40fca5e6af76232c3936effb92d33e23f76f26ede8) | N/A | [19.371 (0.9X)](https://perf.iree.dev/serie?IREE?51181aae886260ff3c24d829e8bf9e3a892aa93305321c1012476aace79f9e65) |
| matmul\_1x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | [0.054 (1.0X)](https://perf.iree.dev/serie?IREE?6d2c0a06c3eaf69a9c711482f9c6648cf0a790b4b7705f35f9ef9582b9aacbde) | N/A | [0.054 (1.0X)](https://perf.iree.dev/serie?IREE?db2de6271f515862572fe26fb58b0e29fa494eb5f30eed6238cceced14581407) |
| matmul\_1x256x2048\_i8\_i8\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | [0.042 (1.0X)](https://perf.iree.dev/serie?IREE?e7b5bed0962d3dd85ed42a2a3b08b15319335cfb2d1337c4d86bf9a86889221b) | N/A | [0.021 (2.0X)](https://perf.iree.dev/serie?IREE?795fa468ffab2f4477bc819014b5668fc4f21a01f89bcd90593d0e594cea350e) |
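
The multipliers in parentheses appear to be the baseline (No-DT) latency divided by the variant's latency, shown to one decimal place, so values above 1.0X are faster than baseline. That reading is an assumption inferred from the numbers in the table, not taken from the benchmark tooling, and the helper name `speedup_label` is hypothetical. A minimal sketch in Python:

```python
def speedup_label(baseline_ms: float, variant_ms: float) -> str:
    """Format the speedup of a variant over the baseline as shown in the table.

    Assumption: speedup = baseline latency / variant latency, one decimal place.
    """
    return f"{baseline_ms / variant_ms:.1f}X"


# Latencies (ms) copied from rows of the comparison table: (No-DT, DT-UK).
print(speedup_label(788.727, 221.751))  # BertLargeTF, 30-thread
print(speedup_label(6.973, 8.491))      # DeepLabV3_fp32, 8-thread
print(speedup_label(8.377, 1.807))      # matmul_256x256x2048_i8_i8_i32
```

Under this assumption the three calls reproduce the 3.6X, 0.8X, and 4.6X entries for those rows.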

Raw Latencies

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 221.751 | 222.448 | 1.726 |
| BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 788.727 | 780.481 | 36.115 |
| DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.491 | 8.492 | 0.018 |

[Top 3 out of 92 results showed]

No improved or regressed compilation metrics 🏖️

For more information:

Source Workflow Run