iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

[Flow] Always permute the accesses on inputs for elementwise consumer from namedop/reduction producer. #17663

Open MaheshRavishankar opened 1 week ago

MaheshRavishankar commented 1 week ago

For dispatch formation, the current logic (and much of code generation) works much better when the consumer uses an identity indexing map for the producer. A pass in the dispatch region formation flow already does this, but only for convolution ops. This change makes it apply to more general cases.
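As a rough illustration of what "permuting the accesses" means, the sketch below models each indexing map as a tuple of loop-dimension indices and renumbers the consumer's loops so that the operand coming from the producer is read with the identity map. This is a simplified model, not IREE's implementation: the names `invert_permutation` and `make_operand_identity` are hypothetical, and maps are assumed to be (projected) permutations with no other affine expressions.

```python
from typing import List, Sequence, Tuple

# A map such as affine_map<(d0, d1, d2) -> (d2, d0, d1)> is modelled as (2, 0, 1):
# result position j reads loop dimension perm[j]. (Assumed simplification.)
Perm = Tuple[int, ...]


def invert_permutation(perm: Perm) -> Perm:
    """Return inv such that inv[perm[j]] == j for every position j."""
    inverse = [0] * len(perm)
    for result_pos, loop_dim in enumerate(perm):
        inverse[loop_dim] = result_pos
    return tuple(inverse)


def make_operand_identity(
    indexing_maps: Sequence[Perm],
    iterator_types: Sequence[str],
    operand_index: int,
) -> Tuple[List[Perm], List[str]]:
    """Renumber the loops so indexing_maps[operand_index] becomes the identity.

    Hypothetical helper: every map is rewritten with the same loop renumbering,
    so the computation is unchanged and only the loop order differs.
    """
    producer_map = indexing_maps[operand_index]
    if sorted(producer_map) != list(range(len(iterator_types))):
        raise ValueError("producer operand map must be a full permutation")

    # Old loop dimension d is renamed to new_dim_of[d].
    new_dim_of = invert_permutation(producer_map)
    new_maps = [tuple(new_dim_of[d] for d in m) for m in indexing_maps]
    # New loop j corresponds to old loop producer_map[j].
    new_iterators = [iterator_types[d] for d in producer_map]
    return new_maps, new_iterators


if __name__ == "__main__":
    # Consumer reads the producer result transposed and writes with identity.
    maps, iterators = make_operand_identity(
        [(1, 0), (0, 1)], ["parallel", "parallel"], operand_index=0
    )
    print(maps, iterators)
```

In this model the transposition moves from the producer operand onto the consumer's output map, which is the property the description asks for: the consumer indexes the producer result with the identity map.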

github-actions[bot] commented 1 week ago

Abbreviated Benchmark Summary

@ commit 50035d59537fa95e5b3d8b194b6c0883b87f1395 (vs. base 7b58c712a1c6bc1a13fc4525ef07b0030a950d86)

Data-Tiling Comparison Table

| Name | No-DT (baseline) | DT-Only | DT-UK |
| --- | --- | --- | --- |
| BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [225.750 (1.0X)](https://perf.iree.dev/serie?IREE?cb3631222b94571a286e32c3aa1e56c021aba1b7f3d82ffd2400ea07d9dfcc3f) | N/A | [107.875 (2.1X)](https://perf.iree.dev/serie?IREE?9b3354efe105e56bf9f9ae18ede492ce77a01e5cfcec56eed66b21795d8d8944) |
| BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [782.836 (1.0X)](https://perf.iree.dev/serie?IREE?cbf78199705b14cb7332489c7415445f1b0e4189ec7a91b28232ff0e037fd7e2) | N/A | [221.953 (3.5X)](https://perf.iree.dev/serie?IREE?4c7b547fdbbf99b5d399e31f743a40249f0cd2aec451b9c0e6b2222fbf87bf4f) |
| DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [6.984 (1.0X)](https://perf.iree.dev/serie?IREE?882a01b5adfe6cf932e3cacf39a21659d97c6d680a7c3aacbef5298958c13078) | N/A | [8.477 (0.8X)](https://perf.iree.dev/serie?IREE?60ebe003ad32386572a7515583e00883b11209d13c62d6907be645492557aa71) |
| DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [32.223 (1.0X)](https://perf.iree.dev/serie?IREE?015af8c7c74743569726f8fecf3c5af66eb516b1e4c27b9c53444e5eb68254f9) | N/A | [29.835 (1.1X)](https://perf.iree.dev/serie?IREE?7237c7cbf5353280472161050ccb803bd6237ac656eab0604d5cc610d73ef778) |
| EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [35.851 (1.0X)](https://perf.iree.dev/serie?IREE?e7eb7934128cdfa74ffd4b1a5435fb595b313cfb7057fd458caccf04037346ac) | N/A | [34.142 (1.1X)](https://perf.iree.dev/serie?IREE?d38b4a4e1e86311faf6d3a7dcd6a8b8ce8ec305456e4a79e599104dd31e97909) |
| EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [274.244 (1.0X)](https://perf.iree.dev/serie?IREE?d14bc72f848279de26aba8bd86bb530767acc4ca769356ab548258db49c44555) | N/A | [228.984 (1.2X)](https://perf.iree.dev/serie?IREE?ce7eec0c36a5fda73313a06da87ff315e0307cd6d2962d167e7e641eea50604c) |
| EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [5.793 (1.0X)](https://perf.iree.dev/serie?IREE?e94f7cad9035a9a3f3f6dc8ca0fb4ecc25339cf0f4a153c842b95ec00dc66f7f) | N/A | [4.967 (1.2X)](https://perf.iree.dev/serie?IREE?579b8550840595f0dc5a89acbb574ebf022c1581132b82e56139df142953c820) |
| EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [26.886 (1.0X)](https://perf.iree.dev/serie?IREE?480a2fe9ab9bd9ade098ff3c5fa0fd61a93c787c99329a1cdcecac6e5d708558) | N/A | [13.003 (2.1X)](https://perf.iree.dev/serie?IREE?423824abc1ed6574ed1315b6c6432366edefbec9704c4b524d6daa9c7f18bf0a) |
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [9.206 (1.0X)](https://perf.iree.dev/serie?IREE?b04574805bfe322d9ce4e3c40a974d1429196fb3d08ede92ba8f45a74c81a773) | N/A | [8.450 (1.1X)](https://perf.iree.dev/serie?IREE?9c569e155e55577bd706c41591db729c6ee388ecd7a466a21d3716dde38575a9) |
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [69.809 (1.0X)](https://perf.iree.dev/serie?IREE?0be99f368751e55d1ce96e0d44819c3ba3a69c12c040048a67344f516f69873e) | N/A | [39.597 (1.8X)](https://perf.iree.dev/serie?IREE?ce26c2ff64d5511aea1d19f13a17363995cdcf8c88d01097da455525abaf9efe) |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [10.964 (1.0X)](https://perf.iree.dev/serie?IREE?230baee287330f520a0576d6bcdd8df7a714059bdab8d1308b6655269aea2e13) | N/A | [8.860 (1.2X)](https://perf.iree.dev/serie?IREE?dd29ae6a7fad89ad7309c00d1b60ed6314eccd88402f49433e41f857d415a428) |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [88.643 (1.0X)](https://perf.iree.dev/serie?IREE?212726872c6a041363a7346217805fde6a21e1953d006a279cb748ca865a95aa) | N/A | [41.739 (2.1X)](https://perf.iree.dev/serie?IREE?b56af01b0b5512c28b180552134e3e2701a068586e8a1a08bb307e0a1e42d656) |
| MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [11.895 (1.0X)](https://perf.iree.dev/serie?IREE?254aa396e6ccfbf529973e678cf3d88722dacec4e44b58aa1fcc65993e875f0d) | N/A | [13.683 (0.9X)](https://perf.iree.dev/serie?IREE?52c4c346a22d0b8ff2fec9701d9bb1aa75423140f5db580ad6da29213aba0d59) |
| MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [80.442 (1.0X)](https://perf.iree.dev/serie?IREE?e076babcf92c08d76f05c53bec9bcf823f3855b6280c2c74465ed25bb2bb2bd7) | N/A | [56.734 (1.4X)](https://perf.iree.dev/serie?IREE?3da49d74eed3cd740c69a6a2a97f3ff7e54710ea66c083670042256b2648ddcf) |
| MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [33.643 (1.0X)](https://perf.iree.dev/serie?IREE?2ef61bde12ad45388014562af6d14a98a83069ff322eaf91293186d8d5ea4bb9) | N/A | [61.338 (0.5X)](https://perf.iree.dev/serie?IREE?6c820fd574f08948bddbf14fa5075d1dce2a0191d677cbfedf58fa6b9ddbf9a3) |
| MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [180.527 (1.0X)](https://perf.iree.dev/serie?IREE?746443fef718b98d7449c0b2d1733195479afa32e50ae726e8f695cc48611f57) | N/A | [184.860 (1.0X)](https://perf.iree.dev/serie?IREE?b528e469bfd43258750e70a724bf02eeb157173782b5a5a8912ae036e3ffce58) |
| MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [34.622 (1.0X)](https://perf.iree.dev/serie?IREE?6037970c8a3f46a533e6d0c2db581a2cda6d827709bb23562085b36cf30d5921) | N/A | [62.581 (0.6X)](https://perf.iree.dev/serie?IREE?d7d25a8c838db8d5859a25187d8fedc23de97e0280b1a85e12e9348f411c0c8e) |
| MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [182.125 (1.0X)](https://perf.iree.dev/serie?IREE?51473638a07429e21bf4b4fdfdb47201bbdff46edc0134cab2d589abc65a4ed6) | N/A | [189.360 (1.0X)](https://perf.iree.dev/serie?IREE?4d92c9901b7c73d8e02e63adfdcdf63ef0fb529360a908f93b888dee1c3f9c31) |
| MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [66.361 (1.0X)](https://perf.iree.dev/serie?IREE?a2bd1c8e875ac8dcd218641e73102249a16c011c38d3775d52d9dd8a9ba324f4) | N/A | [61.994 (1.1X)](https://perf.iree.dev/serie?IREE?6c3eebd478ce05568e03b90fffbaabf0ae95774046d9f492ee53b8e34a6b692d) |
| MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [490.050 (1.0X)](https://perf.iree.dev/serie?IREE?5b81ba0c3d0db49f11e4c7e51f4138a723c72445c4d1b7d6d441d5a02bbf700a) | N/A | [213.663 (2.3X)](https://perf.iree.dev/serie?IREE?7001a4f2a5e52aa034f802096f625e278fc10b92cd85653335c3a7c5110492c7) |
| MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [4.694 (1.0X)](https://perf.iree.dev/serie?IREE?002cd64f66606ef48d9568103412f709d494fbea040a6879b069436ccc106733) | N/A | [4.511 (1.0X)](https://perf.iree.dev/serie?IREE?14e8174454310c9b24812dca661319c7b8e78a1175003f56abe8cfa7e7bb9cb9) |
| MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [27.362 (1.0X)](https://perf.iree.dev/serie?IREE?1622e274d5ac570e18826aaec62f223c538583eb2f76e771d24eb2f7785954aa) | N/A | [17.695 (1.5X)](https://perf.iree.dev/serie?IREE?6600e5c77f343f3727788ac55712340db67660453f0d5b2a78f8a2f00bffa9f2) |
| MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [3.718 (1.0X)](https://perf.iree.dev/serie?IREE?9accf20747a0a52c6c6b7da7433c9e9cdf68a813ec6589b781ecb7791a836e34) | N/A | [4.884 (0.8X)](https://perf.iree.dev/serie?IREE?ce780c2ab7c9b837611b5e1dcdbce18e7563fb9d9137e68b5a50bd917a54f83d) |
| MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [12.203 (1.0X)](https://perf.iree.dev/serie?IREE?48cac7cf7dea690dd7d8e8669fd5d6f65d1f20c0de1710dc381cf15533354bed) | N/A | [11.382 (1.1X)](https://perf.iree.dev/serie?IREE?6272e089c33b7c5333b6188b6f61fbb15e7b6a0e9fcd9d54b3b7271cd730e0da) |
| MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [5.886 (1.0X)](https://perf.iree.dev/serie?IREE?c196cfd95d87ddeb4cb008e055ec417dd805617dd204295c17856ca0f9e0863c) | N/A | [5.348 (1.1X)](https://perf.iree.dev/serie?IREE?5b41fd88f5fa3c217d024908b57237037d8851b0cba869fb142270cb2fd17ff1) |
| MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [21.712 (1.0X)](https://perf.iree.dev/serie?IREE?23e7ffd476616a14cc5b0cabe27332ff71fec9cdc22801b675f8e6349c498814) | N/A | [11.804 (1.8X)](https://perf.iree.dev/serie?IREE?10f2428bc7da79d6d0f23d87caa4cb20ba55d968736b64c6a47c3041be10f641) |
| MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [3.029 (1.0X)](https://perf.iree.dev/serie?IREE?069f6917e401e63c9e50c548c70cc699385e6f6908517eb6c79c96e597bf96d7) | N/A | [2.816 (1.1X)](https://perf.iree.dev/serie?IREE?c27738e97498c969076d1a2a693322821dd104dbcf7ba6e129ba893584bb0dfd) |
| MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | [2.780 (1.0X)](https://perf.iree.dev/serie?IREE?fd46a78e4032c5fa09644bcda90d0d8b73e9196fb89e2458db2838ddf5fd4c16) | N/A | [2.611 (1.1X)](https://perf.iree.dev/serie?IREE?485da7a706b6c0940ef45626ec12ab149da295cc6a3c0a2c63e5a15a952580b4) |
| MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [8.464 (1.0X)](https://perf.iree.dev/serie?IREE?4af168ed94d96166f35b8264e160ca1e85a3c6ef3faa08284f447a5613f6ce39) | N/A | [9.812 (0.9X)](https://perf.iree.dev/serie?IREE?ec20addfc5f284c92b739d0eaf245af0027627de593635539a86709332ae5acf) |
| MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [35.260 (1.0X)](https://perf.iree.dev/serie?IREE?0aac8a2a5c45ed0ed35dcd65338a5a414c6beefcdbb0fbb4f299b42d41b639e1) | N/A | [31.116 (1.1X)](https://perf.iree.dev/serie?IREE?d6bfea70085e57a372f18983ddd9f7598b084dc4aac07754c80e4f4f5c4fb407) |
| PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [0.767 (1.0X)](https://perf.iree.dev/serie?IREE?77dd6dcff77b2053dbc4cbafc7ca36f8ee5aabdc138b5808830908b037014cc3) | N/A | [0.632 (1.2X)](https://perf.iree.dev/serie?IREE?8d8fd2fbd7901ece93ffa5e47c460dd793c4489b5751a15bb0c3e1b8d82073db) |
| PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | [0.702 (1.0X)](https://perf.iree.dev/serie?IREE?da589d3a658ddcc4dacaab64c8c7253bab3b0b90fbd35158ba58ed883266d5dc) | N/A | [0.569 (1.2X)](https://perf.iree.dev/serie?IREE?3283ddd7c21e5db8eea573c2f94ae318c5baa6bf3d9340ba157573937e7b6632) |
| PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [4.236 (1.0X)](https://perf.iree.dev/serie?IREE?a2ebf5883d38f358868199609143debdbb2947b6e0ab6c5b03802cb813022f9f) | N/A | [5.155 (0.8X)](https://perf.iree.dev/serie?IREE?1e0197113e1bab228898b4e76067c7c8dcd0faf2b0cf5af9dbb227491de894e4) |
| PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | [17.792 (1.0X)](https://perf.iree.dev/serie?IREE?0d4e114d66ae2e078076cc40fca5e6af76232c3936effb92d33e23f76f26ede8) | N/A | [18.841 (0.9X)](https://perf.iree.dev/serie?IREE?51181aae886260ff3c24d829e8bf9e3a892aa93305321c1012476aace79f9e65) |
| matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | [7.569 (1.0X)](https://perf.iree.dev/serie?IREE?641b82f32c47ecd4d02c8c82926118acfce0f530e8728e04a1d593a2876847d2) | N/A | [7.583 (1.0X)](https://perf.iree.dev/serie?IREE?c3a0b8c64c6406c9e4a46d537f2acd4ed2b9f6c191387830c5fcb215cd91d9d0) |
| DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | [48.955 (1.0X)](https://perf.iree.dev/serie?IREE?95281c38b844a3b0ea1964e9634e7a8e2b40025936e3402ff2902be01dbd31b7) | N/A | [43.632 (1.1X)](https://perf.iree.dev/serie?IREE?f17944b7339d0d84be14cd71d31c10b495df98114d5af917259df75540551fa4) |
| DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | [50.278 (1.0X)](https://perf.iree.dev/serie?IREE?4cc57db28e42e4b50f3d234a99faee5e7d48ac787d70f106ed2260e4160f27fc) | N/A | [43.870 (1.1X)](https://perf.iree.dev/serie?IREE?d44c3fbc39f410214516a4c591f879e0ac9454b33a970ff63953fc00f2ec465b) |
| DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | [30.112 (1.0X)](https://perf.iree.dev/serie?IREE?ed4f76526e499d8e959237456899cc74fa4bab29674b0ba083c5ce38edc61fab) | N/A | [27.492 (1.1X)](https://perf.iree.dev/serie?IREE?5343c96ad4bb05804680ca8a51d26bc1ffc4e1d16348e923b4ea234ceb6f94b4) |
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | [92.269 (1.0X)](https://perf.iree.dev/serie?IREE?712d1d8286ecd1d7d66c2f4426924cff01be3c71d3512d1f675fc3560487113b) | N/A | [21.144 (4.4X)](https://perf.iree.dev/serie?IREE?d43fc641fce6a72ff3fe58571f3c55e36e65ef7fc868f197554cdd9a5a451015) |
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | [92.073 (1.0X)](https://perf.iree.dev/serie?IREE?71eae757691075543390b054227af100cfbb850c70094713e12f2c48c2f7db07) | N/A | [21.865 (4.2X)](https://perf.iree.dev/serie?IREE?3b12e9908a7263dea59779315d80b3b215f17a287e84b6cb3a73ac2b5faa1d0f) |
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | [51.970 (1.0X)](https://perf.iree.dev/serie?IREE?a20f9c8cbe11916179b5a347f4e60d1c4e37519719e1aeeface855fe7fc4740f) | N/A | [21.771 (2.4X)](https://perf.iree.dev/serie?IREE?f64e2e4991de95b0282191703bcc5eade1188cbc1dc5012fe7a377d7300e0954) |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | [139.164 (1.0X)](https://perf.iree.dev/serie?IREE?8f03b8167746d6dfe9237cf890831c5521ab5169b0892d660c2f817c5f579223) | N/A | [27.097 (5.1X)](https://perf.iree.dev/serie?IREE?8ae7cfed6678287118515c19784beaae637b5bfa1a259ee0c40d0ae15de02f32) |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | [123.058 (1.0X)](https://perf.iree.dev/serie?IREE?ad83887341f3e360b8a4be6c5683e012c82aeb10d65482cfde8e842bc144a48e) | N/A | [28.978 (4.2X)](https://perf.iree.dev/serie?IREE?d9f100bcdbbfe35bada2541180c89460cc12b0e8a17c3c0126af94dd3e194f04) |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | [75.876 (1.0X)](https://perf.iree.dev/serie?IREE?38f7c3ef079798f2116c2bdff47240a6a261b066b99b8d12fe8e7da255c0e1f3) | N/A | [26.702 (2.8X)](https://perf.iree.dev/serie?IREE?10216d6baf8d3e228a42f5849a691954104e8fc91e514be1c63736ef737f59d5) |
| MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | [691.191 (1.0X)](https://perf.iree.dev/serie?IREE?dc2023c6113c87aad59f2b49214ab2995b32c7ba040b314e890ea2ec7081f90b) | N/A | [348.032 (2.0X)](https://perf.iree.dev/serie?IREE?d4572856894af9013e311991e4371c81498ee30b1fc90ee840632d1a3a512193) |
| MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | [707.603 (1.0X)](https://perf.iree.dev/serie?IREE?ab3be2a007f3201e419112cd2bf753bbbe4e15431946411433a61ab0e34cdfca) | N/A | [360.600 (2.0X)](https://perf.iree.dev/serie?IREE?cca718432d630f48a03660753dbea3c60120aea2692fab0fccf6a4928be7a247) |
| MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | [394.800 (1.0X)](https://perf.iree.dev/serie?IREE?c1b5a77b70decd14b1d3268ad2a167631422bff95c8f8c126dc7a876bd3c0632) | N/A | [215.761 (1.8X)](https://perf.iree.dev/serie?IREE?04f958179d9bc04eca09f2ad518a3cb494931445f23fcc2791b2d9fcee5cf1bc) |
| MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | [1120.949 (1.0X)](https://perf.iree.dev/serie?IREE?df6786c3bd20d93e1230f8b59212221a7e9de0eefdc39ac2f7192b76047d2803) | N/A | [319.144 (3.5X)](https://perf.iree.dev/serie?IREE?5a9829035177db026ff3371238afa1f319a3b715e22ea7d1670c8fab8c243d94) |
| MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | [1121.688 (1.0X)](https://perf.iree.dev/serie?IREE?106ccd69f92add8c01ecfa00b551ae901a5e9864595601ff75e090a03c97dc49) | N/A | [313.872 (3.6X)](https://perf.iree.dev/serie?IREE?dbaa3dbc7fba073c6e934eb505c4679fe625bbfdeb7e2316960c659f6eb8b2e6) |
| MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | [576.408 (1.0X)](https://perf.iree.dev/serie?IREE?8156b68796001f010990cd4da026415ee8875a0ccf609258df8b27c1cd5ed71e) | N/A | [186.159 (3.1X)](https://perf.iree.dev/serie?IREE?9cff4a1b1873b4cda93168fc674fdc046c7b6640f81283cae57c871afbbe216d) |
| Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | [2061.880 (1.0X)](https://perf.iree.dev/serie?IREE?1bb02b9cb5407a193c5ad68d57ba004d6694ae1b9f3b4af974af7197f30f9082) | N/A | [296.032 (7.0X)](https://perf.iree.dev/serie?IREE?3263426782173c417a4205ee460ccf4acb939c53397da8ae06f8ebf3f7228f87) |
| Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | [2060.406 (1.0X)](https://perf.iree.dev/serie?IREE?5f2fe9c7dc19b8dda9300eb881b22481951e7c8f9aaaef2923bf31cea6b4d812) | N/A | [299.921 (6.9X)](https://perf.iree.dev/serie?IREE?893537d80a1d230ac7751f901899b02d29ad6a179afa59b16b16e712a2fab297) |
| Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | [1096.299 (1.0X)](https://perf.iree.dev/serie?IREE?8c2249d8f9c199d56ae43e1d4d6b288194fa1e3b31914cc88d210deccad3d351) | N/A | [178.498 (6.1X)](https://perf.iree.dev/serie?IREE?6c111c114ceccecfdeb1b3608ec4701ac6c62fa29abfa7270a6737f92c94cb0b) |
| matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [armv8.2-a-generic-linux\_android29-llvm\_cpu] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | [12.288 (1.0X)](https://perf.iree.dev/serie?IREE?a694805fd2aa24f7bb3464e817ade1eda09588928e5b168947eef7e6b5ac8dee) | N/A | [1.320 (9.3X)](https://perf.iree.dev/serie?IREE?fe0a953188f398da446a84e74ad069d4029568c0a02709b84bef8922533bb14a) |

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 105.631 (vs. 90.653, 16.52%↑) | 104.053 | 3.247 |
| MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 27.362 (vs. 24.190, 13.11%↑) | 27.261 | 0.436 |
| MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 83.602 (vs. 75.567, 10.63%↑) | 83.676 | 0.301 |

[Top 3 out of 5 results showed]

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 28.978 (vs. 31.362, 7.60%↓) | 28.946 | 0.784 |
| Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 1096.299 (vs. 1182.442, 7.29%↓) | 1101.193 | 16.869 |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 27.097 (vs. 29.182, 7.15%↓) | 27.375 | 0.874 |

[Top 3 out of 13 results showed]

Regressed Total Dispatch Sizes 🚩

| Benchmark Name | Total Dispatch Size (bytes) |
| --- | --- |
| BertLargeTF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags,compile-stats] | 168240 (vs. 130696, 28.73%↑) |
| MiniLML12H384Uncased(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags,compile-stats] | 184872 (vs. 145080, 27.43%↑) |
| Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt,compile-stats] | 839376 (vs. 694752, 20.82%↑) |

[Top 3 out of 12 results showed]

Improved Stream IR Dispatch Count (# of cmd.dispatch ops) 🎉

| Benchmark Name | Stream IR Dispatch Count (# of cmd.dispatch ops) |
| --- | --- |
| BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt,compile-stats] | 365 (vs. 413, 11.62%↓) |
| BertLargeTF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags,compile-stats] | 365 (vs. 413, 11.62%↓) |
| MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt,compile-stats] | 185 (vs. 209, 11.48%↓) |

[Top 3 out of 29 results showed]

For more information:

Source Workflow Run