[spirv] Switch to use common target description #17623

Closed · antiagainst closed this 3 months ago

antiagainst commented 3 months ago

This commit switches the SPIR-V side to use the common #iree_gpu.target attribute to describe GPU characteristics. With it we can now remove the ad-hoc Vulkan attributes and dialects and unify how GPUs are described across the various GPU compiler backends in IREE.

SPIR-V

SPIR-V has some additional requirements that we need to account for:

- We have many vendors and APIs to handle here, so this commit adds various AMD/ARM/NVIDIA/Qualcomm targets for development purposes so that we can specify them with a shorthand.

- To keep the description extensible, we leverage the features field in #iree_gpu.target to specify additional capabilities with a cap: prefix and extensions with an ext: prefix. We also use the features field to specify which SPIR-V version to target, in the spirv:v1.x format (see the sketch after this list).

- Right now the SPIRVConvertGPUTarget pass is invoked immediately before configuration; this is to stage the changes. As a next step we need to move it to immediately before the ConvertToSPIRV pass.
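To make the feature string convention concrete, here is a minimal sketch of such a target attribute. The field values are illustrative (loosely modeled on the Mali target that appears in the error log further down), not an authoritative product definition:

```mlir
// Hypothetical #iree_gpu.target using the features string convention:
// SPIR-V version ("spirv:"), capabilities ("cap:"), and extensions ("ext:"),
// comma-separated. Field values here are placeholders for illustration.
#iree_gpu.target<
  arch = "valhall1",
  features = "spirv:v1.4,cap:Shader,ext:SPV_KHR_16bit_storage",
  wgp = <compute = fp32|fp16|int32, storage = b32|b16,
         subgroup = shuffle|arithmetic, dot = dp4xi8toi32, mma = [],
         subgroup_size_choices = [16],
         max_workgroup_sizes = [512, 512, 512],
         max_thread_count_per_workgroup = 512,
         max_workgroup_memory_bytes = 32768>>
```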

Vulkan

--iree-vulkan-target-env is dropped, given that we have now removed the whole Vulkan dialect and can no longer control the target with a #vk.target_env attribute.

The default --iree-vulkan-target-triple now becomes vp_android_baseline_2022, which is a good lowest common denominator that guarantees the generated SPIR-V is widely accepted. Unlike before, we no longer cater to SwiftShader, which the previous default targeted for testing purposes.

The --iree-vulkan-target-triple flag should be renamed given that its value is not a triple anymore; that rename will happen later, together with the other GPU backends (i.e., cuda/hip), for consistency.

To support cooperative matrix conversion, we added WMMA_F16_16x16x16_F16. For NVIDIA GPUs we are abusing it right now without considering the concrete explicit layout; that is fine given that in Vulkan cooperative matrices are opaque anyway, but it needs to be fixed if we are targeting WMMA in CUDA (see the sketch below).
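As a hedged illustration of where this intrinsic surfaces, the sketch below lists it in the mma field of a hypothetical target; the arch string, capability/extension names, and limits are assumptions for illustration, not values taken from this commit:

```mlir
// Hypothetical target advertising the cooperative matrix intrinsic in its
// mma list; all other field values are illustrative placeholders.
#iree_gpu.target<
  arch = "rdna3",
  features = "spirv:v1.6,cap:CooperativeMatrixKHR,ext:SPV_KHR_cooperative_matrix",
  wgp = <compute = fp32|fp16, storage = b32|b16,
         subgroup = shuffle|arithmetic, dot = dp4xi8toi32,
         mma = [<WMMA_F16_16x16x16_F16>],
         subgroup_size_choices = [32, 64],
         max_workgroup_sizes = [1024, 1024, 1024],
         max_thread_count_per_workgroup = 1024,
         max_workgroup_memory_bytes = 65536>>
```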

Metal / WebGPU

We now construct a #iree_gpu.target to specify the target and drive SPIR-V CodeGen for these backends (a rough sketch follows).
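For these backends the attribute is built in compiler code rather than parsed from a flag. As a rough sketch under that assumption, a conservative baseline might look like the following; the values are entirely hypothetical, since this PR does not spell out the Metal/WebGPU defaults:

```mlir
// Entirely hypothetical conservative baseline for driving SPIR-V CodeGen on
// Metal/WebGPU; no optional SPIR-V capabilities or extensions requested.
#iree_gpu.target<
  arch = "unknown",
  features = "spirv:v1.3,cap:Shader",
  wgp = <compute = fp32|int32, storage = b32,
         subgroup = shuffle|arithmetic, dot = dp4xi8toi32, mma = [],
         subgroup_size_choices = [32],
         max_workgroup_sizes = [128, 128, 64],
         max_thread_count_per_workgroup = 128,
         max_workgroup_memory_bytes = 16384>>
```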

Progress towards https://github.com/iree-org/iree/issues/16341

ci-extra: test_nvidia_gpu,test_nvidia_a100,test_amd_mi250,build_test_all_macos_arm64,build_and_test_android

antiagainst commented 3 months ago

Okay, all tests are passing so this is ready for review now.

ScottTodd commented 3 months ago

This appears to have broken some benchmark builds: https://github.com/iree-org/iree/actions/runs/9572945480/job/26393626504#step:7:706

FAILED: e2e_test_artifacts/iree_module_MobileBertSquad_int8_tflite___arm-valhall-vulkan_android31-vulkan_spirv__default-flags_compile-stats_/module.vmfb /work/build-e2e-test-artifacts/e2e_test_artifacts/iree_module_MobileBertSquad_int8_tflite___arm-valhall-vulkan_android31-vulkan_spirv__default-flags_compile-stats_/module.vmfb 
cd /work/build-e2e-test-artifacts/tests/e2e/test_artifacts && /work/full-build-dir/install/bin/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=vulkan-spirv --iree-input-type=tosa --iree-vulkan-target-triple=valhall-unknown-android31 --iree-vm-emit-polyglot-zip=true --iree-llvmcpu-debug-symbols=false --iree-scheduling-dump-statistics-format=json --iree-scheduling-dump-statistics-file=/work/build-e2e-test-artifacts/e2e_test_artifacts/iree_module_MobileBertSquad_int8_tflite___arm-valhall-vulkan_android31-vulkan_spirv__default-flags_compile-stats_/scheduling_stats.json /work/build-e2e-test-artifacts/e2e_test_artifacts/iree_MobileBertSquad_int8_tflite_.mlir -o /work/build-e2e-test-artifacts/e2e_test_artifacts/iree_module_MobileBertSquad_int8_tflite___arm-valhall-vulkan_android31-vulkan_spirv__default-flags_compile-stats_/module.vmfb --iree-hal-executable-object-search-path=\"/work/build-e2e-test-artifacts\"
failed to translate executables
failed to translate executables
failed to translate executables
failed to translate executables
failed to translate executables
failed to translate executables
failed to translate executables
failed to translate executables
failed to translate executables
<unknown>:0: error: loc(callsite("model/bert_span_labeler/mobile_bert_encoder/mobile_bert_embedding/embedding_norm/add" at "main")): 'spirv.VectorShuffle' op result #0 must be vector of bool or 8/16/32/64-bit integer or 16/32/64-bit float values of length 2/3/4/8/16, but got 'i32'
<unknown>:0: note: loc("main"): called from
<unknown>:0: error: loc(callsite("model/bert_span_labeler/mobile_bert_encoder/mobile_bert_embedding/embedding_norm/add" at "main")): failed to run translation of source executable to target executable for backend #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {iree.gpu.target = #iree_gpu.target<arch = "valhall1", features = "spirv:v1.4,cap:Shader", wgp = <compute =  fp32|fp16|int64|int32|int16|int8, storage =  b64|b32|b16|b8, subgroup =  shuffle|arithmetic, dot =  dp4xi8toi32, mma = [], subgroup_size_choices = [16, 0], max_workgroup_sizes = [512, 512, 512], max_thread_count_per_workgroup = 512, max_workgroup_memory_bytes = 32768>>, spirv.target_env = #spirv.target_env<#spirv.vce<v1.4, [Shader, Float16, Int64, Int16, Int8, StorageBuffer16BitAccess, StorageUniform16, StoragePushConstant16, StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess, StoragePushConstant8, GroupNonUniformShuffle, GroupNonUniformShuffleRelative, GroupNonUniformArithmetic, DotProduct, DotProductInput4x8BitPacked, DotProductInputAll, DotProductInput4x8Bit], [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_integer_dot_product]>, #spirv.resource_limits<max_compute_shared_memory_size = 32768, max_compute_workgroup_invocations = 512, max_compute_workgroup_size = [512 : i32, 512 : i32, 512 : i32], subgroup_size = 16, min_subgroup_size = 0, max_subgroup_size = 16, cooperative_matrix_properties_khr = []>>}>