iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/

Clear plan and standardization for driver/backend naming #2175

Closed. GMNGeoffrey closed this issue 3 years ago.

GMNGeoffrey commented 4 years ago

The current driver and backend names are used inconsistently throughout the codebase.

Currently, we have three IREE compiler backends (vmla, llvm-ir, and vulkan-spirv) and three IREE runtime drivers (vmla, llvm, and vulkan). While right now there is a 1:1 correspondence between target backends and drivers, there could in principle be a many-to-many mapping (e.g. two ways to get something to run on Vulkan, or two ways to JIT LLVM IR). This distinction and the naming scheme are used inconsistently throughout the codebase. The check test rules use the naming in its full generality, which creates verbose test target names, while the Python e2e tests clump backends and drivers together but also use slightly different names (e.g. "llvmjit" as the backend name). These names are also used by the CI to filter based on runtime drivers that aren't available in certain environments (e.g. we can't run Vulkan in OSS CI yet). We should figure out a way to do this that doesn't back us into a corner for the many-to-many case and still allows reasonable filtering and breakdown of tests.
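For reference, this is roughly how a check test pairs the two names today; the macro and attribute names below are approximate and only meant to illustrate the backend/driver coupling:

# Approximate sketch of a current check-test declaration (names may not match
# the real macro exactly); the compiler backend and runtime driver are both
# spelled out per suite.
iree_check_single_backend_test_suite(
    name = "check_vulkan-spirv_vulkan",
    srcs = ["simple_ops.mlir"],  # illustrative source name
    # Compiler backend name, forwarded as -iree-hal-target-backends=...
    target_backend = "vulkan-spirv",
    # Runtime HAL driver name, used by the check runner at test time.
    driver = "vulkan",
)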

ScottTodd commented 4 years ago

We now also have the "dylib-llvm-aot" compiler backend:
https://github.com/google/iree/blob/d8db9297c2862d55d719951c237e401d05bf4f12/iree/compiler/Dialect/HAL/Target/LLVM/LLVMAOTTarget.cpp#L102

and the "dylib" HAL driver:
https://github.com/google/iree/blob/d8db9297c2862d55d719951c237e401d05bf4f12/iree/hal/dylib/dylib_driver.cc#L33
https://github.com/google/iree/blob/d8db9297c2862d55d719951c237e401d05bf4f12/iree/hal/dylib/dylib_driver_module.cc#L35-L36

benvanik commented 4 years ago

Treating the compiler backend + its options and the output format independently will help. The compiler cares about the exact backend it is using (vulkan+spirv, llvm codegen, vmla), the target architecture(s) (vulkan 1.1 + spirv 1.3, x86-64, arm64, etc), and what the output format will be (spirv binaries, vm binaries, dynamic libraries, c source code, etc). These are like LLVM target configurations and can be either very generic or very specific.
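Purely to illustrate the split (nothing like this exists as a build rule today), a compiler target could be written as that triple of backend, target architecture, and output format:

# Hypothetical declarative config illustrating the three compiler-side axes;
# the rule name and attributes here are invented for this sketch.
compiler_target(
    name = "vulkan-1.1-spirv-1.3",
    backend = "vulkan-spirv",               # which codegen path the compiler uses
    architecture = "vulkan-1.1+spirv-1.3",  # target environment / feature level
    output_format = "spirv-binary",         # artifact the compiler emits
)

compiler_target(
    name = "llvm-aot-arm64",
    backend = "llvm-aot",
    architecture = "arm64",
    output_format = "dylib",
)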

The runtime side is a bit easier, as for the most part it'll use what it's given. If the vulkan HAL backend is given spirv shaders it'll use the version supported, if the dylib backend is given multiple binaries for different architectures it'll pick whatever the host architecture is, etc. In this way controlling the compiler configuration is implicitly controlling the runtime configuration and nothing additional is needed: if you want to test vulkan 1.1 + spirv 1.3 you just compile only that and run with the vulkan backend, or if you want to test arm64 on android you just compile that and run it on an android with arm64 support.

I think you'll have a problem trying to include all the information in all target names without just making a mess. Having a way to define target specifications and reuse them would make this easier. For example, if we could in one place define what "android-27-arm64-cpu" meant with respect to the compiler and runtime settings that's much more readable and easier to filter on. We won't have to do the full matrix expansion of N:M:... and instead just have the configurations we are interested in.

Internally this is how mobile testing is done (declarative device types that encompass many settings), and would be good to model here.

benvanik commented 4 years ago

(the other implication here is that we don't need to build a unique runtime binary for every compiler input - the same android arm64 binary with dylib support can run compiler output from any backend that can generate arm64 dylibs)

GMNGeoffrey commented 4 years ago

I think you'll have a problem trying to include all the information in all target names without just making a mess.

Yeah. The thing I really want is for different HAL drivers to be different test targets. This allows CI and humans to filter based on their runtime environment. The compilation should really be a build step on the host anyway, and we should just do it once. This means that we can bundle all tests that target a single driver into a single test target. That means we lose some test framework parallelism and granular test failures, but I think that's probably ok.
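Something like this hypothetical rule (nothing here exists; it just sketches the grouping):

# Hypothetical sketch of "one test target per runtime driver"; the macro and
# attributes are invented for illustration only.
iree_driver_check_test(
    name = "e2e_tests_vulkan",  # one filterable test target per HAL driver
    driver = "vulkan",
    # Modules are compiled on the host at build time, once per backend whose
    # output this driver can load.
    compiled_modules = [":simple_ops_vulkan-spirv_module"],
)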

benvanik commented 4 years ago

Hmm no, that's the opposite of what you want: you lose the ability to have fine-grained control over the compilation modes for the tests you run. When bringing up a new test you want to be able to bring up one compiler backend at a time, and if all targets are compiled together you lose that. That's what has caused so much mess with the current setup: you have to duplicate entire test suites just to exclude a single test, and it's easy to lose tests from various configurations.

The runtime target is implicit based on your build configuration - if you're building for android and vulkan, you will be targeting android as your cross-compilation target and specifying that you want the vulkan backend to be built and linked in. There's no need to include it again as part of the test itself.

GMNGeoffrey commented 4 years ago

I was saying we could put all targets for the same driver into the same test target if you're worried about too many test targets. Having a separate test target per compilation backend, as we do for the check tests now, is fine with me. There is a need to know what driver a test is going to use in order to filter tests by required runtime capabilities. We can do that based on knowing what each compilation target will ultimately use as a driver.

The runtime target is implicit based on your build configuration

This is different from what you said when I initially set things up. Is it not the case that we could have multiple drivers that could run the same compilation output? Then this is a many-to-one mapping and that makes things much easier.

if all targets are compiled together you lose that. That's what has caused so much mess with the current setup: you have to duplicate entire test suites just to exclude a single test and it's easy to loose tests from various configurations.

I don't understand what you're saying here. Each target is compiled separately and run separately in the check test suites. We can definitely clean up the macro API, but it's not compiling targets all together.

benvanik commented 4 years ago

For a particular output format there's only a single HAL driver that can handle them. A HAL driver can handle multiple output formats, though, such as a TPU backend that may use different binary blobs for different hardware revisions.

The thing I really want is for different HAL drivers to be different test targets.

The test targets should be the compilation options, and the HAL drivers should be selected implicitly, such that there's one test target per compilation mode (vulkan-spirv, llvm-aot-arm64, etc.) and each depends on a binary containing only the driver required.
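Illustrative only (no such table exists in the build), but this is the kind of many-to-one mapping I mean:

# Illustrative mapping from each compilation mode to the single HAL driver that
# can load its output. Multiple modes could map to the same driver, but each
# mode implies exactly one driver, so tests never need to name it explicitly.
COMPILATION_MODE_TO_DRIVER = {
    "vmla": "vmla",
    "vulkan-spirv": "vulkan",
    "dylib-llvm-aot": "dylib",
}

def driver_for_mode(mode):
    """Returns the runtime HAL driver implied by a compilation mode."""
    return COMPILATION_MODE_TO_DRIVER[mode]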

I think I was remembering things prior to what it currently is with the check tests.

GMNGeoffrey commented 4 years ago

For a particular output format there's only a single HAL driver that can handle them. A HAL driver can handle multiple output formats, though, such as a TPU backend that may use different binary blobs for different hardware revisions.

Excellent. That makes things much easier :-) I thought previously you mentioned an example like vulkan 1.1 vs vulkan 1.2? Wouldn't that be different drivers that could handle output from the same compilation backend?

benvanik commented 4 years ago

Those would be the same driver code, but it may expose multiple device types (both would live under iree/hal/vulkan/ and be linked in with the same library). So they could use the same binary but may require different flags... hmm... though that's only really a useful contortion for testing, and maybe we can avoid it. For example, by default the vulkan driver will try to create the highest version supported by the current platform, and if there's a compiled executable that can't be loaded on that version (1.2 on 1.1) it'll just fail. But really that means we should not have tried that test in the first place (the 'vulkan-1.1-spirv-1.3-android-r29-etc' compilation target should execute in an environment where only vulkan 1.1 is available, so it'll never be running with vulkan 1.2, and thus there's no need to be able to specify it). Same with TPU devices: the compilation target will define which device is supported, and the test environment selected shouldn't expose other devices besides that one.

GMNGeoffrey commented 4 years ago

Those would be the same driver code but it may expose multiple device types

Ah thanks for clarifying the distinction.

the 'vulkan-1.1-spirv-1.3-android-r29-etc' compilation target should execute in an environment where only vulkan 1.1 is available, so it'll never be running with vulkan 1.2, and thus there's no need to be able to specify

How do we ensure the environment doesn't have vulkan 1.2 without specifying? :-D

Maybe this is what you're already saying, but what about having the module report what it should be run on (configured based on its compilation options)? It could indicate this with varying levels of specificity. For example, it could say "run me on vulkan" and the driver would create the highest Vulkan version supported by the current platform; this is what we'd do in normal cases. For testing, we'd have it say "run me on vulkan 1.1" and the driver would always use that. Does that seem reasonable? Then everything is controlled by compilation options.

The test suites then become "run this test on all these different compilation options" and we extract the relevant runtime requirements from that, which we use to filter tests.
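As a rough sketch (everything below is hypothetical and only shows the derivation I have in mind):

# Hypothetical sketch: each compilation config carries its runtime requirement,
# and test rules derive filter tags from it. None of these names exist today.
COMPILATION_CONFIGS = {
    "vulkan-1.1": struct(driver = "vulkan", api_version = "1.1"),  # pinned for testing
    "vulkan": struct(driver = "vulkan", api_version = None),       # highest available
    "dylib-llvm-aot-arm64": struct(driver = "dylib", api_version = None),
}

def runtime_requirement_tags(config_name):
    """Derives the tags a test suite would use to filter by runtime capability."""
    config = COMPILATION_CONFIGS[config_name]
    return ["driver=" + config.driver]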

benvanik commented 4 years ago

That's pretty much what I'm suggesting, only we want to do that entirely at build time instead of at runtime: we don't want to specify the support within the built module output and then query it, as we explicitly want modules to be forward compatible (something compiled for vulkan 1.1 can run on vulkan 1.2, and it could run on a future vulkan 1.3, etc.).

Having a declarative target configuration that allowed us to specify precisely what we wanted for compilation options and runtime environments would let us separate the test suite definition from the rest of the data and get better reuse.

Brainstorming:

module_test_suite(
  srcs = ["foo.mlir", "bar.mlir"],
  targets = [
    ":llvm-aot-arm64",
    ":vulkan-1.1-generic",
    ":vulkan-1.1-mali",
    ":metal-2-generic",
  ],
)

# ties the compiler and runtime together as a single named config; really just an alias to allow us to update these independently as we add more runtime configurations
target_config(
  # could be "android-pixel4" or something more specific
  name = "llvm-aot-arm64",
  # shared compiler settings - could also make this a list
  compiler = ":llvm-aot-arm64",
  # this being a list lets us easily add new runtime configurations without needing to modify all tests
  runtimes = [":android-r29-arm64-dylib", ":ios-v13-arm64-dylib"],
)

target_config(
  name = "metal-2-generic",
  # would share compiler with ios/macos/etc
  compiler = ":metal-2-generic",
  # ... but different runtimes for each type
  runtimes = [":ios-v13-arm64-metal-2", ":macos-v10.13-x64-metal-2"],
)

# shared for all llvm-aot targets going to arm64
compiler_config(
  name = "llvm-aot-arm64",
  flags = [
    "-iree-hal-target-backends=llvm-aot",
    "-output-format dylib",
    "-march=arm64 ...",  # target-specific flags for output
  ],
)

runtime_config(
  name = "android-r29-arm64-dylib",
  deps = ["...:dylib_driver_module"],  # linked in hal driver module
  defines = [
    "-DANDROID_NDK=17", # may have compiler flags that are required
    "-DIREE_VM_VERBOSE_LOGGING",
  ],
  flags = [
    "--iree-status-capture-stack-traces",  # flags passed during test execution
    "--iree-dylib-search-paths=/data/...",
  ],
  # only supported on linux hosts where we can run the emulator, etc
  # would be darwin for ios targets, windows for win32/d3d12, etc
  host_os = "linux",
  # could have emulator flags and other things here in a nice central place
)

This would create these test targets:

foo_llvm-aot-arm64_android-r29-arm64-dylib
foo_vulkan-1.1-generic_android-r29-arm64-vulkan
foo_vulkan-1.1-mali_android-r29-arm64-vulkan
foo_llvm-aot-arm64_ios-v13-arm64-dylib
foo_metal-2-generic_ios-v13-arm64-metal-2
foo_metal-2-generic_macos-v13-arm64-metal-2
bar_llvm-aot-arm64_android-r29-arm64-dylib
bar_vulkan-1.1-generic_android-r29-arm64-vulkan
bar_vulkan-1.1-mali_android-r29-arm64-vulkan
bar_llvm-aot-arm64_ios-v13-arm64-dylib
bar_metal-2-generic_ios-v13-arm64-metal-2
bar_metal-2-generic_macos-v13-arm64-metal-2

The runtime builds required would be only those unique runtime_configs:

iree-check-test_android-r29-arm64-dylib
iree-check-test_android-r29-arm64-vulkan
iree-check-test_ios-v13-arm64-dylib
iree-check-test_ios-v13-arm64-metal-2
iree-check-test_macos-v13-arm64-dylib
iree-check-test_macos-v13-arm64-metal-2

The constructed test targets would have their filter tags set such that you could include/exclude based on the compiler ("I'm just testing vulkan changes, no need to test anything else"), platform ("I'm on linux and only care about android, so don't try to build macos binaries and compile for metal"), or device ("I'm specifically optimizing for the pixel 4, so only compile and run that single configuration").
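For example, a generated test target could carry tags along each of those axes (these tag names are invented for illustration):

# Hypothetical tags attached to a generated test target; the tag names are
# made up here, but they show the include/exclude handles described above.
GENERATED_TEST_TAGS = [
    "compiler=vulkan-spirv",  # "I'm just testing vulkan changes"
    "driver=vulkan",          # runtime HAL driver the test requires
    "platform=android",       # "I only care about android"
    "device=pixel4",          # "only compile and run that single configuration"
    "host_os=linux",          # host needed to build/run (emulator, etc.)
]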

Adding a new device is easy: it just requires adding it to the right place in the matrix, and its tags will be consistent with all existing tags, so adding new devices won't break existing workflows. The runtime configs have enough info to filter at a build-generator level when needed (so we can in bazel and cmake detect the host platform and completely disable targets that can't be built/run), etc. It also lets us come up with one-off configurations that are useful for particular benchmarks/tests/etc. and opt specific tests into them without (in many cases) producing more build artifacts.

phoenix-meadowlark commented 4 years ago

Geoffrey and I were talking about removing the discrepancy between the python and C++ driver names for LLVM: iree_llvmjit and llvm respectively. Since we recently added dylib-llvm-aot, we are thinking that it might make sense to rename the python one to iree_llvm_jit and the driver to llvm-jit. Then they would be consistent with each other and clearly distinct from dylib-llvm-aot.

GMNGeoffrey commented 4 years ago

@phoenix-meadowlark I think you should just go ahead and do your rename as proposed

benvanik commented 3 years ago

This will be done as part of #3817 / #3937 and related work.