iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Rework HAL CTS build targets to improve build/link time #17173

Open ScottTodd opened 4 months ago

ScottTodd commented 4 months ago

While looking at https://github.com/iree-org/iree/pull/17145#discussion_r1576873396 and https://github.com/iree-org/iree/pull/16273#issuecomment-1918174524, I noticed that multiple minutes of build time and large chunks of build directory disk space are being spent on compiling HAL CTS test object files and linking the test binaries. Each driver that instantiates the CTS (e.g. hal/drivers/vulkan/cts/CMakeLists.txt) creates one test binary per test case file (e.g. allocator_test, event_test, semaphore_test). We should be able to share part of the build time and binary size between similar tests (across drivers) and/or drivers (across tests).

Some other ideas from Ben:

I don't recall exactly, but I'd probably have suggested one exe per test with all the drivers compiled in, or one exe per driver with all the tests compiled in, or a dynamic library per driver with all the tests compiled in and then generated shim exes that just load it and call main (for easy F5 debugging/avoiding flags). The dynamic library is trickier and limits us to platforms where dynamic libraries are a thing, etc., so not great.

An exe per driver seems fine: we can use gtest filtering to run subsuites and such, or run the exe to run all tests and get a nice report without having to run different binaries. We could expose it to ctest as individual test suites using the same exe but with different flags, so it shows up nicer in reports.
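As a rough sketch of the "one exe per driver, exposed to ctest as individual suites" idea, the snippet below generates one command line per suite that reuses a single per-driver binary. The binary name `{driver}_cts_tests` and the assumption that gtest test names contain the test file name are hypothetical; only `--gtest_filter` (a standard GoogleTest flag) is taken as given.

```python
# Hypothetical names throughout: the binary name template and the suite names
# are illustrative and are not IREE's actual build targets.

def ctest_entries(driver, suites, binary_template="{driver}_cts_tests"):
    """Return (test_name, command) pairs that all reuse one binary per driver."""
    binary = binary_template.format(driver=driver)
    entries = []
    for suite in suites:
        # --gtest_filter accepts '*' wildcards; this selects every test whose
        # full name contains the suite name (an assumption about naming).
        command = [binary, f"--gtest_filter=*{suite}*"]
        entries.append((f"{driver}_{suite}", command))
    return entries


for name, command in ctest_entries(
    "vulkan", ["allocator_test", "event_test", "semaphore_test"]
):
    print(name, "->", " ".join(command))
```

Each generated entry would show up as its own line in a test report while the build only has to link one binary per driver.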

Code:

Progress/sketches at https://github.com/ScottTodd/iree/tree/cts-improve-link-time

ScottTodd commented 1 month ago

https://github.com/iree-org/iree/pull/17846 added a few more CTS test binaries, so this is getting more important.

CI run before: https://github.com/iree-org/iree/actions/runs/9884624163/job/27301313444 (22 minutes)
CI run after: https://github.com/iree-org/iree/actions/runs/9885502550/job/27303651412 (25 minutes)

benvanik commented 1 month ago

Yeah I thought about that when doing it ;( I think we can change the embedded driver name string into a flag: the test is compiled into a library without the hardcoded driver name, and then we could either produce one binary per driver that just has a main() that sets the flag and calls into the library (to keep the easy "run this binary without flags" workflow), or build a single binary with the flag and find a way to make CMake create synthetic targets that pass the flag and are still debuggable.
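The shape of that idea can be modeled in a few lines. This is a language-neutral sketch written in Python, not IREE code: the `--driver` flag name, `run_cts`, and `make_shim_main` are all hypothetical stand-ins for the shared test library and the per-driver main().

```python
import argparse


def run_cts(argv):
    """Shared 'library' entry point: the driver comes from a flag, not a constant."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--driver", required=True)  # hypothetical flag name
    args, remaining = parser.parse_known_args(argv)
    # A real harness would create the device and run the tests here; this
    # sketch only returns what it parsed.
    return args.driver, remaining


def make_shim_main(default_driver):
    """Per-driver 'binary': sets the driver flag unless the caller already did."""
    def main(argv):
        has_flag = any(a == "--driver" or a.startswith("--driver=") for a in argv)
        if not has_flag:
            argv = [f"--driver={default_driver}"] + list(argv)
        return run_cts(argv)
    return main


vulkan_main = make_shim_main("vulkan")
print(vulkan_main(["--gtest_filter=*semaphore*"]))
```

The shim keeps the "run this binary without flags" behavior, while the shared entry point is built once and contains no driver name.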

ScottTodd commented 1 month ago

Another data point: https://github.com/iree-org/iree/actions/runs/9914405645/job/27394002144#step:4:10

gcloud storage cp gs://iree-github-actions-postsubmit-artifacts/9914405645/1/build-android-arm_64.tar

2m10s to download a 1.6GB archive onto an Android test runner, then another 4m45s to extract it. Of the 1.6GB, 960MB is under runtime/src/iree/hal/, with 11MB per CTS executable × 15 CTS tests × 5 configs (local_sync vmvx, local_sync llvm-cpu, local_task vmvx, local_task llvm-cpu, vulkan) = 825MB.
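The size estimate above works out as follows. The figures are the ones quoted in the comment; the "one executable per config" count is only the number of binaries that layout would produce, with no assumption about how large each would be.

```python
MB_PER_EXECUTABLE = 11  # approximate size of one CTS executable
CTS_TESTS = 15          # test binaries per config
CONFIGS = 5             # local_sync/local_task x vmvx/llvm-cpu, plus vulkan
HAL_DIR_MB = 960        # size of runtime/src/iree/hal/ in the archive

executables_now = CTS_TESTS * CONFIGS
current_mb = MB_PER_EXECUTABLE * executables_now
share_of_hal = current_mb / HAL_DIR_MB

# With one executable per config, only the binary count is known up front.
executables_per_config = CONFIGS

print(f"{executables_now} executables, {current_mb}MB, "
      f"{share_of_hal:.0%} of the hal/ directory")
print(f"one exe per config would link {executables_per_config} executables")
```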