Open stellaraccident opened 1 day ago
We need to guide people better than this.
Yes please! https://github.com/iree-org/iree/issues/15487
Let's stop over thinking this and do something simple like I suggest. Open to other options but would like to see this improved.
The suggested proposal SGTM. I might even want to default to having LLVM use the current host for the target CPU and available features (if those are different), then have users explicitly pass "generic" for the lowest common denominator.
We could apply similar logic to the GPU backends - try to detect devices on the system (shell out to vulkaninfo / rocm-smi / nvidia-smi?) and default to what is available, but still support cross compilation with explicit device info and a "generic" target where possible.
Yuck - that is not a cheap thing to do and has a high risk of flakes - I am still not sure why proper documentation is insufficient? You must specify your target device (--iree-hal-target-device=
) when compiling so also specifying a "use my hardware info" is fine. Just change the documentation to include both flags and then a user has a choice and knows what to do if they want to change things. Anything automatic is going to have issues. Users aren't coming in to iree-compile command line invocations blind - if all the docs specify the flag and they choose not to copy/paste it that's on them.
Note that this is also what clang does - https://clang.llvm.org/docs/HIPSupport.html - you must pass --offload-arch=
to compile HIP code and if you want the native host target you must pass --offload-arch=native
. nvcc does the same thing - you pass -arch=[some gpu]
or -arch=native
.
(if there's such a big concern about documentation not fixing this issue then I'd be ok with making compilation fail if the user doesn't specify an arch for a backend - whether a particular arg, generic, native, etc - but guessing is bad)
We can certainly update the docs (https://iree.dev/guides/deployment-configurations/cpu/#compile-a-program) and start with a warning from the compiler if information is omitted and generic is used as the default.
I'm seeing a proliferation of flags (mainly in rocm usage, but also cpu) and the documentation can't keep up. I want more of that to be captured somewhere - docs, samples, the compiler itself, etc.
See one example here: https://github.com/iree-org/iree/blob/914858fb89c028a94564b590626aab519d611e54/experimental/regression_suite/shark-test-suite-models/sdxl/test_unet.py#L90-L110
That's insanity - besides the debug flags (dumping statistics/etc) if any of those are required that's a bug. I think Mahesh has said it before: a feature is not done until it's on by default and if all of those flags are needed to make the model compile or perform then the engineering was never completed. The only two flags required there should be --iree-hal-target-device=hip
(that's using the old deprecated flag) and --iree-hip-target=
(which could be native
if we wanted to do what clang does and invoke amdgpu-arch
if it's present).
I'm fine making --iree-llvmcpu-target-cpu=<something>
required. Given the proliferation of things out there, I think that the way to get there may be to first do what I am suggesting: make it issue a warning if not specified and incorrectly/implicitly defaulting to a generic CPU (with a note that this flag will soon be required).
Agreed on all of the other points. Need to burn down all of the other flags. I'm just starting with this one.
I can take a pass at this, unless someone else wants to.
Plan:
--iree-llvmcpu-target-cpu
is omittediree-llvmcpu-target-cpu-features
is omitted--iree-hal-target-backends=llvm-cpu
in-tree and set those flags explicitly
--iree-hal-target-device
?Can someone clarify why we have all three of these flags?
--iree-llvmcpu-target-triple
--iree-llvmcpu-target-cpu
--iree-llvmcpu-target-cpu-features
It seems like the triple could be a superset of the cpu? Is there some redundancy there? I see some riscv sample code setting both: https://github.com/iree-org/iree/blob/d834aa7357179e0d806f3634d2efe3af2fa45171/runtime/src/iree/hal/local/elf/testdata/generate.sh#L68-L73
but even our microkernels blog post (highlighting cpu performance work) only includes a few of the flags: https://github.com/iree-org/iree/blob/d834aa7357179e0d806f3634d2efe3af2fa45171/docs/website/docs/community/blog/posts/microkernels.md?plain=1#L25-L32
Oh, the linalg tutorial from @bjacob explains that target-cpu
is for x86, but target-cpu-features
is for other architectures? https://github.com/iree-org/iree/blob/d834aa7357179e0d806f3634d2efe3af2fa45171/docs/website/docs/community/blog/posts/linalg-tutorial.md?plain=1#L183-L193
There is some complicated logic and then a few calls into LLVM itself in https://github.com/iree-org/iree/blob/main/compiler/plugins/target/LLVMCPU/LLVMTargetOptions.cpp.
(This is why I filed https://github.com/iree-org/iree/issues/15487 - I've wanted someone directly familiar with LLVM CPU to be driving this)
More context on cpu
vs cpu-features
:
This stuff always grows into a bit of a hairball. The condition we are trying to guard is that iree-llvmcpu-target-cpu
being empty should never drive a decision (should be a warning now and an error eventually). Will need to peel back the decision tree to that.
The CPU flags mirror LLVM - we can't remove them, but we could more intelligently populate them - maybe - triple is often not enough. I think we do ask for defaults from LLVM today. I'm hesitant to suggest we diverge from clang behavior as then we have to support that (and if the issue here is that our documentation sucks adding bespoke stuff only hurts that).
Made some progress stepping through the details:
I tried this resnet50 ONNX model with and without --iree-llvmcpu-target-cpu=host
on my system. I see about 140ms with the flag and 170ms without:
JitGlobals
runs the CPU compilation pipeline for the host, regardless of flags and requested devices. That's fine, it's an implementation detail of the compiler. We shouldn't emit any warnings on this path. Relevant code: https://github.com/iree-org/iree/blob/891f438c27f201ae0dde3e8408eefc6fca80bc5f/compiler/src/iree/compiler/ConstEval/JitGlobals.cpp#L658-L668 https://github.com/iree-org/iree/blob/891f438c27f201ae0dde3e8408eefc6fca80bc5f/compiler/src/iree/compiler/ConstEval/JitGlobals.cpp#L804-L814
The targetCPU
flag default is set here: https://github.com/iree-org/iree/blob/891f438c27f201ae0dde3e8408eefc6fca80bc5f/compiler/plugins/target/LLVMCPU/LLVMTargetOptions.h#L178-L181 I'm thinking I'll replace that with empty string then add some logic that warns and sets it back to that default later. Need to still watch for how JitGlobals
, default target construction from CLI flags, and explicit target construction from --iree-hal-target-device
or program IR interact with all the code paths.
I've seen multiple people falling down this hole: they run iree-compile on their model, targeting CPU. Then they get performance that is 10x-100x off of any reasonable expectation. Then they either go away silently or report back about poor experiences (not always reporting flags and such).
There are good reasons why a compiler like IREE shouldn't make assumptions about what the CPU target is, but on the other hand, it will almost always produce a a grossly subpar experience to not specify a target CPU, since the generic target (at least on X86) lacks so many features as to be basically useless for any high performance numerics.
I've even fallen down this hole recently and had to go remember the incantation to select a specific CPU. In the case I was working on (an f16 CPU LLM), performance was 100x different between not specifying a target CPU and specifying "host". We need to guide people better than this.
Proposal
As mentioned, there are good reasons for a compiler to not make too many assumptions without being told what to do. But I think we can/should actively warn, possibly with a link to the documentation site when the compiler is invoked without the user specifying a target CPU. Whatever the warning is should be very explicit that the user should pass
--iree-llvmcpu-target-cpu=host
to target the precise CPU they are running on. We should possibly also accept "generic" or something like that for if the user really wants to target the default and not get the warning. I basically want to guard against the case where the user has not specified anything and the compiler just silently generates 100x too slow of code. In almost all cases, it will be better for the user to say something and we should guide them on a proper choice.