Revive Metal runtime HAL backend with one written against the C HAL API

benvanik commented 3 years ago

As part of the HAL C++ -> C rewrite (#4369) the Metal backend (iree/hal/metal/) was temporarily dropped. A commit (referenced here) has the configuration bits that can be reverted to revive it. The new implementation should be written in Objective-C (as there is no need for C++) and against the C API.

antiagainst commented 3 years ago

Hi @freedomtan, cc'ing you here because you've enabled Bazel build configurations for Metal. Thanks a lot for that! Just curious, is this something you would potentially be interested in lending a hand with? If so that would be truly awesome (as there are so many things to work on in IREE and our team just has limited bandwidth). If not then certainly no worries and feel free to ignore! :)

benvanik commented 3 years ago

It should be relatively straightforward to revive it for someone in the Apple dev ecosystem - the base implementation is solid and it's just something that requires some time and attention from someone to port and then finish the missing bits. I didn't want to unilaterally force that on you @antiagainst as I figured you'd be having more fun/impact doing GPU perf work this quarter (which thanks to the SPIRV-Cross approach means that when Metal is revived it'll get a lot of the same benefits!). I may also get around to reviving it after landing the initial work and not feeling the pressure to also land the other big bits I need to :)

antiagainst commented 3 years ago

Certainly! :D I will circle back to this eventually if @freedomtan is not interested. But just wanted to ask first just in case, given that this could be a very interesting (and well isolated) task to shape IREE towards @freedomtan's potential use cases.

freedomtan commented 3 years ago

@antiagainst and @benvanik Yes, I am interested in working on it. Please tell me exactly what / where to start.

antiagainst commented 3 years ago

That is fantastic! Thanks @freedomtan! You can read #4369 for background information regarding why we'd like to make such a change. Ben is in the process of putting up an initial path towards it as linked in #4369. Please feel free to read it to understand more. After landing that we can look at the situation and have more detailed tasks to collaborate on. :)

nicholasjng commented 2 years ago

Hello IREE team,

I'm a JAX user on Apple M1, and I've recently expressed interest in GPU acceleration of JAX programs on M1, which led to me being pointed to this issue (see the JAX discussion reference at the very bottom of this issue's timeline).

I wanted to ask about the status quo of Metal in the current IREE roadmap / plans, as well as express my interest regarding Metal support. I have wanted to get my hands on GPU programming for some time now, too, so if it is feasible for me to contribute in that regard to this project, I would be happy to get some direction about where to start. (I read a little about the Metal HAL in the commit history, which is presumably what needs to be implemented to make this work.)

Thank you for your consideration.

antiagainst commented 2 years ago

Hey @nicholasjng, thanks for your interest in this! Really great to see that. We certainly would like to revive Metal support. It was removed to make HAL migration easier; and since then we have been focusing on other priorities and haven't been able to look into it. If you'd like to contribute to it, it would be great!

Speaking of detailed tasks, 1) I think the main part is, as you've found out, bringing back the HAL with Objective-C. In the history it was written in Objective-C++. But it should be relatively straightforward. We have HAL CTS that you can rely on (and extend!) to take incremental steps by first bringing back the driver, then command queue/buffer, etc. You can basically follow the original order of how they were initially implemented and bring each commit back.
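To make that bring-up order concrete, here is a minimal sketch in plain C. Everything in it is hypothetical: the toy_ names, the status codes and the three-entry vtable are made up for illustration and are not IREE's actual HAL API or its CTS. It only shows the general idea of a C function-pointer table where unported pieces return an "unimplemented" status, so a conformance-style check can report how far the port has progressed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical status codes; the real HAL has its own status type. */
typedef enum { TOY_OK = 0, TOY_UNIMPLEMENTED = 1 } toy_status_t;

typedef toy_status_t (*toy_step_fn_t)(void);

/* Hypothetical vtable with one entry per component, in bring-up order. */
typedef struct {
  const char* name;
  toy_step_fn_t create_driver;
  toy_step_fn_t create_command_queue;
  toy_step_fn_t create_command_buffer;
} toy_backend_vtable_t;

static toy_status_t toy_ok(void) { return TOY_OK; }
static toy_status_t toy_unimplemented(void) { return TOY_UNIMPLEMENTED; }

/* A backend in mid-port: the driver is back, the rest is still stubbed. */
static const toy_backend_vtable_t toy_metal_backend = {
    "metal (toy)", toy_ok, toy_unimplemented, toy_unimplemented,
};

/* Runs the checks in order and stops at the first unported component.
 * Returns the number of components that passed. */
static int toy_run_bringup_checks(const toy_backend_vtable_t* backend) {
  assert(backend != NULL);
  const toy_step_fn_t steps[] = {backend->create_driver,
                                 backend->create_command_queue,
                                 backend->create_command_buffer};
  const char* const labels[] = {"driver", "command queue", "command buffer"};
  int passed = 0;
  for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); ++i) {
    if (steps[i]() != TOY_OK) {
      printf("%s: %s not ported yet\n", backend->name, labels[i]);
      break;
    }
    printf("%s: %s ok\n", backend->name, labels[i]);
    ++passed;
  }
  return passed;
}
```

Calling toy_run_bringup_checks(&toy_metal_backend) reports the driver as ok and stops at the command queue; swapping a stub for a real implementation moves the stopping point one component further, which is roughly the commit-by-commit rhythm described above.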

For the kernel side, we have the CodeGen flow going from Linalg -> SPIR-V -> SPIRV-Cross -> MSL still in tree. 2) Long term I think we might want to switch to use Tint but for now we can keep the SPIRV-Cross path and use it to bring back the HAL side first.

Of course, before 1) and 2), we need to first make sure macOS builds and tests properly. We don't have an active buildbot checking that at the moment so something might not work (though shouldn't be big problems I assume). If you'd like to get your hands dirty relatively quickly, you can certainly start on this. (We will have an Apple device soon to make sure we don't regress in the future.)

Does the above make sense? Happy to explain any unclear parts.

benvanik commented 2 years ago

Hello! We're all very interested in having Metal support - and there are a few others lurking interested in it too so I'm posting this as an update for them too :)

The main delay has been that any target added requires an owner that we currently don't have for Metal - it needs to not only be written but set up on CI, analyzed for performance, maintained as codegen evolves, improved as needed for overall system evolution, bugs need responding to, etc etc. We do however have a provision for adding experimental backends as we have done with the experimental ROCM backend that lightens the requirements by making it best-effort (mass renames and refactoring will mostly keep it going but it's not guaranteed to be buildable/running at any particular commit). Hopefully soon we are getting an M1 CI bot which will help that further as we could at least know when it breaks, but since a majority of the team does not use macs for development it still needs someone responsible for triaging issues and keeping it operating before we'd be able to turn it on continuously.

I think the biggest thing needed then is someone with relatively deep Metal (or Vulkan/D3D12) experience who can ensure the whole design hangs together and operates as expected. Thankfully a lot of the trickier issues that will be encountered have been analyzed and solved by the MoltenVK project (emulating Vulkan on Metal) or WebGPU working group (which evaluated how to match features across Vulkan/Metal/D3D12) and that lightens the load a bit on some of the specific details but does still require research. I wouldn't call it casual contributor territory but definitely something a motivated and skilled C/ObjC contributor who is familiar with how GPUs operate and eager to dig into a fun layer could chew on (even if not a Metal expert in particular). There's a few people interested and ideally all could help collaborate on the various aspects (@antiagainst - who did the initial work - has mentioned he's willing to help out too, just may not have time immediately).

The code has changed quite a bit since the old implementation was pulled out and will need to be rewritten (we switched from C++ to C, which means we no longer have to use Objective-C++ and can use the much easier to work with Objective-C 🎉), but the compiler side generating the shaders is still in the codebase and AFAIK should still work (under iree/compiler/Dialects/HAL/Targets/). One thing we need to circle back on with it is how it converts SPIRV to MSL: nowadays there is tint which handles SPIRV->MSL/WGSL (among others) in Chrome and would also help us get the WebGPU backend going. That work could happen in parallel with any of the runtime work required and is also non-blocking (as we currently have the SPIRV->MSL path and could get the runtime implemented even if it's reliant on less robust tooling).

A lot of this will become less volatile and scary over time as we finalize the HAL - it's much less at risk of churn to bring up Metal today than it was 6 months ago and 1 year ago before that when the original implementation was yanked - but it'll also be better in a few months. Notably the compiler is about to light up the usage of barriers and semaphores for the first time, following that descriptor sets and command buffer caching, and then finally events for more fine-grained dependency scheduling. Today those are nominally present in the HAL but there's nothing putting pressure on them and it can be difficult to build out the code such that they'll work without a deep understanding of the GPU APIs and how they fit together. Not a blocker - especially for experimental work - and just as we do with CUDA today it's possible to safely ignore a lot of those features in an initial implementation (always blocking on every submit, etc) - but that's really the separator between experimental and ready for general use.
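As a rough illustration of the "always blocking on every submit" shortcut, here is a small single-threaded sketch in plain C. It is a toy model with made-up names (toy_semaphore_t, toy_queue_submit_blocking and so on), not IREE's HAL API and not Metal code; the callback simply stands in for "commit the command buffer and wait until the GPU has finished". The point it tries to show is that when each submit runs to completion before returning, semaphores reduce to bookkeeping: waits are already satisfied by earlier submits and signals can be applied on the way out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timeline semaphore: a monotonically increasing payload. */
typedef struct {
  uint64_t value;
} toy_semaphore_t;

typedef void (*toy_work_fn_t)(void* user_data);

/* Hypothetical submission: optional wait, the work, optional signal. */
typedef struct {
  toy_semaphore_t* wait_semaphore;
  uint64_t wait_value;
  toy_work_fn_t work;
  void* user_data;
  toy_semaphore_t* signal_semaphore;
  uint64_t signal_value;
} toy_submission_t;

/* Executes the submission synchronously. Returns 0 on success, or -1 if
 * the wait is not yet satisfied. In this single-threaded toy nothing else
 * could ever satisfy it, so it is reported instead of blocked on. */
static int toy_queue_submit_blocking(const toy_submission_t* s) {
  assert(s != NULL && s->work != NULL);
  if (s->wait_semaphore && s->wait_semaphore->value < s->wait_value) {
    return -1;
  }
  /* Stand-in for "commit and wait until completed". */
  s->work(s->user_data);
  /* All work is done, so the signal is safe to publish immediately. */
  if (s->signal_semaphore && s->signal_value > s->signal_semaphore->value) {
    s->signal_semaphore->value = s->signal_value;
  }
  return 0;
}

/* Example work item: bumps an int counter. */
static void toy_increment(void* user_data) { *(int*)user_data += 1; }
```

What this gives up is any overlap between host and device work, which is part of why it suits an experimental bring-up rather than a backend that is ready for general use.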

TLDR: I think if a few people were serious about dedicating time to it and someone could step up to manage it (@antiagainst?) then we could start landing an experimental variant with the goal of graduating it when ready. But really I think it's never too early to start learning Metal - and doing anything in the area of low-level GPU APIs requires a lot of learning - and that's probably the best place to start for any contributor (there won't be much to do without the baseline knowledge). I'd recommend looking into the WebGPU/MoltenVK code, authoring some compute-focused standalone applications, etc. Once you put together an end-to-end flow the kind of issues we deal with when trying to integrate it into a library like IREE will become much easier to reason about and experiment with.

benvanik commented 2 years ago

hah! nice timing :P

antiagainst commented 2 years ago

+1 to what Ben said. Among the three modern GPU APIs (D3D12, Metal, Vulkan), Metal is the simplest: it has its explicit core, but still tries to maintain usability like previous-gen implicit APIs. So it is a good starting point to learn GPU and its programming (and avoid getting drowned by the zillions of Vk* structs. :D)

I'm certainly happy to coordinate and support.

nicholasjng commented 2 years ago

Thank you for your comments! They are really helpful. I forked the project earlier, and I hope to find some time to build and test on M1 Pro in the coming days (most likely Sunday).

Looks like I got a nice amount of different concepts to catch up on, be it Objective-C, Metal or shader compilation in general. I would certainly appreciate remaining in contact with you guys as I progress in these areas. Once again, thank you for taking the time to answer my questions :)

stellaraccident commented 2 years ago

Fyi - we don't have a ci yet that tests on apple. But we do have a nightly job that builds on Mac x86 that I keep running. Not perfect, but I expect the project actually builds on that platform, which hopefully means less startup pains.

benvanik commented 2 years ago

If you have any questions or just want to chat about GPU stuff feel free to come hang out in our discord - for dark arts like GPU programming sometimes just knowing someone who remembers a link to some trick seen before can save days to weeks of frustration :)

nicholasjng commented 2 years ago

I tried building IREE from source today using the docs, Linux & macOS section. (I didn't have ccache installed, so the last CMake step I had to skip).

The build did end up failing, but only at the samples section later. The traceback:

cmake -GNinja -B ../iree-build/ -S . \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DIREE_ENABLE_ASSERTIONS=ON \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DIREE_ENABLE_LLD=ON

...

[4367/4429] Generating simple_mul.h, simple_mul.o, simple_mul.vmfb
FAILED: iree/samples/static_library/simple_mul.h iree/samples/static_library/simple_mul.o iree/samples/static_library/simple_mul.vmfb /Users/nicholasjunge/Workspaces/c++/iree-build/iree/samples/static_library/simple_mul.h /Users/nicholasjunge/Workspaces/c++/iree-build/iree/samples/static_library/simple_mul.o /Users/nicholasjunge/Workspaces/c++/iree-build/iree/samples/static_library/simple_mul.vmfb
cd /Users/nicholasjunge/Workspaces/c++/iree-build/iree/samples/static_library && /Users/nicholasjunge/Workspaces/c++/iree-build/iree/tools/iree-translate -iree-mlir-to-vm-bytecode-module -iree-hal-target-backends=dylib-llvm-aot -iree-llvm-link-embedded=false -iree-llvm-link-static -iree-llvm-static-library-output-path=simple_mul.o /Users/nicholasjunge/Workspaces/c++/iree/iree/samples/static_library/simple_mul.mlir -o simple_mul.vmfb
ld.lld: error: unknown argument '-dylib'
ld.lld: error: unknown argument '-flat_namespace'
ld.lld: error: unable to find library -lSystem
ld.lld: error: /var/folders/rw/mj164vm16k10x_byww6my1s00000gn/T/simple_mul_dispatch_0-2b4a26.o: unknown file type
Linking failed; escaped command line returned exit code 256:

/Users/nicholasjunge/Workspaces/c++/iree-build/third_party/llvm-project/llvm/bin/ld.lld -o /var/folders/rw/mj164vm16k10x_byww6my1s00000gn/T/simple_mul_dispatch_0-2b4a26.so -static -dylib -flat_namespace -L /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem -undefined suppress /var/folders/rw/mj164vm16k10x_byww6my1s00000gn/T/simple_mul_dispatch_0-2b4a26.o

macOS 12.0 on M1 Pro.

Output of which clang:

iree on  main took 2s
➜ which clang
/usr/bin/clang

Output of clang -v:

iree on  main
➜ clang -v
Apple clang version 13.0.0 (clang-1300.0.29.3)
Target: arm64-apple-darwin21.0.1
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

EDIT: Apparently iree-translate uses LLD by default, which on macOS Mach-O targets does not support some of the options used there: https://lists.llvm.org/pipermail/cfe-dev/2019-March/061666.html

powderluv commented 2 years ago

Top of main should build on OSX now.

nicholasjng commented 2 years ago

Can confirm, main @ 04bd094 builds fine on macOS 12.2 with the suggested commands/settings. Thanks!

powderluv commented 2 years ago

The SPIR-V backend with MoltenVK is mostly functional now. Still need to land a few fixes that @antiagainst has in flight.

bhack commented 2 years ago

Do you think it will be possible to access the ANE hardware? https://github.com/geohot/tinygrad/tree/master/accel/ane

iree-org / iree

Revive Metal runtime HAL backend with one written against the C HAL API #4370