(filing this to track ideas and see if there is community interest)
Background
IREE is designed to take the "ML" out of "ML deployment" and turn "running an ML model" into just "running some code". The low-level runtime code should be well suited for integration into user applications like mobile phone apps and game engines that target a wide range of devices.
The Unity game engine has its own custom libraries for working with ML models (Barracuda, now succeeded by Unity Sentis).
I've personally had my eye on local execution of ML workloads in game engines for years, and I think there's potential to dramatically simplify some workflows while unlocking more performance and features.
Advantages to a generic compilation approach:
Share toolchains across deployment formats
Optimize as much as possible ahead of time in both compute kernels and scheduling code
(Ideally) no "supported operators" concept - the runtime only supports a very small set of intrinsics and it is the compiler's responsibility to lower from high level ML ops down into those intrinsics (e.g. C/C# code, SPIR-V/GLSL/HLSL code)
Low binary size for runtime code (no "kernel library" at runtime, just code in the program binaries)
Integration ideas
Ergonomics are critically important for a game engine / middleware library, so any integration would need to feel native. We can pattern-match how other large asset types are handled (e.g. baked lighting, texture recompression).
Drag-and-drop input model (TF SavedModel, .tflite flatbuffer, PyTorch...? ONNX...?) -> run iree-compile in the editor -> generate IREE .vmfb files as assets to bundle into builds (based on configured platforms in editor settings, could multitarget...)
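As a rough sketch of what an editor-side import hook could do: map the platforms configured in editor settings to one `iree-compile` invocation (and one `.vmfb` artifact) each. The `--iree-hal-target-backends` values below are real `iree-compile` backends, but the platform-to-backend mapping and all paths are invented for illustration.

```python
# Hypothetical editor-side asset import hook: map configured build
# platforms to iree-compile command lines, producing one .vmfb per
# target. The platform -> backend mapping is invented for illustration.
PLATFORM_BACKENDS = {
    "Android": "vulkan-spirv",        # Vulkan on most modern Android devices
    "iOS": "metal-spirv",             # Metal, via SPIR-V cross-compilation
    "StandaloneWindows64": "llvm-cpu",  # CPU fallback
}

def build_compile_commands(model_path: str, platforms: list) -> list:
    """Return one iree-compile command line per configured platform."""
    commands = []
    for platform in platforms:
        backend = PLATFORM_BACKENDS[platform]
        out_path = model_path.rsplit(".", 1)[0] + f".{platform}.vmfb"
        commands.append([
            "iree-compile",
            model_path,
            f"--iree-hal-target-backends={backend}",
            "-o", out_path,
        ])
    return commands

for cmd in build_compile_commands("Assets/Models/stylizer.mlir",
                                  ["Android", "iOS"]):
    print(" ".join(cmd))
```

Multitargeting then falls out naturally: each configured platform contributes its own artifact, and the build step bundles only the `.vmfb` files relevant to the current build target.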
Generate C# bindings for runtime code, specific to the input program
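A toy sketch of what program-specific binding generation could look like. A real tool would read entry point names and tensor shapes from the compiled module's reflection metadata; here they are passed in by hand, and every C# identifier in the emitted text is invented for illustration.

```python
# Toy sketch of per-program C# binding generation: given an entry
# point's name and tensor shapes (which a real generator would read
# from the compiled module's metadata), emit a strongly typed C#
# wrapper. All C# class/method names here are hypothetical.
def emit_csharp_binding(module_name: str, fn_name: str,
                        input_shape: tuple, output_shape: tuple) -> str:
    method = fn_name.title().replace("_", "")
    in_dims = ", ".join(map(str, input_shape))
    out_dims = ", ".join(map(str, output_shape))
    return f"""\
// Auto-generated binding for {module_name}.{fn_name} (sketch).
public sealed class {module_name.title()}Module
{{
    // Input: float32 tensor of shape [{in_dims}]; output: [{out_dims}].
    public float[] {method}(float[] input)
    {{
        // ...marshal input, invoke the IREE VM function, unmarshal output...
        throw new System.NotImplementedException();
    }}
}}
"""

print(emit_csharp_binding("stylizer", "run_frame",
                          (1, 256, 256, 3), (1, 256, 256, 3)))
```

Because the wrapper is generated per program, the user-facing API carries the actual entry point names and shapes of their model rather than a generic "run this tensor dictionary" interface.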
Codegen ideas
IREE's compiler can generate code for a variety of hardware targets, but Unity also has its own abstractions that we could target instead. These could integrate more directly with existing scheduling mechanisms in the game engine. Performance is tricky to predict: Unity's own compilers and runtime code could find classes of optimizations that IREE misses, or miss ones that IREE finds.