Startup latency, power and performance preference hints

The Mobile-first web-based Machine Learning talk by @JRMeyer and @lrianu explains how Artie's game engine uses TensorFlow Lite for Microcontrollers to run its computer vision models on the client, in the browser. IIUC TF Lite Micro was chosen in part to minimize startup latency, which was crucial for the experience. (Another key design consideration was avoiding dynamic memory allocations.)

This feedback suggests that related Web APIs for ML inference should cater for a range of use cases: some are latency-sensitive at inference time, some want to minimize startup latency, and some care about battery performance (e.g. long-running tasks on mobile), to give a few examples.

There has been some work in this area in Web APIs:

WebGL defines WebGLPowerPreference. Similarly, WebNN API for hardware accelerated inference defines PowerPreference, both currently offering "default", "low-power", or "high-performance" options.
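
To make the discussion concrete, here is a minimal sketch of how an app might map its own use case onto the three existing values. The `UseCase` labels and both helper functions are hypothetical names made up for this illustration; only the three power preference strings come from the specs mentioned above.

```typescript
// The three values shared by WebGL's WebGLPowerPreference and WebNN's PowerPreference.
type PowerPreference = "default" | "low-power" | "high-performance";

// Hypothetical application-level use cases, invented for this sketch (not from any spec).
type UseCase = "realtime-inference" | "long-running-background" | "unspecified";

// Map a use case to the closest available power preference hint.
function pickPowerPreference(useCase: UseCase): PowerPreference {
  switch (useCase) {
    case "realtime-inference":
      // Latency-sensitive at inference time: ask for the faster device.
      return "high-performance";
    case "long-running-background":
      // Battery matters more than speed: ask for the power-saving device.
      return "low-power";
    default:
      // No strong preference: let the user agent decide.
      return "default";
  }
}

// Build the attributes object an app could pass when creating a WebGL context.
function webglAttributes(useCase: UseCase): { powerPreference: PowerPreference } {
  return { powerPreference: pickPowerPreference(useCase) };
}
```

In a browser this could be used as `canvas.getContext("webgl", webglAttributes("long-running-background"))`. The value is only a hint, so the user agent may ignore it. Note that none of the three values expresses "minimize startup latency", which is the gap this issue is about.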

Looping in @huningxin and @wchao1115 for WebNN and @RafaelCintron for WebGL expertise to discuss whether we have captured a good set of preferences, and whether there's room for more granular controls from an implementation feasibility point of view. From a use-case point of view, the more knobs the better.

@JRMeyer and @lrianu, just curious, did you use Unity's tooling to cross-compile TF Lite Micro into WebAssembly?

@jasonmayes FYI for this interesting usage of TF Lite Micro.
