Closed: sugatoray closed this issue 8 months ago
cc: @awni
I don't think it's possible to have MLX be a back-end for ONNX Runtime Web 🤔. That would require JavaScript or some kind of code/API the browser can execute. But maybe I am missing something there?
The easiest way to use MLX in the browser is through a local server (see e.g. https://github.com/qnguyen3/chat-with-mlx/). That is all running locally. You can also use native apps with e.g. MLX Swift. I expect more to be built over the coming weeks.
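For concreteness, a browser-side sketch of that local-server pattern might look like the snippet below. It assumes an OpenAI-compatible endpoint such as the one `mlx_lm.server` exposes; the port, model id, and response shape are assumptions, and you may need to sort out CORS depending on where the page is served from.

```ts
// Hypothetical browser-side client for a locally running MLX server, e.g.
//   python -m mlx_lm.server --model <some-mlx-model>
// The port, path, and model id below are assumptions; check your server.
async function askLocalMlx(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "mlx-community/Mistral-7B-Instruct-v0.2-4bit", // hypothetical model id
      messages: [{ role: "user", content: prompt }],
      max_tokens: 256,
    }),
  });
  const json = await res.json();
  return json.choices[0].message.content; // OpenAI-style response shape assumed
}
```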
To use MLX-built models with the ONNX web runtime, we would need a path to export to ONNX. That is definitely a possibility; it's not the top priority, but it's something we'd like to get to when we can.
IIRC Metal is a supported backend for WebGPU in Chrome and Firefox (Chrome uses Dawn, Firefox uses wgpu). But these are abstracted away behind the WebGPU API, so yes, you are running on RTX / Metal, but you won't have any real control over the GPU in the same way. They even have their own language for compute shaders (WGSL).
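To make that abstraction concrete, here is a minimal sketch of a WebGPU compute pass driven from TypeScript: the kernel is written in WGSL, and the browser's implementation (Dawn in Chrome, wgpu in Firefox) lowers it to Metal, Vulkan, or D3D12 underneath. The workgroup size and names are arbitrary choices for illustration.

```ts
// Minimal WebGPU compute sketch: double every element of an array on the GPU.
// Requires a browser with WebGPU (and @webgpu/types for TypeScript checking).
const wgsl = `
  @group(0) @binding(0) var<storage, read_write> data: array<f32>;

  @compute @workgroup_size(64)
  fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    if (id.x < arrayLength(&data)) {
      data[id.x] = data[id.x] * 2.0;
    }
  }
`;

async function doubleOnGpu(input: Float32Array): Promise<Float32Array> {
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) throw new Error("WebGPU not available");
  const device = await adapter.requestDevice();

  // Storage buffer the shader reads and writes; initialized with the input.
  const storage = device.createBuffer({
    size: input.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
    mappedAtCreation: true,
  });
  new Float32Array(storage.getMappedRange()).set(input);
  storage.unmap();

  // Staging buffer used to read the result back on the CPU.
  const readback = device.createBuffer({
    size: input.byteLength,
    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
  });

  const pipeline = device.createComputePipeline({
    layout: "auto",
    compute: { module: device.createShaderModule({ code: wgsl }), entryPoint: "main" },
  });
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer: storage } }],
  });

  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(input.length / 64));
  pass.end();
  encoder.copyBufferToBuffer(storage, 0, readback, 0, input.byteLength);
  device.queue.submit([encoder.finish()]);

  await readback.mapAsync(GPUMapMode.READ);
  const result = new Float32Array(readback.getMappedRange().slice(0));
  readback.unmap();
  return result;
}
```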
I'm going to close this as somewhat out of scope for MLX. It would be great to continue the discussion about ways to export out of MLX, but a discussion thread is probably a better place for that.
🔥 ONNX Runtime Web recently released in-browser WebGPU support. The demo shows it running on an NVIDIA RTX GPU and an Intel CPU, but the blog also points out that WebGPU support is available on MacBooks (a rough usage sketch follows below).
👉 Source: ONNX Runtime Web unleashes generative AI in the browser using WebGPU
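For reference, using the WebGPU execution provider from onnxruntime-web looks roughly like the sketch below. The model file name, input/output names, and tensor shape are placeholders for whatever model you export; the exact import path for the WebGPU bundle also varies by onnxruntime-web version.

```ts
// Rough sketch of in-browser inference with ONNX Runtime Web on WebGPU.
// Newer onnxruntime-web versions may ship WebGPU in the main bundle instead
// of the "/webgpu" sub-path used here.
import * as ort from "onnxruntime-web/webgpu";

async function run(): Promise<void> {
  // Prefer WebGPU, fall back to WASM if the browser lacks it.
  const session = await ort.InferenceSession.create("model.onnx", {
    executionProviders: ["webgpu", "wasm"],
  });

  // Placeholder input: adjust the name, dtype, and shape to your model.
  const input = new ort.Tensor(
    "float32",
    new Float32Array(1 * 3 * 224 * 224),
    [1, 3, 224, 224],
  );
  const outputs = await session.run({ input });

  // "output" is assumed to be the model's output name.
  console.log(outputs["output"].dims, outputs["output"].data);
}

run().catch(console.error);
```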
This has tremendous potential to multiply the impact of a good, locally running, in-browser LLM use case. It could also mean more demand for higher-spec laptops, possibly offering an incentive for laptop manufacturers (including Apple).
I would suggest that we explore this option and see if MLX can elevate such an in-browser local LLM experience.
I am setting this up as a place to discuss this option. Would love to hear thoughts, concerns, ideas and advice.
Quoting from the blogpost: