Closed · felladrin closed this 6 months ago
I suspect the OOM error may be resolved if we split the model into smaller chunks before loading it. There's an updated section in the README mentioning that:
However, I'm not 100% sure this resolves the issue. Would you mind testing it out? Thank you.
There's a test model in the advanced example:
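For reference, loading a pre-split model would look roughly like this. This is a minimal sketch, assuming `loadModelFromUrl` accepts an array of chunk URLs; the URLs and `CONFIG_PATHS` below are placeholders, so please check the README for the exact API:

```ts
import { Wllama } from "@wllama/wllama";

// Placeholder: the map of wasm asset paths described in the wllama README.
declare const CONFIG_PATHS: Record<string, string>;

const wllama = new Wllama(CONFIG_PATHS);

// Hypothetical URLs for a GGUF file split into chunks ahead of time.
await wllama.loadModelFromUrl([
  "https://example.com/model.gguf-split-00001-of-00003",
  "https://example.com/model.gguf-split-00002-of-00003",
  "https://example.com/model.gguf-split-00003-of-00003",
]);
```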
Hey, folks, I have good news!
After reading the following issue, I decided to tweak the max wasm memory and confirmed it was the root cause of the problem:
So, first a bit more context:
So, I decreased the `maximum` property in the `WebAssembly.Memory` instantiation:

```diff
-wasmMemory=new WebAssembly.Memory({"initial":INITIAL_MEMORY/65536,"maximum":4294967296/65536,"shared":true});
+wasmMemory=new WebAssembly.Memory({"initial":INITIAL_MEMORY/65536,"maximum":1288490189/65536,"shared":true});
```
It was decreased from 4GB to 1.2GB (20% of the device's 6GB). Since then, I haven't hit the out-of-memory problem when running multi-threaded Wllama anymore 🎉 TinyLlama 1.1B is running fast in the mobile browser!
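In case it's useful, deriving that cap from the device instead of hardcoding it could look like this. A rough sketch, not part of the patch above; note that `navigator.deviceMemory` is unavailable on iOS Safari, so the 6GB fallback there is my assumption:

```ts
const PAGE_SIZE = 65536; // wasm page size in bytes

// INITIAL_MEMORY is defined in the Emscripten-generated glue code.
declare const INITIAL_MEMORY: number;

// navigator.deviceMemory exists on Chromium browsers but not on iOS Safari,
// so fall back to assuming a 6GB device there (an untested assumption).
const deviceGB: number = (navigator as any).deviceMemory ?? 6;

// Cap at 20% of device RAM, never above the 4GB wasm32 limit.
const maxBytes = Math.min(4 * 2 ** 30, Math.floor(deviceGB * 0.2 * 2 ** 30));

const wasmMemory = new WebAssembly.Memory({
  initial: INITIAL_MEMORY / PAGE_SIZE,
  maximum: Math.floor(maxBytes / PAGE_SIZE),
  shared: true,
});
```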
So, we need to find a way to make this Emscripten `MAXIMUM_MEMORY` configurable (it's currently set to 4GB). Maybe we can do something like what we discussed here.
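One possible shape for that, sketched under two assumptions: that the module gets linked with Emscripten's `-sIMPORTED_MEMORY` so the glue code accepts a caller-supplied `Module.wasmMemory`, and that there's a factory entry point (the `createWllamaModule` name below is hypothetical):

```ts
const PAGE_SIZE = 65536;
const toPages = (bytes: number) => Math.floor(bytes / PAGE_SIZE);

// Build the memory ourselves so the maximum is chosen by the caller,
// not baked into the generated glue code.
const wasmMemory = new WebAssembly.Memory({
  initial: toPages(64 * 2 ** 20),  // 64MB to start
  maximum: toPages(1.2 * 2 ** 30), // the 1.2GB cap from the experiment above
  shared: true,                    // required by the multi-threaded (pthread) build
});

// Hypothetical factory name; the real entry point depends on how the build is wrapped.
declare function createWllamaModule(overrides: { wasmMemory: WebAssembly.Memory }): Promise<unknown>;

const module = await createWllamaModule({ wasmMemory });
```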
@felladrin Thanks for the info. Yeah, it would be quite annoying to check whether we're running on iOS and then set the appropriate max memory.
Another idea would be to write a loop that tries multiple values until one succeeds.
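Something like this, perhaps. A sketch only; the candidate sizes are arbitrary, and I haven't verified it on an actual iOS device:

```ts
const PAGE_SIZE = 65536;

// Try progressively smaller maximums until the browser accepts the allocation.
function probeWasmMemory(initialPages: number): WebAssembly.Memory {
  const candidatesGB = [4, 2, 1.2, 1, 0.5];
  for (const gb of candidatesGB) {
    try {
      return new WebAssembly.Memory({
        initial: initialPages,
        maximum: Math.floor((gb * 2 ** 30) / PAGE_SIZE),
        shared: true, // shared memory also requires cross-origin isolation
      });
    } catch {
      // iOS may refuse large shared memories; fall through to a smaller maximum.
    }
  }
  throw new Error("Unable to allocate shared WebAssembly memory at any candidate size");
}
```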
I'm experiencing an Out Of Memory error when attempting to run Wllama with multiple threads on an iOS browser.
It occurs regardless of the model size, even though `navigator.hardwareConcurrency` is 3 on this browser. For instance, I can run TinyLlama 1.1B (Q3_K) with a single thread, but even a Llama 68M model fails when I enable multi-threading.
So this problem appears to be related to Wasm or the worker script rather than the models themselves.
To address this issue, I'm using a try-catch: when it throws Out Of Memory, I reinitialize Wllama with `{ "n_threads": 1 }`. Is anyone else facing this issue?
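In case it helps others, my fallback looks roughly like this (a sketch; `CONFIG_PATHS` and the model URL are placeholders for my actual setup):

```ts
import { Wllama } from "@wllama/wllama";

// Placeholder: the map of wasm asset paths from the wllama README.
declare const CONFIG_PATHS: Record<string, string>;

async function loadWithFallback(modelUrl: string): Promise<Wllama> {
  try {
    const wllama = new Wllama(CONFIG_PATHS);
    await wllama.loadModelFromUrl(modelUrl, {
      n_threads: navigator.hardwareConcurrency,
    });
    return wllama;
  } catch {
    // Out Of Memory during multi-threaded init (seen on this iOS browser):
    // reinitialize and retry with a single thread.
    const wllama = new Wllama(CONFIG_PATHS);
    await wllama.loadModelFromUrl(modelUrl, { n_threads: 1 });
    return wllama;
  }
}
```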