ngxson / wllama

WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
https://huggingface.co/spaces/ngxson/wllama
MIT License

[Idea] Use something better than memfs #35

Closed · ngxson closed this issue 6 months ago

ngxson commented 6 months ago

The MemFS implementation uses std::vector, which is quite memory-intensive:

https://github.com/emscripten-core/emscripten/blob/799a1cb35b3c6065ba8b2e519e589944c0057f6d/system/lib/wasmfs/backends/memory_backend.cpp#L16-L27
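To see why this is costly, here is a minimal sketch of the pattern (illustrative only, not the actual emscripten code): the whole file lives in a std::vector on the wasm heap, appending during download can transiently double that storage when the vector reallocates, and every read copies the bytes again into the caller's buffer, so the model ends up existing twice.

```cpp
// Sketch of a std::vector-backed in-memory file, as in wasmfs' memory
// backend. Names are illustrative, not emscripten's.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct MemoryFile {
    std::vector<uint8_t> buffer;  // the entire file content lives on the heap

    // Appending while the model downloads can trigger reallocation:
    // during growth the vector briefly holds old + new storage.
    void append(const uint8_t *data, size_t len) {
        buffer.insert(buffer.end(), data, data + len);
    }

    // A plain read() then copies the bytes *again* into the caller's
    // destination (e.g. llama.cpp's own allocation), so peak memory is
    // roughly 2x the model size.
    size_t read(uint8_t *dst, size_t offset, size_t len) {
        if (offset >= buffer.size()) return 0;
        len = std::min(len, buffer.size() - offset);
        std::memcpy(dst, buffer.data() + offset, len);
        return len;
    }
};
```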

A better way to load the model into llama.cpp is to pass the buffer directly from JS to llama.cpp. This likely requires modifying code inside llama.cpp and ggml. A rough sketch of the wasm side is below.
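This is what the wasm-side entry point could look like. Note that `wllama_load_from_buffer` and `llama_load_model_from_buffer` are hypothetical names: upstream llama.cpp currently has no buffer-based loader, which is exactly the part that would need patching.

```cpp
// Sketch of the "pass the buffer directly" idea (hypothetical API).
#include <cstddef>
#include <cstdint>

// Hypothetical entry point a llama.cpp/ggml patch would expose:
// struct llama_model * llama_load_model_from_buffer(const uint8_t *, size_t, ...);

extern "C" {

// Exported to JS. The JS side would allocate once inside the wasm heap,
// e.g.:
//   const ptr = Module._malloc(modelBytes.length);
//   Module.HEAPU8.set(modelBytes, ptr);
//   Module._wllama_load_from_buffer(ptr, modelBytes.length);
// and hand the pointer over directly -- no MemFS copy in between.
void * wllama_load_from_buffer(const uint8_t *buf, size_t size) {
    // return llama_load_model_from_buffer(buf, size, /* params */ ...);
    (void)buf; (void)size;
    return nullptr;  // placeholder: the real call needs the patched API
}

}  // extern "C"
```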

ngxson commented 6 months ago

Another idea would be patching the mmap function so that it no longer copies the memory:

https://github.com/emscripten-core/emscripten/blob/2bc5e3156f07e603bc4f3580cf84c038ea99b2df/src/library_memfs.js#L325-L355

TODO: also have a look at the wasmfs implementation (although it's unusable for now, because it does not support MAP_SHARED): https://github.com/emscripten-core/emscripten/blob/2bc5e3156f07e603bc4f3580cf84c038ea99b2df/system/lib/libc/emscripten_mmap.c#L105
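To make the intent concrete, here is a small C++ sketch contrasting the copy-on-mmap behavior (what MEMFS does today in library_memfs.js, where the mapping is allocated fresh and the contents are memcpy'd in) with the proposed no-copy path. Structure and names are illustrative only; the real emscripten logic lives in JS and C.

```cpp
// Sketch: copying vs. non-copying mmap over a memory-backed file.
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

struct MemFile {
    std::vector<uint8_t> contents;

    // Roughly what MEMFS does now: allocate fresh heap space and copy
    // the file into it, so peak memory is file size + mapping size.
    // (Assumes length <= contents.size() for brevity.)
    uint8_t * mmap_copy(size_t length) {
        uint8_t *ptr = static_cast<uint8_t *>(std::malloc(length));
        std::memcpy(ptr, contents.data(), length);
        return ptr;
    }

    // The proposed no-copy behavior: the file already lives in the wasm
    // heap, so just expose that storage. Only safe if the caller treats
    // the mapping as read-only and the file is not resized while mapped.
    uint8_t * mmap_nocopy() {
        return contents.data();
    }
};
```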

ngxson commented 6 months ago

Seems like heapfs is the best that we can do.

Another idea (a more native implementation) would be to mmap directly to a file on disk, but this is clearly not supported by browsers, since it would be a security risk.