Open pathquester opened 7 months ago

What would it take for the project to add support for distributed inference?

---

Could you share more context on what you mean by "distributed"? At the moment, LLM.js uses a WebAssembly VM running in the browser to run inference on a single model. I'm not 100% sure whether the compilation applies any multi-threading/multiprocessing optimization at the WASM level, but there are SIMD instructions that could be explored to perform parallel computation. This would likely require some rewrites at the project level.