mlc-ai / web-llm

High-performance In-browser LLM Inference Engine
https://webllm.mlc.ai
Apache License 2.0

Separate Non-UI Logic in llm-chat.js into an npm Package #94

Closed: Ryan-yang125 closed this issue 1 year ago

Ryan-yang125 commented 1 year ago

Can the non-UI logic in llm-chat.js, such as the classes LLMChatInstance and LLMChatPipeline, and related dependencies like tvmjs.bundle.js and tvmjs_runtime.wasi.js, be separated and packaged into an npm package? This way, developers could simply install the package via npm and use a few lines of code to access web-llm's capabilities on their website without having to understand the underlying logic. This would also be helpful for maintaining the web-llm demo on the official website and for adding new features such as support for more models.

tqchen commented 1 year ago

Thanks @Ryan-yang125, I think this is something that we can work towards.

In general we can refactor the chat to be closer to that component. Logically this would make a lot of sense.

It would be great to get suggestions on how packaging can be done (maybe via some rollup mechanism?). Also cc @DustinBrett @r2d4, who might have relevant experience.
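For illustration of the rollup idea, a minimal configuration might look like the sketch below. The entry point and output file names are hypothetical, and it assumes a src/index.js that re-exports the non-UI classes without any DOM code; it is not the project's actual build setup.

```js
// rollup.config.js -- minimal sketch, not the project's actual build setup.
// Assumes a hypothetical src/index.js that re-exports the non-UI classes
// (LLMChatInstance, LLMChatPipeline) without any DOM code.
import resolve from "@rollup/plugin-node-resolve";
import commonjs from "@rollup/plugin-commonjs";

export default {
  input: "src/index.js",
  output: [
    { file: "dist/web-llm.esm.js", format: "es" },  // for bundlers and <script type="module">
    { file: "dist/web-llm.cjs.js", format: "cjs" }, // for Node-style require()
  ],
  plugins: [resolve(), commonjs()],
};
```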

Ryan-yang125 commented 1 year ago

Thanks for the reply @tqchen.

When I made ChatLLM-web, I copied all the files in the gh-pages branch: tokenizer.model, vicuna-7b-v1_webgpu.wasm, sentencepiece.js, tvmjs_runtime.wasi.js, tvmjs.bundle.js, and llm-chat.js, and rewrote the UI-updating logic in llm-chat.js, such as the functions updateLastMessage, appendMessage, etc.

This is tedious and unnecessary. If we can package the dependencies and export some functions, we could use web-llm like:

import { LLM } from 'web-llm';
const llm = new LLM({ model: 'vicuna-7b' });
const pipeline = await llm.pipeline();
const out = await pipeline('hi, who are u');

I think it would be helpful both for web-llm maintainers and for developers who want to integrate web-llm into their own projects.
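To illustrate the separation being asked for, here is a minimal sketch of how the DOM-specific hooks such as appendMessage and updateLastMessage could become caller-supplied callbacks, so the same pipeline works with any UI. All names here (streamTokens, the callback options) are hypothetical, not the actual llm-chat.js code.

```js
// Hypothetical sketch: the pipeline takes callbacks instead of touching the DOM.
class LLMChatPipeline {
  constructor({ onNewMessage = () => {}, onUpdateLastMessage = () => {} } = {}) {
    this.onNewMessage = onNewMessage;
    this.onUpdateLastMessage = onUpdateLastMessage;
  }

  async generate(prompt) {
    this.onNewMessage("");            // the caller decides how to render a new message
    let text = "";
    for await (const token of this.streamTokens(prompt)) { // streamTokens is assumed here
      text += token;
      this.onUpdateLastMessage(text); // streamed updates, no DOM access in the library
    }
    return text;
  }

  async *streamTokens(prompt) {
    // placeholder for the real tokenizer + model loop
    yield* ["hello", " ", "world"];
  }
}

// A page (or ChatLLM-web) would then wire the callbacks to its own UI;
// here they just log to the console:
const pipeline = new LLMChatPipeline({
  onNewMessage: () => console.log("(new message)"),
  onUpdateLastMessage: (text) => console.log("update:", text),
});
await pipeline.generate("hi, who are u");
```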

tqchen commented 1 year ago

Thanks @Ryan-yang125, we certainly get the point. We are in the process of first factoring out the tokenizer and model artifacts into standard URLs.

The main question is how to package the rest of the modules, and we would be happy to get suggestions on that.

tqchen commented 1 year ago

Here is a rough sketch of what we can do.

Hopefully we will factor out

The remaining model artifact and wasm will become pluggable and can be configured through async init of the ChatModule.
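For illustration only, one way that pluggable configuration through an async init could look is sketched below. The import path, option names, and the generate call are assumptions made for the sketch, not the actual ChatModule API.

```js
// Hypothetical sketch of pluggable model artifacts via async init.
// URLs and option names are placeholders, not the real configuration schema.
import { ChatModule } from "web-llm"; // assumed entry point

const chat = new ChatModule();
await chat.init({
  model: "vicuna-7b-v1",
  modelLibUrl: "https://example.com/artifacts/vicuna-7b-v1_webgpu.wasm", // compiled model wasm
  tokenizerUrl: "https://example.com/artifacts/tokenizer.model",
  paramsUrl: "https://example.com/artifacts/params/", // ndarray cache shards
});

const reply = await chat.generate("hi, who are u");
console.log(reply);
```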

Personally I have limited knowledge of JS packaging, so I would also love feedback here.

idosal commented 1 year ago

I'm actually working on several projects that use web-llm off-shoots (e.g., https://github.com/idosal/AgentLLM) and have some packaging methods that provide a similar API to @Ryan-yang125's description. Once I fine-tune it, I'll be happy to contribute it, and I'd love to hear any thoughts on what a good API would look like.

r2d4 commented 1 year ago

I was able to split out the tvmjs bundle fairly easily. I'm moving this into its own package today in the react-llm monorepo, but happy to move it upstream or use whatever you publish.

I had some trouble with the tvm runtime WASI piece; it took some work to figure out what the right commits and branches were (the ndarray_cache functions were split out at some point?). There are a few Emscripten flags that help a bit, but some of the JS runtime might need to be reworked.

As for sentencepiece-js, it seems that's already been figured out, but I spent a few hours on that as well.

For what it's worth, I'm not much of an expert at JS packaging either, so take some of the things I've done with a grain of salt :)

Will share some more thoughts soon!

DustinBrett commented 1 year ago

I've mostly been hacking the files as I go tbh. I share the tvm files between Web-LLM & Web-SD with almost no modification (https://github.com/DustinBrett/daedalOS/tree/main/public/System/tvm). For things like sentencepiece I tweaked it a bit to work in a non-module way which was easier for me to bring into a web worker. I recently updated my files to work with the new files that also support WizardLM (https://github.com/DustinBrett/daedalOS/tree/main/public/Program%20Files/WebLLM).
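For anyone curious what the non-module approach looks like in practice, here is a rough sketch: a classic web worker can pull in UMD/global-style builds with importScripts and stream results back to the page. The script paths and message shapes below are placeholders, not the actual daedalOS code.

```js
// worker.js -- sketch of loading non-module (global) builds inside a classic worker.
// The script paths are placeholders for wherever tvmjs.bundle.js and
// sentencepiece.js are hosted.
importScripts("/lib/tvmjs.bundle.js", "/lib/sentencepiece.js");

self.onmessage = async ({ data }) => {
  if (data.type === "prompt") {
    // ...initialize the pipeline using the globals those scripts exposed,
    // then post streamed output back to the page:
    self.postMessage({ type: "response", text: `echo: ${data.text}` });
  }
};
```

The page side then talks to it with `new Worker("worker.js")` and postMessage, keeping all the heavy lifting off the main thread.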

Without a CUDA card I haven't been able to do much building from source, so I just chop up the latest files from the GitHub Actions artifacts (https://github.com/mlc-ai/web-llm/actions/runs/4996705338). It would indeed be more accessible if it were more modular and not connected to the demo UI elements.

r2d4 commented 1 year ago

I separated out most of the library into @react-llm/model, which I published to npm (https://www.npmjs.com/package/@react-llm/model); it currently lives in the react-llm monorepo (https://github.com/r2d4/react-llm/tree/main/packages/model). It still fetches sentencepiece and the tvm WASI runtime, but it's a start.

tqchen commented 1 year ago

https://github.com/mlc-ai/web-llm/pull/113 has now landed; check out the latest instructions at https://github.com/mlc-ai/web-llm#get-started.

Additionally, we would love to empower community members to build fun things on top. https://github.com/mlc-ai/web-llm/tree/main/examples now contains an initial page that we would love to populate further with awesome projects that use WebLLM; please help by sending PRs.

Finally, as we modularize the components, we would also love to get the community's help on improving some of the models, especially the webapp components, and on providing more examples to the community.

Thanks everyone for chiming in here

DustinBrett commented 1 year ago

Awesome, thanks for doing this! I have gone ahead and converted my code to use this new module, if anyone wants to see an example:

https://github.com/DustinBrett/daedalOS/commit/2cd3676c0ef0f4b25e58ed99ee38b9832083898b