mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

Missing instructions on installing additional models #19

Closed execveat closed 1 year ago

execveat commented 1 year ago

Hey there, congratulations on a great release! The app works great on a Mac and the installation was very straightforward.

Do you have plans for growing mlc_chat_cli into a standalone tool, or is it meant to be a proof of concept? The Readme claims the project can be used to run 'any language model', but there are no instructions for how to do that. Furthermore, the code seems to indicate that only three models are supported right now; is that right?

Unless the mlc_chat_cli is supposed to be a toy demo, could you please add instructions for:

  1. which models are supported (e.g. would RNN-based models like https://github.com/BlinkDL/RWKV-LM work, or is it just transformers)?
  2. which formats, quantization methods and directory structures are supported - i.e. I don't think grabbing a random link from HF and cloning it the same way Vicuna was installed during the original installation (git clone https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b) would work, right?
  3. it seems that there is a template/profile system for different LLM families; how do we add additional templates? Does it require a patch/pull request, or can it be done by tweaking a config file somewhere?
  4. the Readme mentions multiple optimizations, but mlc_chat_cli doesn't expose those settings to the user. How do we tweak them?
  5. given that the Readme claims all language models are supported, there should be some kind of rough guide on how to estimate hardware requirements (e.g. which LLMs can my machine run with this tool, with what quantization and at what performance?). As a comparison, the llama.cpp Readme isn't well structured, but it does give a good overview of RAM requirements for a given model size and of the impact of different quantization techniques on performance. (A rough back-of-the-envelope sketch follows this list.)
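To make point 5 concrete: a first-order estimate of the weight memory alone is parameter count times bits per weight divided by eight; activations, the KV cache and runtime overhead come on top of that. A minimal Python sketch (the numbers are illustrative, not measured with mlc-llm):

```python
# Back-of-the-envelope weight-memory estimate: params * bits_per_weight / 8.
# Purely illustrative; actual usage also includes activations, the KV cache,
# and runtime overhead.
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1e9

for bits in (16, 4, 3):
    print(f"7B model @ {bits}-bit: ~{weight_memory_gb(7e9, bits):.1f} GB")
# 7B model @ 16-bit: ~14.0 GB
# 7B model @ 4-bit: ~3.5 GB
# 7B model @ 3-bit: ~2.6 GB
```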

Also, it would be very neat if you mentioned in the Readme what kind of community interactions you are aiming for. Would you prefer that people build their own tools that use mlc-llm as a backend, or send PRs to improve mlc_chat_cli?

tqchen commented 1 year ago

Thank you for your input. Indeed there are a lot of things we can improve. This is just the beginning of the release, so there is a lot that can be added on top. We will release follow-up materials on guides and local builds.

There are two components of the project.

One thing to mention is that the overall MLC flow is in Python and highly customizable. For example, we could easily add 3-bit int, or new formats like 4-bit floating point, to the Python flow, which may or may not sit in this repo. It took us on the order of a few days to explore a few different quantization formats and use ML compilation to optimize and generate high-performance code.
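To give a rough idea of what exploring a new quantization format boils down to, here is a minimal NumPy sketch of group-wise 3-bit integer quantization. This is not the code in this repo (the real flow generates fused kernels through ML compilation), and all names below are made up for illustration:

```python
# Minimal sketch of group-wise k-bit integer quantization; NOT MLC's actual
# implementation, just an illustration of the arithmetic a format defines.
import numpy as np

def quantize_groupwise(w: np.ndarray, bits: int = 3, group_size: int = 32):
    """Quantize a flat float weight vector to unsigned k-bit codes per group."""
    levels = 2 ** bits - 1
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    scale = (w.max(axis=1, keepdims=True) - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    codes = np.clip(np.round((w - w_min) / scale), 0, levels).astype(np.uint8)
    return codes, scale, w_min

def dequantize_groupwise(codes, scale, w_min):
    return codes.astype(np.float32) * scale + w_min

w = np.random.randn(1024).astype(np.float32)
codes, scale, w_min = quantize_groupwise(w, bits=3)
w_hat = dequantize_groupwise(codes, scale, w_min).reshape(-1)
print("max abs error:", np.abs(w - w_hat).max())
```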

And yes, as an open source community, we love contributions and pull requests.

execveat commented 1 year ago

Thank you for the explanation! I see that there is support for more models in mlc_llm/conversation.py, but the list in cpp/cli_main.cc is more limited. I guess this is just work in progress?

I would strongly suggest adding an option to override the profile selection via a command-line argument instead of always inferring it from the path name. And moving the profile/template definitions into a user-editable config file would be amazing as well (e.g. to customize the prompt and temperature).
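To make the suggestion concrete, here is a hypothetical sketch of how a per-model config file plus a CLI override could fit together. None of these flags, file names, or keys exist in mlc_chat_cli today; they are purely illustrative:

```python
# Hypothetical sketch: a user-editable per-model JSON config plus a CLI flag
# that overrides the template otherwise inferred from the model path.
import argparse
import json
import pathlib

DEFAULTS = {"conv_template": "vicuna_v1.1", "temperature": 0.7, "top_p": 0.95}

def load_profile(model_dir: str, override_template: str = None) -> dict:
    profile = dict(DEFAULTS)
    cfg = pathlib.Path(model_dir) / "chat_config.json"  # hypothetical file
    if cfg.exists():
        profile.update(json.loads(cfg.read_text()))
    if override_template:  # CLI flag wins over the config file
        profile["conv_template"] = override_template
    return profile

if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--model-dir", default="dist/vicuna-v1-7b")
    p.add_argument("--conv-template", default=None,
                   help="override the template instead of inferring it from the path")
    args = p.parse_args()
    print(load_profile(args.model_dir, args.conv_template))
```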

elbowdonkey commented 1 year ago

Are there instructions for how to convert existing models for use with mlc-llm? Reading this thread, and this one, it seems possible, but I haven't found any hints on how to start.

tqchen commented 1 year ago

Thank you for your suggestion; we will work on the instructions in the coming weeks. The current build.py pipeline should support the Llama class of models, and support for other model classes is a work in progress.

yx-chan131 commented 1 year ago

Hi, is this project mainly focused on LLMs? I wonder if the MLC flow works for image generation models (e.g., Stable Diffusion).

tqchen commented 1 year ago

@yx-chan131 yes, check out https://github.com/mlc-ai/web-stable-diffusion

Poordeveloper commented 1 year ago

Looking forward to the instructions; I'm waiting to integrate it into my chat bot. https://github.com/Poordeveloper/chatgpt-app

gamingflexer commented 1 year ago

> Thank you for your suggestion; we will work on the instructions in the coming weeks. The current build.py pipeline should support the Llama class of models, and support for other model classes is a work in progress.

Yup, any start would be fine! Looking forward to this.

(Let me know if you need any help! I've worked with LLMs in production on big GPUs.)

CharlieFRuan commented 1 year ago

Closing this issue for now due to inactivity. Feel free to reopen or open another issue if there are other questions!