lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

Multi VLLM Worker #2040

Open keeganmccallum opened 1 year ago

keeganmccallum commented 1 year ago

Now that vLLM supports Llama 2 properly, it would be awesome to have a PEFT weight-sharing worker like the one for regular model serving. The comment at the end of this issue is probably enough to spike one:

https://github.com/vllm-project/vllm/issues/182
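For reference, a minimal sketch of the PEFT weight-sharing pattern being asked for, using the Hugging Face PEFT API: one base model held in memory, with multiple LoRA adapters layered on top and swapped per request. The model and adapter paths below are illustrative, not FastChat code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the shared base model once.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# The first adapter wraps the base weights without copying them.
model = PeftModel.from_pretrained(
    base, "path/to/adapter-a", adapter_name="adapter-a"
)
# Additional adapters reuse the same underlying base weights.
model.load_adapter("path/to/adapter-b", adapter_name="adapter-b")

# Route a request to a specific fine-tune by activating its adapter.
model.set_adapter("adapter-b")
```

The point of the pattern is that N fine-tunes cost one copy of the base weights plus N small adapter deltas, which is what makes a multi-model worker practical on a single GPU.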

merrymercy commented 1 year ago

Please help us by contributing PRs. We do not have the bandwidth to work on this now.

fozziethebeat commented 1 year ago

That comment doesn't seem to be fully baked into vLLM yet. Having written the multi-model worker, I would say FastChat should wait until vLLM has native support for loading PeftModels and then add a multi_vllm_model_worker that leverages that support.

To make this happen faster, the best step is probably to encourage vLLM to implement PEFT model support (ideally in a way that's more flexible than how PEFT itself does it).
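To sketch what the core of such a multi_vllm_model_worker could look like, here is a minimal example assuming a LoRARequest-style API of the kind vLLM later shipped (LLM(..., enable_lora=True) plus a per-request LoRA handle). The worker shape, model names, and adapter paths are assumptions for illustration, not FastChat or vLLM code from this thread.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# One shared base model; LoRA adapters are applied per request.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

# Map FastChat-facing model names to LoRA adapters that share the base.
adapters = {
    "vicuna-lora-a": LoRARequest("adapter-a", 1, "path/to/adapter-a"),
    "vicuna-lora-b": LoRARequest("adapter-b", 2, "path/to/adapter-b"),
}

def generate(model_name: str, prompt: str) -> str:
    # Select the adapter for the requested model; the base weights stay shared.
    params = SamplingParams(temperature=0.7, max_tokens=256)
    outputs = llm.generate([prompt], params, lora_request=adapters[model_name])
    return outputs[0].outputs[0].text
```

With that in place, a FastChat worker would mainly need to register each adapter name with the controller and dispatch incoming requests to the matching LoRARequest, mirroring how the existing multi_model_worker routes between models.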