runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

LoRA-module support needed for using adapters #119

Open sven-knoblauch opened 3 weeks ago

sven-knoblauch commented 3 weeks ago

When trying to configure a LoRA adapter, the environment variables for enabling LoRA and other settings are exposed (also in the RunPod UI), but there is no option for adding the actual LoRA modules (paths to the LoRA adapters or a Hugging Face link).

In src/engine.py, the class OpenAIvLLMEngine has the option for adding these lists (lines 137 and 145).

As far as I can see in the vLLM GitHub repository, the list should look like this: lora_modules: Optional[List[LoRAModulePath]]

from dataclasses import dataclass
from typing import Optional

@dataclass
class LoRAModulePath:
    name: str                              # adapter name used in requests
    path: str                              # local path or Hugging Face repo id
    base_model_name: Optional[str] = None

Without these LoRA modules, all the other LoRA settings seem to be useless.
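
For illustration, here is a minimal sketch of what the missing wiring could look like: a hypothetical LORA_MODULES environment variable holding comma-separated name=path pairs, parsed into the list of LoRAModulePath objects shown above. Neither the variable name nor this parsing exists in worker-vllm today; it is only one possible shape for the requested feature.

import os
from typing import List

def lora_modules_from_env() -> List[LoRAModulePath]:
    # Hypothetical format: LORA_MODULES="name1=/path/to/adapter1,name2=hfuser/adapter2"
    raw = os.environ.get("LORA_MODULES", "")
    modules: List[LoRAModulePath] = []
    for entry in filter(None, (part.strip() for part in raw.split(","))):
        name, sep, path = entry.partition("=")
        if not sep:
            raise ValueError(f"Expected name=path, got {entry!r}")
        modules.append(LoRAModulePath(name=name, path=path))
    return modules

The resulting list is what would have to be handed to the serving classes at the points mentioned above (lines 137 and 145 of src/engine.py).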

dumbPy commented 3 weeks ago

For this to work, the vLLM Docker image needs --lora-modules name1=/path/to/adapter1 name2=hfuser/adapter2.

So the adapter path can either be a local path or a Hugging Face model.

Since there is no way to mount files, a simple approach would be to allow adding adapters from Hugging Face like name2=hfuser/adapter2, which the vLLM Docker image then downloads from Hugging Face automatically. When an inference request contains model=name3, the vLLM OpenAI server downloads the corresponding LoRA adapter from Hugging Face and loads it.
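
To make the request side concrete, here is a hedged example of selecting a registered adapter by its --lora-modules name through the OpenAI-compatible API. The base URL, API key, and prompt are placeholders, and the adapter name name2 comes from the example above.

from openai import OpenAI

client = OpenAI(base_url="https://<your-endpoint>/v1", api_key="<your-key>")

response = client.chat.completions.create(
    model="name2",  # the adapter name registered via --lora-modules
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)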

sven-knoblauch commented 2 weeks ago

Yeah, for the "standard" vLLM Docker image you can load it with --lora-modules, but how do you do that in the RunPod serverless vLLM worker image? There you can't add anything to the Docker CMD; the only thing I can do is add environment variables, and there is no option to add the LoRA modules. That's as far as I understand it.