vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0
23.58k stars 3.37k forks

[Usage]: Load local model from local path #6012

Open xiaoyu-work opened 3 weeks ago

xiaoyu-work commented 3 weeks ago

How would you like to use vllm

Does vLLM support loading a local HuggingFace PyTorch .pt model or an ONNX model? How can I load them both in offline Python code and through the OpenAI Completions API? I see an error saying the config.json file was not found. Does vLLM only support vanilla HF models?

DarkLight1337 commented 3 weeks ago

vLLM supports local models, but the file structure should follow that of a standard HuggingFace model repo.
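
For illustration, a minimal sketch of what that layout looks like and how to point vLLM at it. The file names below are illustrative (weight file naming varies by model), and `./my_local_model` is a hypothetical path:

```shell
$ ls ./my_local_model
config.json
generation_config.json
tokenizer.json
tokenizer_config.json
model.safetensors

# Offline use: pass the directory path where a HF repo name would go, e.g.
#   llm = vllm.LLM(model="./my_local_model")

# OpenAI-compatible server: the same local path works as the --model argument.
$ python -m vllm.entrypoints.openai.api_server --model ./my_local_model
```

The key point is that vLLM resolves the `model` argument either as a HuggingFace Hub ID or as a local directory, so as long as config.json, the tokenizer files, and the weights are present, no Hub registration is needed.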

xiaoyu-work commented 3 weeks ago

Do you know how I can convert a .pt model or an ONNX model to the HuggingFace model format? It seems I need to register it with HuggingFace first: https://discuss.huggingface.co/t/convert-pytorch-model-to-huggingface-transformer/16965
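
You don't need to register anything on the Hub: if the architecture already exists in `transformers`, you can load the raw state dict into the matching model class and call `save_pretrained`, which writes config.json plus the weights in the layout vLLM expects. A minimal sketch, using a tiny GPT-2 config as a stand-in for whatever architecture your checkpoint actually matches (the config values and paths here are illustrative, not from the original thread):

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny stand-in config so the sketch runs quickly; replace with the
# config that actually matches your checkpoint's architecture.
config = GPT2Config(n_layer=1, n_head=2, n_embd=8, vocab_size=100)
model = GPT2LMHeadModel(config)

# Stand-in for your existing raw .pt checkpoint.
torch.save(model.state_dict(), "checkpoint.pt")

# Conversion: load the raw weights into the matching HF model class...
model.load_state_dict(torch.load("checkpoint.pt", map_location="cpu"))

# ...then save_pretrained writes config.json + weights in HF repo layout,
# which vLLM can load directly via LLM(model="./my_local_model").
model.save_pretrained("./my_local_model")
```

Note this path does not cover ONNX: vLLM runs PyTorch weights, so an ONNX export would first have to be converted back to a PyTorch state dict. You would also want to `save_pretrained` a tokenizer into the same directory.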