meta-llama / llama3

The official Meta Llama 3 GitHub site

OpenAPI-style API document #71

Open xiaoToby opened 2 months ago

xiaoToby commented 2 months ago

I urgently want to use Llama 3 in the way shown here: https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/tree/main/scripts/openai_server_demo

My question is: can I use Llama 3 with that same file, just downloading the models and changing the model name in the file?

@jspisak @astonzhang @gitkwr @ruanslv @HamidShojanazeri

ejsd1989 commented 2 months ago

@xiaoToby Looking at that repo, it seems to just use FastAPI. You may have some luck if you give it a shot, but my biggest concern is that you may run into rambling output with the Instruct model if you don't manually account for the prompt template changes that we note in llama-recipes.
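For reference, here is a minimal sketch of what accounting for that template might look like; the special tokens follow the Llama 3 Instruct format described in llama-recipes, while the helper name and messages are just illustrative:

```python
# Rough sketch of the Llama 3 Instruct prompt format (tokens as documented
# in llama-recipes); the helper name and example messages are illustrative.
def build_llama3_instruct_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(build_llama3_instruct_prompt("You are a helpful assistant.", "Hello!"))
```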

If you're looking for a quick way to spin up a server, another option is the latest vLLM, which already works with Llama 3. It gives you a fast way to start a server, and then you can easily hit it with curl.
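For example, something along these lines should work: a sketch only, assuming vLLM's OpenAI-compatible server running on its default port 8000 and the Meta-Llama-3-8B-Instruct model (adjust the model name, port, and prompt as needed):

```python
# Sketch: query a locally running vLLM OpenAI-compatible server.
# Assumed setup in another shell (model name and port are illustrative):
#   pip install vllm
#   python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-8B-Instruct
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 128,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```

The same request can of course be sent with curl against the same endpoint.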

xiaoToby commented 2 months ago

Thanks for your help, it helps a lot! But there is an obstacle: vLLM's CUDA and Python version requirements are strict, so it's not very convenient to use.

jspisak commented 1 month ago

@WoosukKwon - can you help with this question on vLLM?

WoosukKwon commented 1 month ago

Hi @jspisak, thanks for letting me know about the issue!

@xiaoToby Which CUDA and Python versions are you using? You can simply install vLLM by running pip install vllm. It works for Python 3.8 - 3.11, the same Python versions supported by PyTorch. As for the CUDA version, the PyPI wheels use CUDA 12.1 and can run on machines with NVIDIA driver >= 530.30.02 (you don't need to install the CUDA SDK). We also provide CUDA 11.8 wheels in our releases.
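If it helps, here is a quick environment sanity check you could run before installing; this is only a sketch, and the version bounds are simply the ones mentioned above:

```python
# Sketch: check Python and NVIDIA driver versions before `pip install vllm`.
# Bounds are taken from the comment above (Python 3.8-3.11, driver >= 530.30.02).
import subprocess
import sys

py = sys.version_info
print(f"Python {py.major}.{py.minor}:",
      "ok" if (3, 8) <= (py.major, py.minor) <= (3, 11) else "outside 3.8-3.11")

# nvidia-smi reports the installed driver version without needing the CUDA SDK.
driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()[0].strip()
print("NVIDIA driver:", driver, "(CUDA 12.1 wheels need >= 530.30.02)")
```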

xiaoToby commented 1 month ago

I tried vLLM in a Docker container based on the nvidia/cuda:12.2.2-devel-ubuntu22.04 image, and it works fine.