parfeniukink opened 2 months ago
Summary

- This PR extends the PR: Deepsparse Backend implementation. The base branch is `parfeniukink/features/deepsparse-backend`.
- `vllm` is added to the optional dependencies.
- The `VllmBackend` class encapsulates the vLLM integration. `guidellm/backend/vllm` is available only if the Python version and the runtime platform pass validation.
- `vllm` tests are skipped if the platform is not Linux.

Usage

This is an example of a command you can use in your terminal:

```
python -m src.guidellm.main --data=openai_humaneval --max-requests=1 --max-seconds=20 --rate-type=constant --rate=1.0 --backend=vllm --model=/local-path
```

- `--data=openai_humaneval`: determines the dataset.
- `--model=/local/path/my_model`: determines the local path to the model object. If it is not specified, the environment variable is used.

Environment configuration

The model can also be set with `GUIDELLM__LLM_MODEL`. If neither the CLI value nor the environment variable is set, the default is used. Currently, the default model is `mistralai/Mistral-7B-Instruct-v0.3`.
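The availability gate described in the Summary (exposing the `vllm` backend only when the Python version and the runtime platform validate) might look roughly like the sketch below. The function name and the version bounds here are assumptions for illustration, not the actual guidellm code:

```python
import platform
import sys

# Hypothetical bounds for illustration; the real supported range may differ.
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 11)


def vllm_backend_available() -> bool:
    """Return True when the optional vllm backend can be loaded."""
    # vLLM ships Linux-only wheels, so gate on the platform first.
    if platform.system() != "Linux":
        return False
    # Then check that the interpreter falls inside the supported range.
    return MIN_PYTHON <= sys.version_info[:2] <= MAX_PYTHON
```

The Linux-only test skip mentioned in the Summary could reuse the same predicate, e.g. via `pytest.mark.skipif`.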
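The resolution order described above (CLI value first, then `GUIDELLM__LLM_MODEL`, then the default) can be sketched as follows; `resolve_model` is a hypothetical helper for illustration, not the actual settings code:

```python
import os

DEFAULT_LLM_MODEL = "mistralai/Mistral-7B-Instruct-v0.3"


def resolve_model(cli_value=None):
    """Pick the model: CLI value first, then the env var, then the default."""
    if cli_value:  # e.g. passed via --model=/local/path/my_model
        return cli_value
    return os.environ.get("GUIDELLM__LLM_MODEL", DEFAULT_LLM_MODEL)
```

With no CLI value and the environment variable unset, this falls through to the default `mistralai/Mistral-7B-Instruct-v0.3`.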