neuralmagic / guidellm

Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
Apache License 2.0
158 stars 11 forks source link

Multimodal #66

Open anmarques opened 1 day ago

anmarques commented 1 day ago

This PR adds support for benchmarking multimodal models.

It mostly extends existing infrastructure to add support to requests containing images. For emulated requests it downloads images from an illustrated version from Pride and Prejudice and randomly selects from them.

The load_images logic is currently limited to download from url. It should be extended to HF datasets or local files in the future.

I tested by running the following command:

guidellm --data="prompt_tokens=128,generated_tokens=128,images=1" --data-type emulated --model microsoft/Phi-3.5-vision-instruct --target "http://localhost:8000/v1" --max-seconds 20

On 2xA5000 I had to set max_concurrenty=4 to run this command due to memory limitations.