This PR adds support for benchmarking multimodal models.
It mostly extends existing infrastructure to add support for requests containing images. For emulated requests, it downloads images from an illustrated version of Pride and Prejudice and randomly selects from them.
The `load_images` logic is currently limited to downloading from URLs. It should be extended to support HF datasets and local files in the future.
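For reference, the URL-based loading amounts to something like the following sketch (the function body and helper shown here are illustrative assumptions, not the exact implementation in this PR):

```python
import io
import random

import requests
from PIL import Image


def load_images(image_urls: list[str]) -> list[Image.Image]:
    # Illustrative sketch: fetch each image over HTTP and decode it with PIL.
    # HF-dataset and local-file sources are the future extensions noted above.
    images = []
    for url in image_urls:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        images.append(Image.open(io.BytesIO(response.content)))
    return images


def sample_image(images: list[Image.Image]) -> Image.Image:
    # Emulated requests randomly select one of the downloaded images.
    return random.choice(images)
```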
I tested by running the following command:
```bash
guidellm --data="prompt_tokens=128,generated_tokens=128,images=1" --data-type emulated --model microsoft/Phi-3.5-vision-instruct --target "http://localhost:8000/v1" --max-seconds 20
```
On 2x A5000 GPUs I had to set `max_concurrency=4` to run this command due to memory limitations.
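For context, a request containing an image against the OpenAI-compatible endpoint targeted above typically looks like this sketch of the standard chat-completions payload (the exact payload guidellm builds may differ, and the file name is hypothetical):

```python
import base64

import requests

# Encode a local illustration as a base64 data URL, the standard way to
# attach images to an OpenAI-compatible chat-completions request.
with open("page_illustration.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "microsoft/Phi-3.5-vision-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this illustration."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    "max_tokens": 128,
}

resp = requests.post(
    "http://localhost:8000/v1/chat/completions", json=payload, timeout=60
)
print(resp.json()["choices"][0]["message"]["content"])
```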