I noticed when attempting to benchmark google/gemma-2-9b-it on vLLM that the order of the "user" and "system" roles in the message object matters: starting with "system" causes the inference server to reject the request with HTTP 400.
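For reference, a request of roughly this shape is what triggers the rejection (illustrative payload against a local vLLM OpenAI-compatible endpoint; Gemma 2's chat template defines no "system" role, so vLLM refuses to apply it):

```python
# Illustrative only: endpoint URL and payload mirror the setup described above.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # vLLM OpenAI-compatible server
    json={
        "model": "google/gemma-2-9b-it",
        "messages": [
            # Gemma 2's chat template has no "system" role, so vLLM rejects
            # a request whose messages begin with one.
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
    },
)
print(resp.status_code)  # 400 for Gemma 2; models with a system role accept it
```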
This small patch simply reverses the order, which should still be acceptable for other models. I've also successfully tested the patch with Gemma 2 and Llama 3.1 models on vLLM.
Never mind: I now understand that reversing the order makes no sense here; instead, the system message should probably be suppressed for the Gemma models.
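A minimal sketch of that approach (the helper name and model-prefix check are hypothetical, not part of this repo): fold the system prompt into the first user turn for Gemma models rather than dropping it outright, and pass messages through unchanged for everything else:

```python
# Hypothetical sketch: `prepare_messages` and the prefix check are illustrative.

def prepare_messages(messages: list[dict], model: str) -> list[dict]:
    """Suppress the "system" turn for models whose template lacks that role."""
    if not model.startswith("google/gemma"):
        return messages  # other models accept a leading system message as-is

    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if system_parts and rest and rest[0]["role"] == "user":
        # Prepend the system prompt to the first user turn instead of
        # discarding it, so the instructions still reach the model.
        rest[0] = {
            "role": "user",
            "content": "\n\n".join(system_parts) + "\n\n" + rest[0]["content"],
        }
    return rest
```

Merging rather than silently dropping keeps benchmark prompts comparable across models that do and don't support a system role.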