I noticed when attempting to benchmark google/gemma-2-9b-it on vLLM that the order of the "user" and "system" roles in the message object matters: starting with "system" causes the inference server to reject the request with HTTP 400.
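For reference, a request of roughly this shape is what triggers the rejection (illustrative payload against a local vLLM OpenAI-compatible endpoint; Gemma 2's chat template defines no "system" role, so vLLM refuses to apply it):

```python
# Illustrative only: endpoint URL and payload mirror the setup described above.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # vLLM OpenAI-compatible server
    json={
        "model": "google/gemma-2-9b-it",
        "messages": [
            # Gemma 2's chat template has no "system" role, so vLLM rejects
            # a request whose messages begin with one.
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
    },
)
print(resp.status_code)  # 400 for Gemma 2; models with a system role accept it
```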
This small patch simply reverses the order, which should still be acceptable for other models. I've also successfully tested the patch with Gemma 2 and Llama 3.1 models on vLLM.
Never mind: I now understand that reversing the order makes no sense here; instead, the system message should probably be suppressed for the Gemma models.
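A minimal sketch of that approach (the helper name and model-prefix check are hypothetical, not part of this repo): fold the system prompt into the first user turn for Gemma models rather than dropping it outright, and pass messages through unchanged for everything else:

```python
# Hypothetical sketch: `prepare_messages` and the prefix check are illustrative.

def prepare_messages(messages: list[dict], model: str) -> list[dict]:
    """Suppress the "system" turn for models whose template lacks that role."""
    if not model.startswith("google/gemma"):
        return messages  # other models accept a leading system message as-is

    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if system_parts and rest and rest[0]["role"] == "user":
        # Prepend the system prompt to the first user turn instead of
        # discarding it, so the instructions still reach the model.
        rest[0] = {
            "role": "user",
            "content": "\n\n".join(system_parts) + "\n\n" + rest[0]["content"],
        }
    return rest
```

Merging rather than silently dropping keeps benchmark prompts comparable across models that do and don't support a system role.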