Closed JamesKunstle closed 4 months ago
Yeah, I looked at the tokenizer_config.json for granite-7b-lab and prometheus-8x7b-V1.0, and they're using different prompt formats. We need to handle this for gen_answers.
There's an environment variable switch that will merge the system message with the first user message: https://github.com/instructlab/eval/blob/6c537c5e2f71f364366086d6eb10de5a74ede2da/src/instructlab/eval/mt_bench_common.py#L227
I'll try this. If this works, we should parameterize this in the evaluator object.
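For context, the env-var switch linked above effectively folds the system prompt into the first user turn so the resulting conversation satisfies strict user/assistant-only chat templates. A minimal sketch of that behavior (the function name, env var name, and signature here are hypothetical, not the actual instructlab/eval code):

```python
import os

def build_messages(system_msg, turns, merge_system=None):
    """Build a chat-completions message list. When merge_system is set,
    fold the system prompt into the first user turn instead of emitting
    a separate system message (mirrors the env-var switch)."""
    if merge_system is None:
        # Hypothetical env var name, for illustration only.
        merge_system = os.environ.get("MERGE_SYSTEM_MESSAGE", "0") == "1"
    messages = []
    if system_msg and merge_system:
        # Prepend the system text to the first user turn.
        turns = [system_msg + "\n\n" + turns[0]] + list(turns[1:])
    elif system_msg:
        messages.append({"role": "system", "content": system_msg})
    # Turns alternate user/assistant starting with user.
    for i, turn in enumerate(turns):
        role = "user" if i % 2 == 0 else "assistant"
        messages.append({"role": role, "content": turn})
    return messages
```

With merging on, a granite-style `[system, user, ...]` conversation becomes `[user, ...]`, which templates that reject the system role will accept.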
@alinaryan
Thanks for digging into this @JamesKunstle. I'm going to add an option to have this passed from evaluate in the cli down to this level and remove the env var entirely.
@danmcp fantastic thank you. This is the fix btw, prometheus generated + judged successfully.
vLLM emits the error:

```
ERROR 07-05 19:50:10 serving_chat.py:225] Error in applying chat template from request: Conversation roles must alternate user/assistant/user/assistant/...
INFO: ::1:39498 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
```
I think the major problem is that granite uses a system/user/assistant pattern, whereas the other template only accepts user/assistant turns.
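The constraint behind that 400 can be sketched as a small predicate: some chat templates (Mixtral-style, like prometheus-8x7b appears to use) require the message roles to be exactly user/assistant/user/... and reject a system role outright. This is an illustrative approximation of the template check, not vLLM's actual code:

```python
def roles_alternate(messages):
    """Return True iff roles go user/assistant/user/assistant/...
    with no system role, which is what strict templates enforce."""
    roles = [m["role"] for m in messages]
    return all(
        role == ("user" if i % 2 == 0 else "assistant")
        for i, role in enumerate(roles)
    )
```

A granite-style conversation starting with a system message fails this check, which is why merging the system prompt into the first user message fixes generation.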