guoshengCS opened 3 months ago
The prompt template you used has been deprecated; please try configs/datasets/mbpp/mbpp_gen_830460.py
Thanks for the quick reply! @tonysy
It seems to have the same problem: the input prompt ends with [BEGIN] (https://github.com/open-compass/opencompass/blob/main/configs/datasets/mbpp/mbpp_gen_830460.py#L23), so the response does not start with it, while MBPPEvaluator only extracts answers that start with [BEGIN].
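To make the mismatch concrete, here is a minimal, hypothetical reproduction of the extraction issue (this is not the actual MBPPEvaluator code, and the pattern is an assumption for illustration): an extractor anchored on [BEGIN] works for a base-model continuation but finds nothing in a typical instruct-style response.

```python
import re

# Hypothetical [BEGIN]-anchored extraction pattern, modeled on the idea
# that the prompt template ends with [BEGIN] and the completion is
# expected to close with [DONE].
pattern = re.compile(r"\[BEGIN\]\s*'?(.*?)'?\s*\[DONE\]", re.DOTALL)

# A base model continues the prompt verbatim, so its completion repeats
# the [BEGIN]...[DONE] frame and the anchored search succeeds.
base_completion = "[BEGIN] 'def add(a, b): return a + b' [DONE]"

# An instruct model answers in its own style and never emits [BEGIN],
# so the anchored search fails (or, on multi-sample outputs, latches
# onto a later, wrong program).
instruct_completion = "Here is the solution:\ndef add(a, b):\n    return a + b"

print(pattern.search(base_completion))     # matches the base-model frame
print(pattern.search(instruct_completion)) # no match for instruct output
```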
Got it. I think the prompt is designed for base models, and we may need to upgrade it to be compatible with instruct models.
Hello, has this bug been fixed?
Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
torch2.2.0+vllm-0.4.0
Reproduces the problem - code/configuration sample
Evaluate MBPP with qwen2-72b on vLLM using the following config.
Reproduces the problem - command or script
Evaluate MBPP with qwen2-72b on vLLM using the following config.
Reproduces the problem - error message
Unexpected MBPP score compared with the results reported at https://qwenlm.github.io/blog/qwen2/
Other information
A sample prediction from qwen2-72b on MBPP is as follows:
As we can see, the prediction does not start with [BEGIN], which is the ending string of the input prompt (https://github.com/open-compass/opencompass/blob/main/configs/datasets/mbpp/mbpp_gen_830460.py#L23). However, MBPPEvaluator extracts answers with patterns starting with [BEGIN], so it picks a non-first program among the multiple program cases emitted by the base LLM.
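One possible direction, sketched here as a hypothetical fallback extractor (not the upstream fix, and the helper name and regexes are assumptions): prefer the [BEGIN]...[DONE] span that the base-model prompt produces, then fall back to a fenced code block, then to the raw text for instruct-model outputs.

```python
import re

def extract_program(prediction: str) -> str:
    """Hypothetical fallback extraction for MBPP predictions.

    Order of preference:
    1. A [BEGIN]...[DONE] span (base-model completions).
    2. A fenced ```python code block (instruct-model completions).
    3. The raw prediction text, stripped.
    """
    m = re.search(r"\[BEGIN\]\s*'?(.*?)'?\s*\[DONE\]", prediction, re.DOTALL)
    if m:
        return m.group(1).strip()
    m = re.search(r"```(?:python)?\n(.*?)```", prediction, re.DOTALL)
    if m:
        return m.group(1).strip()
    return prediction.strip()
```

With a fallback like this, an instruct model's fenced answer would be extracted instead of matching a wrong program (or nothing) with the [BEGIN]-anchored pattern alone.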