prometheus-eval / prometheus

[ICLR 2024 & NeurIPS 2023 WS] An Evaluator LM that is open-source, offers reproducible evaluation, and inexpensive to use. Specifically designed for fine-grained evaluation on a customized score rubric, Prometheus is a good alternative for human evaluation and GPT-4 evaluation.
https://prometheus-eval.github.io/prometheus/
MIT License
287 stars 17 forks source link

Unable to generate evaluation #12

Closed HuihuiChyan closed 10 months ago

HuihuiChyan commented 10 months ago

Hi I am using the 7B version's model. I follow your code and feed the following content to the model:

[INST] <<SYS>>
You are a fair evaluator language model.
<</SYS>>

###Task Description:
An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given.
1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general.
2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric.
3. The output format should look as follows: \"Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)\"
4. Please do not generate any other opening, closing, and explanations.

###The instruction to evaluate:
{question_body}

###Response to evaluate:
{answer_body}

###Reference Answer (Score 5):
{reference}

###Score Rubrics:
{rubric}

###Feedback: [/INST]

However, the output is either garbled message (if I set temperature as 1.0) or empty string (if I set temperature as 0.0). Can you give me some suggestions? Many thanks in advance!

SeungoneKim commented 10 months ago

Hey @HuihuiChyan , thanks for your interest in our work!

Could you check if you set the max_token_length to something like 256 or 512? For the exact hyper-parameters we used for inference, please check Appendix C (page 22) in our paper.

If this doesn't work, I could take a look if you share me the exact input you gave to the model and what it generated.

HuihuiChyan commented 10 months ago

Sure! Here is one of my input:

[INST] <<SYS>>\nYou are a fair evaluator language model.\n<</SYS>>\n\n###Task Description:\nAn instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given.\n1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general.\n2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric.\n3. The output format should look as follows: "Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)"\n4. Please do not generate any other opening, closing, and explanations.\n\n###The instruction to evaluate:\nStruggling with the recent loss of a beloved pet, a person reaches out for comfort and advice on how to cope with the grief.\n\n###Response to evaluate:\nI\'m so sorry to hear about the loss of your beloved pet. It\'s understandable to feel immense sadness, as pets become essential family members, offering us endless love. Their absence creates an emptiness that feels hard to mend. It\'s critical to let yourself grieve and process the emotions that arise. Each person\'s grieving process is unique, and there\'s no right or wrong way to handle it. One therapeutic method could be to remember your pet in your special way, perhaps by making a keepsake box or penning down a letter. Taking care of yourself physically and emotionally is also vital during this time. When you feel ready, seeking out support from those who can empathize with your loss can be immensely beneficial. As time passes, your pain will lessen, and your heart will be filled with beautiful memories of your pet. Always remember, it\'s okay to grieve and miss your beloved companion.\n\n###Reference Answer (Score 5):\nIt\'s clear that the loss of your cherished pet has caused you a great deal of pain, and it\'s completely natural to feel such deep grief. Pets are not merely animals; they become integral parts of our families, providing us with unconditional love and companionship. Their loss can leave a void that feels impossible to fill.\n\nFirstly, allow yourself to grieve. Losing a pet can be as devastating as losing a human loved one, and it\'s important to allow yourself to feel these emotions fully. There\'s no right or wrong way to grieve, and everyone\'s process is unique. If tears come, let them flow. It\'s a natural part of the healing process.\n\nSecondly, remember your pet in a way that feels right to you. You might want to create a memory box with photos, a favorite toy, or even a collar. Others might find comfort in writing a letter to their pet, expressing all the things they wish they could have said. This can be a cathartic way to express your feelings and to say goodbye.\n\nNext, take care of yourself. Grief can be physically exhausting. Try to maintain a regular eating and sleeping schedule, and give yourself permission to rest when you need it. Engaging in physical activity, such as walks or yoga, can also be a helpful way to manage stress and pent-up emotions.\n\nLastly, when you\'re ready, consider seeking support from others who understand what you\'re going through. This could be a pet loss support group, a trusted friend who has experienced a similar loss, or a mental health professional. Sharing your feelings with others can often lighten the burden of grief.\n\nRemember, it\'s okay to grieve for as long as you need to, and it\'s okay to miss your pet. In time, the pain will lessen, and you\'ll be left with the precious memories of your beloved pet. Your pet\'s pawprints are forever imprinted on your heart, and they\'ll always be a part of you.\n\n###Score Rubrics:\n[Does the response demonstrate understanding, empathy, and sensitivity to the emotions, perspectives, or concerns of the user or topic at hand?]\nScore 1: The response is dismissive, insensitive, or entirely misses the emotional context.\nScore 2: The response shows limited empathy or understanding but might come off as generic or detached.\nScore 3: The response demonstrates a basic level of empathy and understanding but might lack depth or genuine connection.\nScore 4: The response is empathetic and sensitive, with minor areas for improvement in depth or connection.\nScore 5: The response is deeply empathetic, sensitive, and demonstrates a genuine understanding of the user\'s emotions or the topic\'s emotional context.\n\n###Feedback: [/INST]

And my code looks like this:

    model = vllm.LLM(model=model_path, tensor_parallel_size=1)
    sampling_params = vllm.SamplingParams(
        temperature=0.0,
        max_tokens=1024,
        top_p=1.0,
    )
    pred_list = model.generate(prompts, sampling_params)
    pred_list = [it.outputs[0].text for it in pred_list]

However, the outputs are all empty strings. I am using the 7b version model. Any suggestions? Many thanks in advance!

HuihuiChyan commented 10 months ago

Sorry, I found that I did no download the full checpoints! My mistake! (But it is weird that vllm did not give me any notice). Thanks again. Closing the issue now.

SeungoneKim commented 10 months ago

@HuihuiChyan Glad to know it's working:)