microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License
3.6k stars, 274 forks

Details for GPT4 evaluation #145

Open jongwooko opened 9 months ago

jongwooko commented 9 months ago

Hi. Could I ask for details about the query used for the GPT-4 evaluation?

I tried the following query:

""" We would like to request your feedback on the performance of two AI assistants in response to the user instruction and input displayed above. Please rate the helpfulness, relevance, accuracy, and level of detail of their responses. Each assistant receives an overall score on a scale of 1 to 10, where a higher score indicates better overall performance. Please first output a single line containing only two values indicating the scores for Assistant 1 and 2, respectively. The two scores are separated by a space. In the subsequent line, please provide a comprehensive explanation of your evaluation, avoiding any potential bias and ensuring that the order in which the responses were presented does not affect your judgment.

Below is an instruction that describes a task. Write a response that appropriately completes the request.

Instruction:

Determine the sentiment of the input sentence. Please respond as positive or negative.

Input:

{sentence}

Assistant 1:

{output}

Assistant 2:

{ground truth} """

With this query, the GPT-4 evaluation score I get is 10 higher than the one you reported. Could you share the exact query and the GPT-4 API model name you used in your experiments? Thank you.
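To make the comparison reproducible, here is a minimal sketch of how I fill in the query above and read the two scores from the first line of the reply. The names `QUERY_TEMPLATE`, `build_query`, and `parse_scores` are only illustrative and are not taken from the repository. The `{ground truth}` field is renamed to `ground_truth` so it works as a Python format field. The API call itself is left out, since the model name is exactly what I am asking about.

```python
from typing import Tuple

# Query text copied from this issue; only the placeholder name
# "ground_truth" differs from the original "{ground truth}".
QUERY_TEMPLATE = (
    "We would like to request your feedback on the performance of two AI "
    "assistants in response to the user instruction and input displayed above. "
    "Please rate the helpfulness, relevance, accuracy, and level of detail of "
    "their responses. Each assistant receives an overall score on a scale of "
    "1 to 10, where a higher score indicates better overall performance. "
    "Please first output a single line containing only two values indicating "
    "the scores for Assistant 1 and 2, respectively. The two scores are "
    "separated by a space. In the subsequent line, please provide a "
    "comprehensive explanation of your evaluation, avoiding any potential bias "
    "and ensuring that the order in which the responses were presented does "
    "not affect your judgment.\n\n"
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "Instruction:\n\n"
    "Determine the sentiment of the input sentence. "
    "Please respond as positive or negative.\n\n"
    "Input:\n\n{sentence}\n\n"
    "Assistant 1:\n\n{output}\n\n"
    "Assistant 2:\n\n{ground_truth}"
)


def build_query(sentence: str, output: str, ground_truth: str) -> str:
    """Fill the template with one example (hypothetical helper)."""
    return QUERY_TEMPLATE.format(
        sentence=sentence, output=output, ground_truth=ground_truth
    )


def parse_scores(reply: str) -> Tuple[float, float]:
    """Read the two space-separated scores from the first line of a reply."""
    lines = reply.strip().splitlines()
    if not lines:
        raise ValueError("empty reply")
    parts = lines[0].split()
    if len(parts) != 2:
        raise ValueError(f"expected two scores on the first line, got: {lines[0]!r}")
    return float(parts[0]), float(parts[1])
```

For example, `parse_scores("8 9\nAssistant 2 is more concise.")` returns the scores for Assistant 1 and Assistant 2 in that order. If your script aggregates the two scores differently, that alone could explain the gap.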