huan9huan / prompts

to be a professional prompt engineer

evaluation: evaluating an output along multiple dimensions #42

Open huan9huan opened 1 year ago

huan9huan commented 1 year ago

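In Andrew Ng's course, the method for evaluating outputs that have no single standard answer is to score them along multiple dimensions.

https://learn.deeplearning.ai/chatgpt-building-system/lesson/10/evaluation-part-ii
https://www.youtube.com/watch?v=fJ1PemlSVtY&list=PLiuLMb-dLdWKjX8ib9PhlCIx1jKMNxMpy&index=10

The lesson suggests having the LLM itself evaluate your result along multiple dimensions: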

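def eval_with_rubric(test_set, assistant_answer):
    """Ask the LLM to grade assistant_answer against the context using a fixed rubric."""

    # test_set supplies the customer's message and the context the agent was given.
    cust_msg = test_set['customer_msg']
    context = test_set['context']
    completion = assistant_answer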

    system_message = """\
    You are an assistant that evaluates how well the customer service agent \
    answers a user question by looking at the context that the customer service \
    agent is using to generate its response. 
    """

    user_message = f"""\
You are evaluating a submitted answer to a question based on the context \
that the agent uses to answer the question.
Here is the data:
    [BEGIN DATA]
    ************
    [Question]: {cust_msg}
    ************
    [Context]: {context}
    ************
    [Submission]: {completion}
    ************
    [END DATA]

Compare the factual content of the submitted answer with the context. \
Ignore any differences in style, grammar, or punctuation.
Answer the following questions:
    - Is the Assistant response based only on the context provided? (Y or N)
    - Does the answer include information that is not provided in the context? (Y or N)
    - Is there any disagreement between the response and the context? (Y or N)
    - Count how many questions the user asked. (output a number)
    - For each question that the user asked, is there a corresponding answer to it?
      Question 1: (Y or N)
      Question 2: (Y or N)
      ...
      Question N: (Y or N)
    - Of the number of questions asked, how many of these questions were addressed by the answer? (output a number)
"""

    messages = [
        {'role': 'system', 'content': system_message},
        {'role': 'user', 'content': user_message}
    ]

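    # get_completion_from_messages is not defined here; it is assumed to be a
    # chat-completion helper (such as the one used in the course notebook).
    response = get_completion_from_messages(messages)
    return response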
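
The rubric above returns free text, so to compare runs you may want the answers in a structured form. The following is only a minimal sketch: `parse_rubric_reply` and `sample_reply` are hypothetical names that do not come from the course, the sample reply is made up for illustration, and the parser assumes the model puts each answer (Y, N, or a number) as the last token on its own line. Real replies may not follow that layout, so treat this as a starting point.

```python
def parse_rubric_reply(reply: str) -> dict:
    """Collect Y/N answers and numeric answers from a rubric-style reply.

    Assumption: each answer is the last whitespace-separated token on a line.
    Lines whose last token is neither Y/N nor a number are ignored.
    """
    yes_no = []   # True for Y, False for N, in order of appearance
    numbers = []  # numeric answers, in order of appearance
    for line in reply.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # Drop surrounding punctuation such as "(Y)" or "2."
        token = tokens[-1].strip("()[].,:;?")
        if token.upper() in ("Y", "N"):
            yes_no.append(token.upper() == "Y")
        elif token.isdigit():
            numbers.append(int(token))
    return {"yes_no": yes_no, "numbers": numbers}


# Made-up reply, shaped like the rubric in eval_with_rubric (illustration only).
sample_reply = """\
- Is the Assistant response based only on the context provided? Y
- Does the answer include information that is not provided in the context? N
- Is there any disagreement between the response and the context? N
- Count how many questions the user asked. 2
- For each question that the user asked, is there a corresponding answer to it?
  Question 1: Y
  Question 2: N
- Of the number of questions asked, how many of these questions were addressed by the answer? 1
"""

parsed = parse_rubric_reply(sample_reply)
print(parsed)
```

With the sample above, the first three booleans correspond to the three factual-consistency questions, the remaining ones to the per-question coverage, and the two numbers to the questions asked and the questions addressed. A stricter alternative would be to ask the model for JSON in the prompt itself, at the cost of changing the rubric text.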