promptfoo / promptfoo

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
https://promptfoo.dev

Silent failure: rubricPrompt not interpolating variables, or `value` being ignored #1866

Open · strentom opened this issue 3 days ago

strentom commented 3 days ago

Describe the bug Either the rubricPrompt is ignored or its variables are not interpolated; I can't tell which, because the logs (even with --verbose) are insufficient.

To reduce the likelihood that the error is mine, I searched existing issues and copied the rubricPrompt from a resolved issue (#823). Running eval on the same YAML (the only difference is the provider), it silently fails (does not produce the expected result), and from the LLM output I hypothesize that the model never received the full prompt. This can be reproduced even with simpler prompts (see the minimal sketch below), but I wanted to be sure the YAML is 100% correct.
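
A minimal sketch of such a simpler repro (hypothetical prompt and rubric text, chosen only for illustration; same structure as the full config below):

prompts: "Summarize in one sentence: {{ input }}"
providers: vertex:gemini-1.5-flash
defaultTest:
  options:
    rubricPrompt:
      - role: user
        content: >-
          Rate this one-sentence summary of "{{ input }}" on a scale of 0 to 10: {{ output }}
          Output your response in the following JSON format: {pass: bool, score: number, reason: string}
tests:
  - vars:
      input: "Happy families are all alike. And every unhappy family is unhappy in its own way."
    assert:
      - type: llm-rubric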

To Reproduce Take this YAML from #823 (only the provider is modified):

prompts: >-
  [{
    "role": "system",
    "content": "Translate text into {{ language }}."
  }, {
    "role": "user",
    "content": "{{ input }}"
  }]
providers: vertex:gemini-1.5-flash
defaultTest:
  options:
    rubricPrompt:
      - role: system
        content: >-
          Evaluate the quality of the translation provided by an AI assistant to the user input displayed below.
          Score the response on a scale of 0 to 10.
          Output your response in the following JSON format: {pass: bool, score: number, reason: string}
      - role: user
        content: >-
          [Input Start]
          {{ input }}
          [Input End]

          [Translation Start]
          {{ output }}
          [Translation End]

tests:
  - vars:
      input: "Happy families are all alike. \nAnd every unhappy family is unhappy in its own way."
      language: "Chinese"
    assert:
      - type: llm-rubric

and run `promptfoo eval` on it.
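
(For context: an `llm-rubric` assertion usually carries its grading criteria inline in a `value` field; the repro above omits it and relies on the shared rubricPrompt alone. A hypothetical variant with an inline rubric would look like:)

tests:
  - vars:
      input: "Happy families are all alike. \nAnd every unhappy family is unhappy in its own way."
      language: "Chinese"
    assert:
      - type: llm-rubric
        value: "Is an accurate and fluent Chinese translation of the input"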

Expected behavior The rubricPrompt should rate the translation quality. Instead, the grader responds: {"pass":false,"reason":"No rubric was provided","score":0,"tokensUsed":{"total":201,"prompt":179,"completion":22,"cached":0}}

Screenshots

Reading prompts from ["[{\n  \"role\": \"system\",\n  \"content\": \"Translate text into {{ language }}.\"\n}, {\n  \"role\": \"user\",\n  \"content\": \"{{ input }}\"\n}]"]
Inserting prompt 3675bb3253267004907c89fcb4eaa51cbc0f6e2d963b914b11fa7e1497969feb
Inserting dataset 14998238815e9383d52ac592abda36278c157fa1fe9ce420837809599f62d532
Coerced JSON prompt to Gemini format: [{"role":"system","content":"Translate text into Chinese."},{"role":"user","content":"Happy families are all alike. \nAnd every unhappy family is unhappy in its own way."}]
Preparing to call Google Vertex API (Gemini) with body: {"contents":{"role":"user","parts":{"text":"Translate text into Chinese.Happy families are all alike. \nAnd every unhappy family is unhappy in its own way."}},"generationConfig":{}}
Gemini API response: [{"candidates":[{"content":{"role":"model","parts":[{"text":"幸福"}]}}],"modelVersion":"gemini-1.5-flash"},{"candidates":[{"content":{"role":"model","parts":[{"text":"的家庭都一样。\n不幸的家庭各有各的不幸。 \n"}]},"safetyRatings":[{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE","probabilityScore":0.17480469,"severity":"HARM_SEVERITY_NEGLIGIBLE","severityScore":0.083984375},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE","probabilityScore":0.040283203,"severity":"HARM_SEVERITY_NEGLIGIBLE","severityScore":0.02368164},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE","probabilityScore":0.25585938,"severity":"HARM_SEVERITY_NEGLIGIBLE","severityScore":0.111328125},{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE","probabilityScore":0.087402344,"severity":"HARM_SEVERITY_NEGLIGIBLE","severityScore":0.04272461}]}],"modelVersion":"gemini-1.5-flash"},{"candidates":[{"content":{"role":"model","parts":[{"text":""}]},"avgLogprobs":"NaN"}],"modelVersion":"gemini-1.5-flash"},{"candidates":[{"content":{"role":"model","parts":[{"text":""}]},"finishReason":"STOP"}],"usageMetadata":{"promptTokenCount":24,"candidatesTokenCount":17,"totalTokenCount":41},"modelVersion":"gemini-1.5-flash"}]
Performing remote grading: {"task":"llm-rubric","rubric":"","output":"幸福的家庭都一样。\n不幸的家庭各有各的不幸。 \n","vars":{"input":"Happy families are all alike. \nAnd every unhappy family is unhappy in its own way.","language":"Chinese"}}
Got remote grading result: {"pass":false,"reason":"No rubric was provided","score":0,"tokensUsed":{"total":201,"prompt":179,"completion":22,"cached":0}}
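
Note the "rubric":"" in the remote grading request above: neither the rubricPrompt nor a `value` reached the grader. Presumably a correct request would carry the rendered rubric, roughly like this (illustrative only, with placeholder text; not actual promptfoo output):

{"task":"llm-rubric","rubric":"<rendered rubricPrompt with {{ input }} and {{ output }} substituted>","output":"幸福的家庭都一样。\n不幸的家庭各有各的不幸。 \n","vars":{"input":"Happy families are all alike. \nAnd every unhappy family is unhappy in its own way.","language":"Chinese"}}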

System information:

typpo commented 8 hours ago

Hi @strentom, definitely a bug. Just clarifying: are you running into this error with a redteam config? Based on the logs, you are hitting a redteam codepath, which is different from the example provided and from the behavior when I run it locally.

strentom commented 8 hours ago

Hi @typpo. I'm running `promptfoo eval -c script_above.yaml --verbose`. I have other files and configs in the folder (incl. redteam-related ones), but these should be ignored.

typpo commented 7 hours ago

Got it, this is helpful, thanks. #1877 should fix the immediate issue. There is a separate issue of your redteam config being picked up when it shouldn't be.