Closed wpietri closed 6 months ago
I get this pretty frequently when running coffee benchmark. I agree a SUT shouldn't return an empty string, but I think it's also true that an annotator shouldn't crash when that happens. Here's a fresh stack trace:
Examining system "Meta Llama 2, 7b parameters"
Starting run for benchmark "General Chat Bot"
Examining harm "Toxicity"
[...]
File "/home/william/projects/mlcommons/coffee/src/coffee/run.py", line 77, in benchmark
results[test] = run_prompt_response_test(
File "/home/william/.cache/pypoetry/virtualenvs/coffee-sVVVZhgw-py3.10/lib/python3.10/site-packages/newhelm/runners/simple_benchmark_runner.py", line 116, in run_prompt_response_test
annotator.annotate_test_item(interactions_for_item.interactions)
File "/home/william/.cache/pypoetry/virtualenvs/coffee-sVVVZhgw-py3.10/lib/python3.10/site-packages/newhelm/annotators/perspective_api.py", line 88, in annotate_test_item
responses = _batch_execute_requests(self.client, requests)
File "/home/william/.cache/pypoetry/virtualenvs/coffee-sVVVZhgw-py3.10/lib/python3.10/site-packages/newhelm/annotators/perspective_api.py", line 151, in _batch_execute_requests
batch_request.execute()
File "/home/william/.cache/pypoetry/virtualenvs/coffee-sVVVZhgw-py3.10/lib/python3.10/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
return wrapped(*args, **kwargs)
File "/home/william/.cache/pypoetry/virtualenvs/coffee-sVVVZhgw-py3.10/lib/python3.10/site-packages/googleapiclient/http.py", line 1604, in execute
callback(request_id, response, exception)
File "/home/william/.cache/pypoetry/virtualenvs/coffee-sVVVZhgw-py3.10/lib/python3.10/site-packages/newhelm/annotators/perspective_api.py", line 141, in _callback
raise error
File "/home/william/.cache/pypoetry/virtualenvs/coffee-sVVVZhgw-py3.10/lib/python3.10/site-packages/googleapiclient/http.py", line 1598, in execute
raise HttpError(resp, content, uri=request.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=SECRET&alt=json returned "Comment must be non-empty.". Details: "[{'@type': 'type.googleapis.com/google.commentanalyzer.v1alpha1.Error', 'errorType': 'COMMENT_EMPTY'}]">
It looks like HELM handles this by dropping the empty completions before sending to PerspectiveAPI.
I think we should make sure the completions list in the Annotation stays aligned with completions in the SUTResponse. I think the best option would be to populate scores with zeros for the requested attributes.
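To make the proposal concrete, here is a rough sketch of what I have in mind. It is not the actual newhelm code: `score_completions` and the `score_batch` callable are hypothetical names standing in for whatever wraps the Perspective API call, and I'm assuming only the exact empty string needs skipping (whitespace-only text is not handled here).

```python
from typing import Callable, Dict, List, Sequence

Scores = Dict[str, float]


def score_completions(
    completions: Sequence[str],
    attributes: Sequence[str],
    score_batch: Callable[[List[str]], List[Scores]],
) -> List[Scores]:
    """Return one score dict per completion, in the original order.

    `score_batch` is a hypothetical stand-in for the real API call; it
    takes non-empty texts and returns one score dict per text.
    """
    # Remember which positions actually have text to send.
    non_empty = [i for i, text in enumerate(completions) if text]

    # Start with zeros everywhere so empty completions keep their slot.
    results: List[Scores] = [
        {attr: 0.0 for attr in attributes} for _ in completions
    ]

    if non_empty:
        scored = score_batch([completions[i] for i in non_empty])
        if len(scored) != len(non_empty):
            raise ValueError("scorer returned a different number of results")
        # Write the real scores back into their original positions.
        for i, scores in zip(non_empty, scored):
            results[i] = scores

    return results
```

The empty completions never reach the API, so the 400 can't happen, and the returned list has the same length and order as the input, so it still lines up with the SUTResponse.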
Fixed in #184.
Can you provide some details on how you got into this situation? It seems wrong that a SUT should return zero text for a prompt.