mlcommons / modelgauge

Make it easy to automatically and uniformly measure the behavior of many AI Systems.
https://mlcommons.org/ai-safety/
Apache License 2.0

Empty comments should not crash Perspective annotator #167

Closed (wpietri closed 6 months ago)

brianwgoldman commented 7 months ago

Can you provide some details on how you got into this situation? It seems wrong for a SUT to return no text for a prompt.

wpietri commented 7 months ago

I get this pretty frequently when running the coffee benchmark. I agree a SUT shouldn't return an empty string, but an annotator also shouldn't crash when that happens. Here's a fresh stack trace:

Examining system "Meta Llama 2, 7b parameters"
  Starting run for benchmark "General Chat Bot"
    Examining harm "Toxicity"
[...]
  File "/home/william/projects/mlcommons/coffee/src/coffee/run.py", line 77, in benchmark
    results[test] = run_prompt_response_test(
  File "/home/william/.cache/pypoetry/virtualenvs/coffee-sVVVZhgw-py3.10/lib/python3.10/site-packages/newhelm/runners/simple_benchmark_runner.py", line 116, in run_prompt_response_test
    annotator.annotate_test_item(interactions_for_item.interactions)
  File "/home/william/.cache/pypoetry/virtualenvs/coffee-sVVVZhgw-py3.10/lib/python3.10/site-packages/newhelm/annotators/perspective_api.py", line 88, in annotate_test_item
    responses = _batch_execute_requests(self.client, requests)
  File "/home/william/.cache/pypoetry/virtualenvs/coffee-sVVVZhgw-py3.10/lib/python3.10/site-packages/newhelm/annotators/perspective_api.py", line 151, in _batch_execute_requests
    batch_request.execute()
  File "/home/william/.cache/pypoetry/virtualenvs/coffee-sVVVZhgw-py3.10/lib/python3.10/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/william/.cache/pypoetry/virtualenvs/coffee-sVVVZhgw-py3.10/lib/python3.10/site-packages/googleapiclient/http.py", line 1604, in execute
    callback(request_id, response, exception)
  File "/home/william/.cache/pypoetry/virtualenvs/coffee-sVVVZhgw-py3.10/lib/python3.10/site-packages/newhelm/annotators/perspective_api.py", line 141, in _callback
    raise error
  File "/home/william/.cache/pypoetry/virtualenvs/coffee-sVVVZhgw-py3.10/lib/python3.10/site-packages/googleapiclient/http.py", line 1598, in execute
    raise HttpError(resp, content, uri=request.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=SECRET&alt=json returned "Comment must be non-empty.". Details: "[{'@type': 'type.googleapis.com/google.commentanalyzer.v1alpha1.Error', 'errorType': 'COMMENT_EMPTY'}]">

brianwgoldman commented 7 months ago

It looks like HELM handles this by dropping the empty completions before sending to PerspectiveAPI.

I think we should make sure the completions list in the Annotation stays aligned with the completions in the SUTResponse, so simply dropping the empty ones isn't ideal for us. The best option is probably to populate the scores for an empty completion with zeros for the requested attributes.
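A minimal sketch of that idea, not the actual change made in the repository: empty completions are never sent to the API, and their slots are filled with zero scores so the result list stays the same length and order as the input. The names `score_completions`, `score_fn`, and `fake_score_fn` are hypothetical; `score_fn` stands in for whatever code really calls the Perspective API.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for the real Perspective call: it receives only
# non-empty texts and returns one {attribute: score} dict per text.
ScoreFn = Callable[[List[str]], List[Dict[str, float]]]


def score_completions(
    completions: Sequence[str],
    attributes: Sequence[str],
    score_fn: ScoreFn,
) -> List[Dict[str, float]]:
    """Score every completion, keeping the output aligned with the input."""
    # Remember the positions of the completions that have text to score.
    non_empty = [i for i, text in enumerate(completions) if text]

    # Send only those texts, since the API rejects empty comments.
    scored = score_fn([completions[i] for i in non_empty]) if non_empty else []
    if len(scored) != len(non_empty):
        raise ValueError("score_fn must return one result per input text")

    # Default every slot to zeros, then overwrite the slots that were scored.
    results = [{attr: 0.0 for attr in attributes} for _ in completions]
    for index, scores in zip(non_empty, scored):
        results[index] = scores
    return results


def fake_score_fn(texts: List[str]) -> List[Dict[str, float]]:
    """Fake scorer for illustration only; always reports 0.5 toxicity."""
    return [{"TOXICITY": 0.5} for _ in texts]


print(score_completions(["hello", "", "bye"], ["TOXICITY"], fake_score_fn))
# [{'TOXICITY': 0.5}, {'TOXICITY': 0.0}, {'TOXICITY': 0.5}]
```

Whether zero is the right placeholder is a design choice: it keeps indexes aligned, but a downstream reader cannot tell "scored as harmless" from "nothing to score" unless that is recorded separately.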

brianwgoldman commented 6 months ago

Fixed in #184.