openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Support for GPT-4o #1529

Closed PrashantDixit0 closed 1 month ago

PrashantDixit0 commented 1 month ago

Describe the feature or improvement you're requesting

Please add support for GPT-4o for evaluation.

Additional context

No response

j30231 commented 1 month ago

I ran into the same error. I would like to evaluate only the computer security subset out of the 57 MMLU subjects, but the gpt-4o model is not supported yet.

!oaieval gpt-4o match_mmlu_computer_security
[2024-05-16 13:13:17,116] [registry.py:271] Loading registry from /Users/jaesik/ai/evals/evals/registry/evals
[2024-05-16 13:13:17,379] [registry.py:271] Loading registry from /Users/jaesik/.evals/evals
[2024-05-16 13:13:17,750] [oaieval.py:215] Run started: 240516041317M4NX4QQM
[2024-05-16 13:13:17,989] [data.py:94] Fetching /Users/jaesik/ai/evals/examples/../evals/registry/data/mmlu/computer_security/few_shot.jsonl
[2024-05-16 13:13:17,990] [data.py:94] Fetching /Users/jaesik/ai/evals/examples/../evals/registry/data/mmlu/computer_security/samples.jsonl
[2024-05-16 13:13:17,990] [eval.py:36] Evaluating 100 samples
[2024-05-16 13:13:18,002] [eval.py:144] Running in threaded mode with 10 threads!
  0%|                                                   | 0/100 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/Users/jaesik/miniconda3/bin/oaieval", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/jaesik/ai/evals/evals/cli/oaieval.py", line 304, in main
    run(args)
  File "/Users/jaesik/ai/evals/evals/cli/oaieval.py", line 226, in run
    result = eval.run(recorder)
             ^^^^^^^^^^^^^^^^^^
  File "/Users/jaesik/ai/evals/evals/elsuite/basic/match.py", line 60, in run
    self.eval_all_samples(recorder, samples)
  File "/Users/jaesik/ai/evals/evals/eval.py", line 146, in eval_all_samples
    idx_and_result = list(tqdm(iter, total=len(work_items), disable=not show_progress))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaesik/miniconda3/lib/python3.11/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/Users/jaesik/miniconda3/lib/python3.11/multiprocessing/pool.py", line 873, in next
    raise value
  File "/Users/jaesik/miniconda3/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/Users/jaesik/ai/evals/evals/eval.py", line 137, in eval_sample
    return idx, self.eval_sample(sample, rng)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaesik/ai/evals/evals/elsuite/basic/match.py", line 46, in eval_sample
    result = self.completion_fn(
             ^^^^^^^^^^^^^^^^^^^
  File "/Users/jaesik/ai/evals/evals/completion_fns/openai.py", line 118, in __call__
    result = openai_completion_create_retrying(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaesik/ai/evals/evals/completion_fns/openai.py", line 32, in openai_completion_create_retrying
    result = create_retrying(
             ^^^^^^^^^^^^^^^^
  File "/Users/jaesik/miniconda3/lib/python3.11/site-packages/backoff/_sync.py", line 48, in retry
    ret = target(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaesik/ai/evals/evals/utils/api_utils.py", line 20, in create_retrying
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaesik/miniconda3/lib/python3.11/site-packages/openai/_utils/_utils.py", line 277, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaesik/miniconda3/lib/python3.11/site-packages/openai/resources/completions.py", line 528, in create
    return self._post(
           ^^^^^^^^^^^
  File "/Users/jaesik/miniconda3/lib/python3.11/site-packages/openai/_base_client.py", line 1240, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaesik/miniconda3/lib/python3.11/site-packages/openai/_base_client.py", line 921, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "/Users/jaesik/miniconda3/lib/python3.11/site-packages/openai/_base_client.py", line 1020, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'error': {'message': 'This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?', 'type': 'invalid_request_error', 'param': 'model', 'code': None}}
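
For reference, the 404 pins down the cause: gpt-4o is a chat-only model, so it is rejected by the legacy v1/completions endpoint that the completion function is calling. A minimal standalone sketch (assuming the openai>=1.x Python client and an OPENAI_API_KEY in the environment) showing the distinction:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# gpt-4o is only served by /v1/chat/completions, so it must be called as a chat model
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What does a SQL injection attack exploit?"}],
)
print(resp.choices[0].message.content)

# The legacy completions endpoint raises the NotFoundError shown in the traceback above:
# client.completions.create(model="gpt-4o", prompt="...")  # 404: use v1/chat/completions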
androettop commented 1 month ago

I have created a pull request to add support: #1530. The change is super simple, so you can apply it manually instead of waiting for it to be merged; a rough sketch of that kind of edit is below.
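
The exact diff is in #1530; the general shape of this kind of fix is adding the new model name wherever the registry decides which models are chat models, so that oaieval routes gpt-4o to /v1/chat/completions. The following is only an illustrative sketch, not the actual evals source (the CHAT_MODEL_NAMES name and helper are hypothetical):

# evals/registry.py (illustrative sketch; see #1530 for the real change)
CHAT_MODEL_NAMES = {
    "gpt-3.5-turbo",
    "gpt-4",
    "gpt-4-turbo",
    "gpt-4o",  # newly added so gpt-4o is treated as a chat model
}

def is_chat_model(model_name: str) -> bool:
    # Treat exact matches and dated snapshots (e.g. "gpt-4o-2024-05-13") as chat models
    return any(
        model_name == name or model_name.startswith(name + "-")
        for name in CHAT_MODEL_NAMES
    )

With a change along those lines in place, the original command (oaieval gpt-4o match_mmlu_computer_security) should dispatch requests through the chat completions endpoint instead of v1/completions.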

PrashantDixit0 commented 1 month ago

Thank you @androettop for adding it :+1: