tatsu-lab / alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
https://tatsu-lab.github.io/alpaca_eval/
Apache License 2.0

text-davinci-003 is closed????? #166

Closed: kkwhale7 closed this issue 9 months ago

kkwhale7 commented 10 months ago

I use the command:

alpaca_eval --model_outputs '/home/zhoudong/repos/alpaca_eval/alpaca_data/Infini-Megrez-7b-20231114-v2.json' --annotators_config 'text_davinci_003' --reference_outputs '/home/zhoudong/repos/alpaca_eval/alpaca_data/Baichuan2-7B-Chat.json'

However, it raises the error shown below (screenshot not reproduced; the full log is in the next comment). @winglian @zfang @jondurbin @44670 @rtaori
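
For reference, the files passed to --model_outputs and --reference_outputs are JSON lists with one record per instruction. A minimal sketch of the expected shape (field names as I understand them from the alpaca_eval README; the values below are placeholders):

import json

# Hypothetical example entries; real files contain one record per evaluation instruction.
outputs = [
    {
        "instruction": "What is the capital of France?",  # prompt given to the model
        "output": "The capital of France is Paris.",      # the model's generation
        "generator": "Infini-Megrez-7b-20231114-v2",      # model name reported in the results
    },
]

with open("Infini-Megrez-7b-20231114-v2.json", "w") as f:
    json.dump(outputs, f, ensure_ascii=False, indent=2)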

kkwhale7 commented 10 months ago

INFO:root:Evaluating the Infini-Megrez-7b-20231114-v2 outputs.
INFO:root:Creating the annotator from text_davinci_003.
INFO:root:Saving annotations to /mnt/share/users/zhoudong/repos/alpaca_eval/src/alpaca_eval/evaluators_configs/text_davinci_003/annotations_seed0_configs.json.
INFO:root:Using openai_completions on 128 prompts using text-davinci-003.
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/completions "HTTP/1.1 200 OK" | 0/128 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/zhoudong/miniconda3/envs/alpaca-eval/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
             ^^^^^^^^^^^^^^^^^^^
  File "/mnt/share/users/zhoudong/repos/alpaca_eval/src/alpaca_eval/decoders/openai.py", line 234, in _openai_completion_helper
    choice["total_tokens"] = completion_batch.usage.total_tokens / len(prompt_batch)
TypeError: 'CompletionChoice' object does not support item assignment
"""
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/zhoudong/miniconda3/envs/alpaca-eval/bin/alpaca_eval", line 33, in <module>
    sys.exit(load_entry_point('alpaca-eval', 'console_scripts', 'alpaca_eval')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/share/users/zhoudong/repos/alpaca_eval/src/alpaca_eval/main.py", line 546, in main
    fire.Fire(evaluate)
  File "/home/zhoudong/miniconda3/envs/alpaca-eval/lib/python3.11/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhoudong/miniconda3/envs/alpaca-eval/lib/python3.11/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/home/zhoudong/miniconda3/envs/alpaca-eval/lib/python3.11/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/share/users/zhoudong/repos/alpaca_eval/src/alpaca_eval/main.py", line 133, in evaluate
    annotations = annotator.annotate_head2head(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/share/users/zhoudong/repos/alpaca_eval/src/alpaca_eval/annotators/pairwise_evaluator.py", line 237, in annotate_head2head
    out = self.__call__(df_to_annotate, **decoding_kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/share/users/zhoudong/repos/alpaca_eval/src/alpaca_eval/annotators/base.py", line 171, in __call__
    df_annotated = self._annotate(curr_df_to_annotate, **decoding_kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/share/users/zhoudong/repos/alpaca_eval/src/alpaca_eval/annotators/base.py", line 258, in _annotate 
    curr_annotated = self.annotators[annotator](
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/share/users/zhoudong/repos/alpaca_eval/src/alpaca_eval/annotators/base.py", line 569, in __call__
    completions = self.fn_completions(prompts=prompts, **self.completions_kwargs, **decoding_kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/share/users/zhoudong/repos/alpaca_eval/src/alpaca_eval/decoders/openai.py", line 149, in openai_completions
    completions = list(
                  ^^^^^
  File "/home/zhoudong/miniconda3/envs/alpaca-eval/lib/python3.11/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/home/zhoudong/miniconda3/envs/alpaca-eval/lib/python3.11/multiprocessing/pool.py", line 873, in next
    raise value
TypeError: 'CompletionChoice' object does not support item assignment

YannDubs commented 10 months ago

What version of alpaca_eval are you using? Can you pip install -U alpaca_eval?

If you still get the same error, can you send me a minimal reproducible example, e.g. a single output for which you get that error? Thanks!

YannDubs commented 9 months ago

Closing due to lack of response. Feel free to reopen if the issue persists.

jwkirchenbauer commented 9 months ago

Hi there! FWIW, I'm hitting the same issue as above. I believe the openai package has changed in a way that breaks the annotation/evaluation step, because of how alpaca_eval handles the objects returned by the OpenAI completions client.
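
To illustrate, a minimal sketch of what I believe changed (assuming openai>=1.0, where completion choices are typed Pydantic objects rather than dicts; the model name here is just an example):

# openai>=1.0: client.completions.create(...) returns typed Pydantic objects,
# so dict-style mutation of a choice raises the TypeError seen above.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=["Say hello."],
    max_tokens=5,
)
choice = completion.choices[0]  # a CompletionChoice (Pydantic model), not a dict
# choice["total_tokens"] = 5    # would raise: TypeError: 'CompletionChoice' object does not support item assignment

# Converting to a plain dict first allows extra bookkeeping fields to be attached:
choice_dict = choice.model_dump()
choice_dict["total_tokens"] = completion.usage.total_tokens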

Here is a (slightly non-minimal) repro example, since I'm just dumping what I saw when I found this issue, but judging by the error, any config should trigger it depending on which annotator is used.

Env:

configs.yaml:

mistral-7b-openorca:
  prompt_template: "mistral-7b-openorca/prompt.txt"
  fn_completions: "huggingface_local_completions"
  completions_kwargs:
    model_name: "Open-Orca/Mistral-7B-OpenOrca"
    model_kwargs:
      torch_dtype: 'bfloat16'
    max_new_tokens: 2048
    temperature: 0.7
    top_p: 1.0
    do_sample: True
  pretty_name: "Open-Orca/Mistral-7B-OpenOrca"
  link: "https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca"

prompt.txt:

<|im_start|>system
You are MistralOrca, a large language model trained by Alignment Lab AI. Write out your reasoning step-by-step to be sure you get the right answers!
<|im_end|>
<|im_start|>user
{instruction}<|im_end|>
<|im_start|>assistant

Broken command:

alpaca_eval evaluate_from_model \
--model_configs 'mistral-7b-openorca' \
--annotators_config 'gpt35_turbo_instruct' \
--output_path=$OUTPUT_DIR \
--max_instances=10

Output:

INFO:root:cannot use `chunksize` with max_instances. Setting `chunksize` to None.
Chunking for generation:   0%|                                                                                          | 0/1 [00:00<?, ?it/s]INFO:root:Using `huggingface_local_completions` on 10 prompts using Open-Orca/Mistral-7B-OpenOrca.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO:accelerate.utils.modeling:We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00,  5.20s/it]
INFO:root:Model memory: 15.020376064 GB█████████████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00,  4.87s/it]
INFO:root:Kwargs to completion: {'do_sample': True, 'model_kwargs': {'torch_dtype': torch.bfloat16, 'device_map': 'auto'}, 'batch_size': 1, 'max_new_tokens': 2048, 'temperature': 0.7, 'top_p': 1.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [03:05<00:00, 18.53s/it]
INFO:root:Time for 10 completions: 185.3 seconds██████████████████████████████████████████████████████████████| 10/10 [03:05<00:00, 18.30s/it]
Chunking for generation: 100%|█████████████████████████████████████████████████████████████████████████████████| 1/1 [03:21<00:00, 201.81s/it]
WARNING:root:precomputed_leaderboard = 'auto'. But we have found no corresponding leaderboard
INFO:root:Evaluating the mistral-7b-openorca outputs.
INFO:root:Creating the annotator from `gpt35_turbo_instruct`.
INFO:root:Saving annotations to `~/alpaca_eval/src/alpaca_eval/evaluators_configs/gpt35_turbo_instruct/annotations_seed0_configs.json`.
Annotation chunk:   0%|                                                                                                 | 0/1 [00:00<?, ?it/s]INFO:root:Annotating 10 examples with gpt35_turbo_instruct
INFO:root:Using `openai_completions` on 10 prompts using gpt-3.5-turbo-instruct.
INFO:root:Kwargs to completion: {'n': 1, 'model': 'gpt-3.5-turbo-instruct', 'is_chat': False, 'temperature': 0}. num_procs=5
                                                                                                                                             INFO:httpx:HTTP Request: POST https://api.openai.com/v1/completions "HTTP/1.1 200 OK"                                   | 0/10 [00:00<?, ?it/s]
prompt_batches:   0%|                                                                                                  | 0/10 [00:01<?, ?it/s]
Annotation chunk:   0%|                                                                                                 | 0/1 [00:01<?, ?it/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/conda/envs/alpaca-env/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "~/alpaca_eval/src/alpaca_eval/decoders/openai.py", line 234, in _openai_completion_helper
    choice["total_tokens"] = completion_batch.usage.total_tokens / len(prompt_batch)
TypeError: 'CompletionChoice' object does not support item assignment
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/alpaca-env/bin/alpaca_eval", line 33, in <module>
    sys.exit(load_entry_point('alpaca-eval', 'console_scripts', 'alpaca_eval')())
  File "~/alpaca_eval/src/alpaca_eval/main.py", line 543, in main
    fire.Fire(ALL_FUNCTIONS)
  File "/opt/conda/envs/alpaca-env/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/envs/alpaca-env/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/envs/alpaca-env/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "~/alpaca_eval/src/alpaca_eval/main.py", line 332, in evaluate_from_model
    return evaluate(
  File "~/alpaca_eval/src/alpaca_eval/main.py", line 133, in evaluate
    annotations = annotator.annotate_head2head(
  File "~/alpaca_eval/src/alpaca_eval/annotators/pairwise_evaluator.py", line 237, in annotate_head2head
    out = self.__call__(df_to_annotate, **decoding_kwargs)
  File "~/alpaca_eval/src/alpaca_eval/annotators/base.py", line 171, in __call__
    df_annotated = self._annotate(curr_df_to_annotate, **decoding_kwargs)
  File "~/alpaca_eval/src/alpaca_eval/annotators/base.py", line 258, in _annotate
    curr_annotated = self.annotators[annotator](
  File "~/alpaca_eval/src/alpaca_eval/annotators/base.py", line 569, in __call__
    completions = self.fn_completions(prompts=prompts, **self.completions_kwargs, **decoding_kwargs)
  File "~/alpaca_eval/src/alpaca_eval/decoders/openai.py", line 149, in openai_completions
    completions = list(
  File "/opt/conda/envs/alpaca-env/lib/python3.10/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/opt/conda/envs/alpaca-env/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
TypeError: 'CompletionChoice' object does not support item assignment

Working command (probably because the evaluator is now implicitly alpaca_eval_gpt4):

alpaca_eval evaluate_from_model \
--model_configs 'mistral-7b-openorca' \
--output_path=$OUTPUT_DIR \
--max_instances=10

Output (working command):

 ...
 Annotation chunk:   0%|                                                                                                 | 0/1 [00:00<?, ?it/s]INFO:root:Annotating 10 examples with alpaca_eval_gpt4
INFO:root:Using `openai_completions` on 10 prompts using gpt-4.
INFO:root:Kwargs to completion: {'n': 1, 'model': 'gpt-4', 'is_chat': True, 'temperature': 0}. num_procs=5
                                                                                                                                             INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"                              | 0/10 [00:00<?, ?it/s]
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
                                                                                                                                             INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"                      | 1/10 [00:02<00:25,  2.79s/it]
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
                                                                                                                                             INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"                      | 2/10 [00:02<00:10,  1.25s/it]
                                                                                                                                             INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"                      | 3/10 [00:03<00:05,  1.32it/s]
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
                                                                                                                                             INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"                      | 6/10 [00:06<00:03,  1.07it/s]
prompt_batches: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:06<00:00,  1.55it/s]
INFO:root:Completed 10 examples in 6.7 seconds.████████████████████████████████████▉                           | 7/10 [00:06<00:02,  1.31it/s]
INFO:root:Saving all annotations to ~/alpaca_eval/src/alpaca_eval/evaluators_configs/alpaca_eval_gpt4/annotations_seed0_configs.json.
Annotation chunk: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:06<00:00,  6.82s/it]
INFO:root:Saving all results to $OUTPUT_DIR/alpaca_eval/debug
 ...

Hope this helps find a patch more quickly!

YannDubs commented 9 months ago

Fixed, thanks @jwkirchenbauer! The bug only affected non-chat OpenAI models. pip install -U alpaca_eval and you should be good!
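
For later readers, a hedged sketch of the shape of such a fix (names mirror the traceback above, not necessarily the exact upstream patch): convert the Pydantic choices to plain dicts before attaching the per-prompt token count.

from typing import Any

def _choices_as_dicts(completion_batch: Any, prompt_batch: list[str]) -> list[dict]:
    # Sketch only: a hypothetical helper, not the exact upstream code.
    # Plain dicts support item assignment, unlike openai>=1.0 CompletionChoice objects.
    choices = [choice.model_dump() for choice in completion_batch.choices]
    for choice in choices:
        choice["total_tokens"] = completion_batch.usage.total_tokens / len(prompt_batch)
    return choices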