open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
3.69k stars 394 forks

[Bug] run with transformers==4.40.2, error "HuggingFacewithChatTemplate does not support ppl-based evaluation". #1157

Open zhulinJulia24 opened 3 months ago

zhulinJulia24 commented 3 months ago

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

opencompass: main code, transformers == 4.40.2. If I downgrade to transformers == 4.33.0, it works.

Reproduces the problem - code/configuration sample

python3 run.py --models hf_chatglm3_6b --datasets FewCLUE_chid_ppl humaneval_gen ARC_c_ppl obqa_ppl

Reproduces the problem - command or script

python3 run.py --models hf_chatglm3_6b --datasets FewCLUE_chid_ppl humaneval_gen ARC_c_ppl obqa_ppl

Reproduces the problem - error message

/root/miniconda3/envs/opencompass_regression_test/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
Loading checkpoint shards: 100%|██████████| 7/7 [00:02<00:00,  3.07it/s]
05/15 10:57:20 - OpenCompass - INFO - Start inferencing [chatglm3-6b-hf/chid-dev]
[2024-05-15 10:57:20,899] [opencompass.openicl.icl_inferencer.icl_ppl_inferencer] [INFO] Calculating PPL for prompts labeled '0'
  0%|          | 0/26 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/cpfs01/user/qa-llm-cicd/opencompass_new/opencompass/tasks/openicl_infer.py", line 162, in <module>
    inferencer.run()
  File "/cpfs01/user/qa-llm-cicd/opencompass_new/opencompass/tasks/openicl_infer.py", line 90, in run
    self._inference()
  File "/cpfs01/user/qa-llm-cicd/opencompass_new/opencompass/tasks/openicl_infer.py", line 135, in _inference
    inferencer.inference(retriever,
  File "/cpfs01/user/qa-llm-cicd/opencompass_new/opencompass/openicl/icl_inferencer/icl_ppl_inferencer.py", line 159, in inference
    sub_res = self.model.get_ppl_from_template(sub_prompt_list).tolist()
  File "/cpfs01/user/qa-llm-cicd/opencompass_new/opencompass/models/base.py", line 152, in get_ppl_from_template
    return self.get_ppl(inputs, mask_length)
  File "/cpfs01/user/qa-llm-cicd/opencompass_new/opencompass/models/base.py", line 84, in get_ppl
    raise NotImplementedError(f'{self.__class__.__name__} does not support'
NotImplementedError: HuggingFacewithChatTemplate does not support ppl-based evaluation yet, try gen-based instead.
E0515 10:57:22.766000 139752328677184 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 283609) of binary: /root/miniconda3/envs/opencompass_regression_test/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/opencompass_regression_test/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/opencompass_regression_test/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/root/miniconda3/envs/opencompass_regression_test/lib/python3.10/site-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/root/miniconda3/envs/opencompass_regression_test/lib/python3.10/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/root/miniconda3/envs/opencompass_regression_test/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/opencompass_regression_test/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/cpfs01/user/qa-llm-cicd/opencompass_new/opencompass/tasks/openicl_infer.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-05-15_10:57:22
  host      : dsw-53748-5b58c48465-7s4tm
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 283609)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
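
For readers following the traceback, here is a minimal sketch of the failure path it shows: get_ppl_from_template delegates to get_ppl, and the base-class get_ppl raises NotImplementedError when the model class does not override it. The classes below are simplified stand-ins written for illustration, not the actual OpenCompass code; only the method names and the error text are taken from the log above, and try_ppl is a hypothetical helper.

```python
class BaseModel:
    """Hypothetical stand-in for the base class in opencompass/models/base.py."""

    def get_ppl(self, inputs, mask_length=None):
        # Default behaviour: a subclass that does not override get_ppl ends up here.
        raise NotImplementedError(
            f'{self.__class__.__name__} does not support'
            ' ppl-based evaluation yet, try gen-based instead.')

    def get_ppl_from_template(self, templates, mask_length=None):
        # Assumption: the real method renders templates into prompts first;
        # this sketch just converts them to strings before delegating.
        inputs = [str(template) for template in templates]
        return self.get_ppl(inputs, mask_length)


class HuggingFacewithChatTemplate(BaseModel):
    """Stand-in subclass that does not implement get_ppl."""


def try_ppl(model, templates):
    """Return the error message if PPL scoring is unsupported, else None."""
    try:
        model.get_ppl_from_template(templates)
    except NotImplementedError as exc:
        return str(exc)
    return None


message = try_ppl(HuggingFacewithChatTemplate(), ['prompt labeled 0'])
print(message)
```

In this sketch the ppl inferencer fails on the first batch, which matches the progress bar stopping at 0/26 in the log.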

Other information

It seems to be related to the transformers version.
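
A small sketch for checking which side of this report a given environment falls on. It only encodes the two data points mentioned in this issue (4.33.0 works, 4.40.2 fails); the version at which the behaviour changes is not known from this report, so anything else is labeled untested. The helper names (parse_version, classify, installed_transformers_version) are hypothetical and not part of OpenCompass or transformers.

```python
import re
from importlib import metadata

# The only two data points given in this issue.
KNOWN_WORKING = (4, 33, 0)
KNOWN_FAILING = (4, 40, 2)


def parse_version(text):
    """Turn '4.40.2' (or '4.41.0.dev0') into a (major, minor, patch) tuple."""
    numbers = []
    for piece in text.split('.')[:3]:
        match = re.match(r'\d+', piece)  # keep leading digits only, e.g. '0rc1' -> 0
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def classify(version_text):
    """Label a transformers version using only what this issue reports."""
    version = parse_version(version_text)
    if version == KNOWN_WORKING:
        return 'reported working'
    if version == KNOWN_FAILING:
        return 'reported failing'
    return 'untested in this report'


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version('transformers')
    except metadata.PackageNotFoundError:
        return None


installed = installed_transformers_version()
if installed is None:
    print('transformers is not installed')
else:
    print(f'transformers {installed}: {classify(installed)}')
```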

202030481266 commented 3 months ago

same

YanxingLiu commented 3 months ago

same

bestpredicts commented 2 weeks ago

any update?