open-compass / opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0

[Bug] Evaluation is very slow #974

Closed THUchenzhou closed 7 months ago

THUchenzhou commented 7 months ago

Prerequisites

Problem type

I am evaluating with officially supported tasks/models/datasets.

Environment

{'CUDA available': True, 'CUDA_HOME': '/usr/local/cuda-11.6', 'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0', 'GPU 0,1': 'NVIDIA A100 80GB PCIe', 'GPU 2': 'Quadro P620', 'MMEngine': '0.10.3', 'MUSA available': False, 'NVCC': 'Cuda compilation tools, release 11.6, V11.6.55', 'OpenCV': '4.9.0', 'PyTorch': '1.13.1+cu116', 'PyTorch compiling details': 'PyTorch built with:\n' ' - GCC 9.3\n' ' - C++ Version: 201402\n' ' - Intel(R) Math Kernel Library Version ' '2020.0.0 Product Build 20191122 for Intel(R) 64 ' 'architecture applications\n' ' - Intel(R) MKL-DNN v2.6.0 (Git Hash ' '52b5f107dd9cf10910aaa19cb47f3abf9b349815)\n' ' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n' ' - LAPACK is enabled (usually provided by ' 'MKL)\n' ' - NNPACK is enabled\n' ' - CPU capability usage: AVX2\n' ' - CUDA Runtime 11.6\n' ' - NVCC architecture flags: ' '-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86\n' ' - CuDNN 8.9.3 (built against CUDA 11.8)\n' ' - Built with CuDNN 8.3.2\n' ' - Magma 2.6.1\n' ' - Build settings: BLAS_INFO=mkl, ' 'BUILD_TYPE=Release, CUDA_VERSION=11.6, ' 'CUDNN_VERSION=8.3.2, ' 'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, ' 'CXX_FLAGS= -fabi-version=11 -Wno-deprecated ' '-fvisibility-inlines-hidden -DUSE_PTHREADPOOL ' '-fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM ' '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK ' '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE ' '-DEDGE_PROFILER_USE_KINETO -O2 -fPIC ' '-Wno-narrowing -Wall -Wextra ' '-Werror=return-type -Werror=non-virtual-dtor ' '-Wno-missing-field-initializers ' '-Wno-type-limits -Wno-array-bounds ' '-Wno-unknown-pragmas -Wunused-local-typedefs ' '-Wno-unused-parameter -Wno-unused-function ' '-Wno-unused-result -Wno-strict-overflow ' '-Wno-strict-aliasing ' '-Wno-error=deprecated-declarations ' '-Wno-stringop-overflow -Wno-psabi ' '-Wno-error=pedantic -Wno-error=redundant-decls ' '-Wno-error=old-style-cast ' '-fdiagnostics-color=always -faligned-new ' '-Wno-unused-but-set-variable ' '-Wno-maybe-uninitialized -fno-math-errno ' '-fno-trapping-math -Werror=format ' '-Werror=cast-function-type ' '-Wno-stringop-overflow, LAPACK_INFO=mkl, ' 'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, ' 'PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, ' 'USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, ' 'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, ' 'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, ' 'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n', 'Python': '3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) ' '[GCC 12.3.0]', 'TorchVision': '0.14.1+cu116', 'numpy_random_seed': 2147483648, 'opencompass': '0.2.3+3098d78', 'sys.platform': 'linux'}

Reproduces the problem - code/configuration sample

Created run.bash as follows:

CUDA_VISIBLE_DEVICES=1 python run.py --datasets mmlu_ppl \
    --hf-path /home/data/Llama2/llama-2-7b-hf-safetensors \
    --model-kwargs device_map='auto' trust_remote_code=True \
    --tokenizer-kwargs padding_side='left' truncation='left' use_fast=False trust_remote_code=True \
    --max-out-len 100 \
    --max-seq-len 2048 \
    --batch-size 8 \
    --no-batch-padding \
    --num-gpus 1

Reproduces the problem - command or script

Ran run.bash.

Reproduces the problem - error message

03/14 17:32:26 - OpenCompass - INFO - Loading mmlu_ppl: configs/datasets/mmlu/mmlu_ppl.py
03/14 17:32:26 - OpenCompass - INFO - Loading example: configs/summarizers/example.py
03/14 17:32:27 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
03/14 17:32:27 - OpenCompass - INFO - Partitioned into 4 tasks.
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_law_0] on GPU 1
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_law_1] on GPU 1
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_moral_scenarios,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_miscellaneous,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_psychology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_psychology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_macroeconomics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_elementary_mathematics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_moral_disputes,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_prehistory,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_philosophy,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_biology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_nutrition,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_accounting,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_medicine,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_mathematics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_clinical_knowledge,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_security_studies,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_microeconomics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_world_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_conceptual_physics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_marketing,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_human_aging,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_statistics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_us_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_chemistry,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_sociology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_geography,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_government_and_politics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_medicine,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_world_religions,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_virology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_european_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_logical_fallacies,opencompass.models.huggingface.Huggi
ngFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_astronomy] on GPU 1 50%|███████████████████████████████████████████████████████████████ | 2/4 [1:20:17<1:20:07, 2403.78s/it]

Other information

Hello, my question is: when I evaluate the same model and dataset with the LLaMA-Factory framework, the whole evaluation takes about 30 minutes, whereas OpenCompass takes 2 hours 40 minutes, which is many times longer. Is there a way to speed up the evaluation?

bittersweet1999 commented 7 months ago

It seems that all the tasks run on the same GPU?

THUchenzhou commented 7 months ago

It seems that all the tasks run on the same GPU?

yes

tonysy commented 7 months ago

Would you like to provide an example of evaluation with Llama-factory?

THUchenzhou commented 7 months ago

Would you like to provide an example of evaluation with Llama-factory?

I found a new problem: my progress bar keeps getting stuck at 50%. I think the slow inference may be caused by this; is there a good solution?

03/14 17:17:45 - OpenCompass - INFO - Loading mmlu_ppl: configs/datasets/mmlu/mmlu_ppl.py
03/14 17:17:45 - OpenCompass - INFO - Loading example: configs/summarizers/example.py
03/14 17:17:45 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
03/14 17:17:45 - OpenCompass - INFO - Partitioned into 4 tasks.
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_professional_law_0] on GPU 1
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_professional_law_1] on GPU 1
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_moral_scenarios,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_miscellaneous,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_professional_psychology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_psychology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_macroeconomics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_elementary_mathematics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_moral_disputes,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_prehistory,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_philosophy,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_biology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_nutrition,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_professional_accounting,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_professional_medicine,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_mathematics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_clinical_knowledge,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_security_studies,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_microeconomics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_world_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_conceptual_physics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_marketing,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_human_aging,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_statistics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_us_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_chemistry,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_sociology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_geography,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_government_and_politics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_college_medicine,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_world_religions,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_virology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_europe
an_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_logical_fallacies,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_astronomy] on GPU 1 50%|███████████████████████████████████████████████████████████████ | 2/4 [1:05:15<1:08:05, 2042.90s/it

THUchenzhou commented 7 months ago

I followed your tutorial and reconfigured the environment to evaluate Llama and Qwen, and ran into the following error:

Terminal: launch OpenICLInfer[qwen-7b-hf/math_0] on GPU 0
0%| | 0/175 [00:00<?, ?it/s]
03/14 21:00:00 - OpenCompass - ERROR - /home/chenzhou/Project/opencompass/opencompass/runners/local.py - _launch - 192 - task OpenICLInfer[qwen-7b-hf/math_0] fail, see ./outputs/default/20240314_205925/logs/infer/qwen-7b-hf/math_0.out
launch OpenICLInfer[qwen-7b-hf/lcsts_0] on GPU 0
1%|▋ | 1/175 [00:13<37:45, 13.02s/it]
03/14 21:00:12 - OpenCompass - ERROR - /home/chenzhou/Project/opencompass/opencompass/runners/local.py - _launch - 192 - task OpenICLInfer[qwen-7b-hf/lcsts_0] fail, see ./outputs/default/20240314_205925/logs/infer/qwen-7b-hf/lcsts_0.out
launch OpenICLInfer[qwen-7b-hf/lcsts_1] on GPU 0
1%|█▍ | 2/175 [00:24<34:54, 12.11s/it]

logs/infer:

03/14 21:00:05 - OpenCompass - INFO - Task [qwen-7b-hf/lcsts_0]
Traceback (most recent call last):
  File "/home//Project/opencompass/opencompass/tasks/openicl_infer.py", line 153, in <module>
    inferencer.run()
  File "/home//Project/opencompass/opencompass/tasks/openicl_infer.py", line 65, in run
    self.model = build_model_from_cfg(model_cfg)
  File "/home//Project/opencompass/opencompass/utils/build.py", line 25, in build_model_from_cfg
    return MODELS.build(model_cfg)
  File "/home//anaconda3/envs/openCompass/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home//anaconda3/envs/openCompass/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home//Project/opencompass/opencompass/models/huggingface.py", line 124, in __init__
    self._load_model(path=path,
  File "/home//Project/opencompass/opencompass/models/huggingface.py", line 674, in _load_model
    self.model = AutoModelForCausalLM.from_pretrained(path, **model_kwargs)
  File "/home//anaconda3/envs/openCompass/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 548, in from_pretrained
    model_class = get_class_from_dynamic_module(
  File "/home//anaconda3/envs/openCompass/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 488, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
  File "/home//anaconda3/envs/openCompass/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 314, in get_cached_module_file
    modules_needed = check_imports(resolved_module_file)
  File "/home//anaconda3/envs/openCompass/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 180, in check_imports
    raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: transformers_stream_generator. Run pip install transformers_stream_generator
[2024-03-14 21:00:11,861] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 627902) of binary: /home//anaconda3/envs/openCompass/bin/python3.1
Traceback (most recent call last):
  File "/home//anaconda3/envs/openCompass/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home//anaconda3/envs/openCompass/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home//anaconda3/envs/openCompass/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/home//anaconda3/envs/openCompass/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/home//anaconda3/envs/openCompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home//anaconda3/envs/openCompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/home//Project/opencompass/opencompass/tasks/openicl_infer.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-14_21:00:11
  host      : lthpc
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 627902)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

bittersweet1999 commented 7 months ago

ImportError: This modeling file requires the following packages that were not found in your environment: transformers_stream_generator. Run pip install transformers_stream_generator

Try this:

pip install transformers_stream_generator

THUchenzhou commented 7 months ago

Would you like to provide an example of evaluation with Llama-factory?

Thanks. I have finished the evaluation; it took 3:49:37.

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [3:49:37<00:00, 3444.48s/it]
03/14 21:07:23 - OpenCompass - INFO - Partitioned into 57 tasks.

mmlu-humanities - naive_average ppl 51.39
mmlu-stem - naive_average ppl 37.66
mmlu-social-science - naive_average ppl 52.41
mmlu-other - naive_average ppl 49.47
mmlu - naive_average ppl 46.59
mmlu-weighted - weighted_average ppl 45.81

The script is:

CUDA_VISIBLE_DEVICES=1 python run.py --datasets mmlu_ppl \
    --hf-path /home/data/Llama2/llama-2-7b-hf-safetensors \
    --model-kwargs device_map='auto' trust_remote_code=True \
    --tokenizer-kwargs padding_side='left' truncation='left' use_fast=False trust_remote_code=True \
    --max-out-len 100 \
    --max-seq-len 2048 \
    --batch-size 8 \
    --no-batch-padding \
    --num-gpus 1

The script I used in LLaMA-Factory is:

CUDA_VISIBLE_DEVICES=1 python src/evaluate.py \
    --model_name_or_path /home/chenzhou/data/Llama2/llama-2-7b-hf-safetensors \
    --template vanilla \
    --task mmlu \
    --split test \
    --lang en \
    --n_shot 5 \
    --batch_size 8 \
    --save_dir /home/chenzhou/Project/LLaMA-Factory/evaluation_result_ablation/mmlu-llama-2-7b-vanilla-test

It took 30:17.

Processing subjects: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 57/57 [30:17<00:00, 31.88s/it, world religions]

Average: 45.53
STEM: 36.70
Social Sciences: 51.32
Humanities: 42.71
Other: 52.32

bittersweet1999 commented 7 months ago

As for the time cost, I guess it may be caused by "Partitioned into 57 tasks". When the number of tasks is too large, the model may be loaded many times, which takes a significant amount of time. Splitting into many tasks is convenient on a cluster, but it is not recommended when there is only one GPU.
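For reference, task splitting is controlled by the partitioner/runner section of the config file. A minimal sketch, assuming the SizePartitioner/NaivePartitioner and LocalRunner interfaces used in the example configs (argument names may differ between versions):

```python
# Sketch only: merge inference work into fewer, larger tasks so the model is
# loaded fewer times on a single-GPU machine. Class and argument names follow
# the example configs and may differ in your OpenCompass version.
from opencompass.partitioners import NaivePartitioner, SizePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLEvalTask, OpenICLInferTask

infer = dict(
    # A larger max_task_size means fewer inference tasks and fewer model loads.
    partitioner=dict(type=SizePartitioner, max_task_size=40000, gen_task_coef=15),
    runner=dict(
        type=LocalRunner,
        max_num_workers=1,  # only one usable GPU here
        task=dict(type=OpenICLInferTask),
    ),
)

eval = dict(
    # Evaluation tasks run on CPU and are cheap, so one task per dataset is fine.
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalRunner,
        max_num_workers=8,
        task=dict(type=OpenICLEvalTask),
    ),
)
```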

THUchenzhou commented 7 months ago

The parameters I used are the default ones from the official tutorial. How do I set it up so that I can reduce the number of tasks?

THUchenzhou commented 7 months ago

As for the time cost, I guess it may be caused by "Partitioned into 57 tasks". When the number of tasks is too large, the model may be loaded many times, which takes a significant amount of time. Splitting into many tasks is convenient on a cluster, but it is not recommended when there is only one GPU.

Inference is divided into 4 tasks and evaluation into 57 tasks. I don't think the 57 tasks are the cause, because it is the inference that takes most of the time.

03/14 17:32:26 - OpenCompass - INFO - Loading mmlu_ppl: configs/datasets/mmlu/mmlu_ppl.py
03/14 17:32:26 - OpenCompass - INFO - Loading example: configs/summarizers/example.py
03/14 17:32:27 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
03/14 17:32:27 - OpenCompass - INFO - Partitioned into 4 tasks.
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_law_0] on GPU 1
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_law_1] on GPU 1
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_moral_scenarios,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_miscellaneous,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_psychology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_psychology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_macroeconomics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_elementary_mathematics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_moral_disputes,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_prehistory,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_philosophy,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_biology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_nutrition,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_accounting,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_medicine,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_mathematics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_clinical_knowledge,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_security_studies,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_microeconomics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_world_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_conceptual_physics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_marketing,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_human_aging,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_statistics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_us_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_chemistry,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_sociology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_geography,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_government_and_politics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_medicine,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_world_religions,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_virology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_european_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_logical_fallacies,opencompass.models.huggingface.Huggi
ngFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_astronomy] on GPU 1 launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_physics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_electrical_engineering,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_biology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_anatomy,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_human_sexuality,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_formal_logic,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_international_law,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_econometrics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_machine_learning,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_public_relations,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_jurisprudence,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_management,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_physics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_chemistry,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_computer_science,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_mathematics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_abstract_algebra,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_global_facts,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_computer_security,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_medical_genetics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_computer_science,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_business_ethics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_us_foreign_policy] on GPU 1 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [3:49:20<00:00, 3440.05s/it] 03/14 21:21:47 - OpenCompass - INFO - Partitioned into 57 tasks. launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_biology] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_chemistry] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_mathematics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_computer_science] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_physics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_electrical_engineering] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_astronomy] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_anatomy] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_abstract_algebra] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_machine_learning] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_clinical_knowledge] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_global_facts] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_nutrition] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_management] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_accounting] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_geography] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_international_law] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_marketing] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_moral_scenarios] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_microeconomics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_computer_security] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_law] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_medical_genetics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_psychology] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_jurisprudence] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_world_religions] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_philosophy] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_virology] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_chemistry] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_public_relations] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_macroeconomics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_human_sexuality] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_elementary_mathematics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_physics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_computer_science] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_european_history] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_business_ethics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_moral_disputes] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_statistics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_miscellaneous] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_formal_logic] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_government_and_politics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_security_studies] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_prehistory] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_logical_fallacies] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_biology] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_world_history] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_medicine] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_mathematics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_medicine] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_us_history] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_sociology] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_econometrics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_psychology] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_human_aging] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_us_foreign_policy] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_conceptual_physics] on CPU
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 57/57 [03:12<00:00, 3.37s/it]

tonysy commented 7 months ago

Actually, we use batch_padding = False by default to avoid a performance drop; enabling batch_padding is expected to speed up the evaluation process.
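Presumably this is what the --no-batch-padding flag in the run.bash above controls; in a config-file model definition it corresponds to the batch_padding field. A minimal sketch, assuming the HuggingFace wrapper from opencompass.models and simply mirroring the kwargs from that command:

```python
# Sketch only: the same llama-2-7b setup as the run.bash above, but with
# batch_padding enabled so each batch of prompts is padded and run in a single
# forward pass. Field names follow the HuggingFace wrapper in the example configs.
from opencompass.models import HuggingFace

models = [
    dict(
        type=HuggingFace,
        abbr='llama-2-7b-hf',
        path='/home/data/Llama2/llama-2-7b-hf-safetensors',
        model_kwargs=dict(device_map='auto', trust_remote_code=True),
        tokenizer_kwargs=dict(padding_side='left', truncation='left',
                              use_fast=False, trust_remote_code=True),
        max_out_len=100,
        max_seq_len=2048,
        batch_size=8,
        batch_padding=True,  # instead of passing --no-batch-padding
        run_cfg=dict(num_gpus=1),
    )
]
```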

tonysy commented 7 months ago

Also, you can try vLLM or LMDeploy to speed up the evaluation process. We will update the batch_padding configuration in the code and documentation soon. Thanks again. Feel free to re-open if needed.
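A minimal model-config sketch for the vLLM backend, assuming the VLLM wrapper exported by opencompass.models (the path and kwargs below are illustrative, and the vllm package must be installed):

```python
# Sketch only: the same model served through vLLM for faster generation-based
# evaluation. Assumes the VLLM wrapper from opencompass.models; adjust the path
# and kwargs to your setup.
from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='llama-2-7b-vllm',
        path='/home/data/Llama2/llama-2-7b-hf-safetensors',
        model_kwargs=dict(tensor_parallel_size=1),
        generation_kwargs=dict(temperature=0),
        max_out_len=100,
        max_seq_len=2048,
        batch_size=32,
        run_cfg=dict(num_gpus=1),
    )
]
```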