open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
3.78k stars, 405 forks

Some dataset subsets report errors and were not evaluated #996

Closed. luhairong11 closed this issue 5 months ago

luhairong11 commented 6 months ago

Prerequisites

Issue type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': True, 'CUDA_HOME': '/usr/local/cuda-12.1', 'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0', 'GPU 0,1,2,3,4,5,6,7': 'NVIDIA A100 80GB PCIe', 'MMEngine': '0.10.3', 'MUSA available': False, 'NVCC': 'Cuda compilation tools, release 12.1, V12.1.66', 'OpenCV': '4.9.0', 'PyTorch': '2.2.1+cu121', 'PyTorch compiling details': 'PyTorch built with:\n' ' - GCC 9.3\n' ' - C++ Version: 201703\n' ' - Intel(R) oneAPI Math Kernel Library Version ' '2022.2-Product Build 20220804 for Intel(R) 64 ' 'architecture applications\n' ' - Intel(R) MKL-DNN v3.3.2 (Git Hash ' '2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)\n' ' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n' ' - LAPACK is enabled (usually provided by ' 'MKL)\n' ' - NNPACK is enabled\n' ' - CPU capability usage: AVX512\n' ' - CUDA Runtime 12.1\n' ' - NVCC architecture flags: ' '-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n' ' - CuDNN 8.9.2\n' ' - Magma 2.6.1\n' ' - Build settings: BLAS_INFO=mkl, ' 'BUILD_TYPE=Release, CUDA_VERSION=12.1, ' 'CUDNN_VERSION=8.9.2, ' 'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, ' 'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 ' '-fabi-version=11 -fvisibility-inlines-hidden ' '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO ' '-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM ' '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK ' '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE ' '-O2 -fPIC -Wall -Wextra -Werror=return-type ' '-Werror=non-virtual-dtor -Werror=bool-operation ' '-Wnarrowing -Wno-missing-field-initializers ' '-Wno-type-limits -Wno-array-bounds ' '-Wno-unknown-pragmas -Wno-unused-parameter ' '-Wno-unused-function -Wno-unused-result ' '-Wno-strict-overflow -Wno-strict-aliasing ' '-Wno-stringop-overflow -Wsuggest-override ' '-Wno-psabi -Wno-error=pedantic ' '-Wno-error=old-style-cast -Wno-missing-braces ' 
'-fdiagnostics-color=always -faligned-new ' '-Wno-unused-but-set-variable ' '-Wno-maybe-uninitialized -fno-math-errno ' '-fno-trapping-math -Werror=format ' '-Wno-stringop-overflow, LAPACK_INFO=mkl, ' 'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, ' 'PERF_WITH_AVX512=1, TORCH_VERSION=2.2.1, ' 'USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, ' 'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, ' 'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, ' 'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, ' 'USE_ROCM_KERNEL_ASSERT=OFF, \n', 'Python': '3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]', 'TorchVision': '0.17.1', 'numpy_random_seed': 2147483648, 'opencompass': '0.2.3+', 'sys.platform': 'linux'}

Reproduces the problem - code/configuration sample

Only some of the datasets were selected for testing: datasets = [*ceval_datasets,*cmmlu_datasets,*GaokaoBench_datasets]
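In OpenCompass configs, each *_datasets name is typically a list of dataset config dicts, and the top-level datasets list is built by unpacking those lists with *. The sketch below shows what the unpacking changes. It uses hypothetical stand-in dicts that carry only an abbr. The real entries, imported from OpenCompass config files, also carry reader, inference and evaluation settings.

```python
# Hypothetical stand-ins for lists that OpenCompass configs import;
# real entries also hold reader/infer/eval settings, not just an abbr.
ceval_datasets = [{"abbr": "ceval-law"}, {"abbr": "ceval-logic"}]
cmmlu_datasets = [{"abbr": "cmmlu-arts"}, {"abbr": "cmmlu-genetics"}]
GaokaoBench_datasets = [{"abbr": "GaokaoBench_2010-2022_Math_I_MCQs"}]

# Unpacking every list with * gives one flat list of dataset configs.
flat = [*ceval_datasets, *cmmlu_datasets, *GaokaoBench_datasets]

# Without *, the first two entries are whole lists, not dataset configs.
nested = [ceval_datasets, cmmlu_datasets, *GaokaoBench_datasets]


def abbrs(datasets):
    """Return the abbr of every entry that is a single dataset config."""
    return [d["abbr"] for d in datasets if isinstance(d, dict)]


print(len(flat), abbrs(flat))
print(len(nested), abbrs(nested))
```

Only the flat form hands every subset to the runner as its own entry.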

Reproduces the problem - command or script

python run.py configs/eval_qwen_7b_chat.py

Reproduces the problem - error message

cmmlu-conceptual_physics: {'accuracy': 59.863945578231295}
cmmlu-construction_project_management: {'accuracy': 46.76258992805755}
cmmlu-education: {'accuracy': 61.963190184049076}
cmmlu-electrical_engineering: {'accuracy': 49.41860465116279}
cmmlu-elementary_chinese: {'accuracy': 53.96825396825397}
cmmlu-elementary_information_and_technology: {'accuracy': 77.73109243697479}
cmmlu-food_science: {'accuracy': 58.04195804195804}
cmmlu-genetics: {'accuracy': 48.29545454545455}
cmmlu-high_school_mathematics: {'accuracy': 34.756097560975604}
cmmlu-high_school_politics: {'accuracy': 60.83916083916085}
cmmlu-jurisprudence: {'accuracy': 58.63746958637469}
cmmlu-logical: {'accuracy': 47.15447154471545}
cmmlu-nutrition: {'accuracy': 55.172413793103445}
cmmlu-professional_accounting: {'accuracy': 70.85714285714285}
cmmlu-professional_law: {'accuracy': 43.60189573459716}
cmmlu-professional_medicine: {'accuracy': 41.755319148936174}
cmmlu-professional_psychology: {'accuracy': 71.55172413793103}
cmmlu-public_relations: {'accuracy': 59.195402298850574}
cmmlu-sports_science: {'accuracy': 57.57575757575758}
cmmlu-world_history: {'accuracy': 68.32298136645963}
GaokaoBench_2010-2022_Math_I_MCQs: {'score': 34.112149532710276}
GaokaoBench_2010-2022_Physics_MCQs: {'score': 28.90625}
GaokaoBench_2010-2022_Chinese_Lang_and_Usage_MCQs: {'score': 32.5}
GaokaoBench_2010-2022_Math_I_Fill-in-the-Blank: {'score': 0}
GaokaoBench_2010-2022_Chemistry_Open-ended_Questions: {'score': 0}
GaokaoBench_2010-2022_Physics_Open-ended_Questions: {'score': 0}
GaokaoBench_2010-2022_Political_Science_Open-ended_Questions: {'score': 0}
GaokaoBench_2010-2022_Chinese_Language_Language_and_Writing_Skills_Open-ended_Questions: {'score': 0}
cmmlu-humanities: {'error': "missing metrics: ['cmmlu-arts', 'cmmlu-chinese_literature', 'cmmlu-college_law', 'cmmlu-global_facts', 'cmmlu-international_law', 'cmmlu-marxist_theory', 'cmmlu-philosophy', 'cmmlu-world_religions']"}
cmmlu-stem: {'error': "missing metrics: ['cmmlu-anatomy', 'cmmlu-astronomy', 'cmmlu-college_actuarial_science', 'cmmlu-college_engineering_hydrology', 'cmmlu-college_mathematics', 'cmmlu-college_medical_statistics', 'cmmlu-computer_science', 'cmmlu-elementary_mathematics', 'cmmlu-high_school_biology', 'cmmlu-high_school_chemistry', 'cmmlu-high_school_physics', 'cmmlu-machine_learning', 'cmmlu-virology']"}
cmmlu-social-science: {'error': "missing metrics: ['cmmlu-business_ethics', 'cmmlu-chinese_civil_service_exam', 'cmmlu-chinese_food_culture', 'cmmlu-chinese_foreign_policy', 'cmmlu-college_education', 'cmmlu-economics', 'cmmlu-ethnology', 'cmmlu-high_school_geography', 'cmmlu-journalism', 'cmmlu-management', 'cmmlu-marketing', 'cmmlu-modern_chinese', 'cmmlu-security_study', 'cmmlu-sociology']"}
cmmlu-other: {'error': "missing metrics: ['cmmlu-agronomy', 'cmmlu-chinese_driving_rule', 'cmmlu-college_medicine', 'cmmlu-computer_security', 'cmmlu-elementary_commonsense', 'cmmlu-human_sexuality', 'cmmlu-legal_and_moral_basis', 'cmmlu-traditional_chinese_medicine']"}
cmmlu-china-specific: {'error': "missing metrics: ['cmmlu-chinese_civil_service_exam', 'cmmlu-chinese_driving_rule', 'cmmlu-chinese_food_culture', 'cmmlu-chinese_foreign_policy', 'cmmlu-chinese_literature', 'cmmlu-elementary_commonsense', 'cmmlu-ethnology', 'cmmlu-modern_chinese', 'cmmlu-traditional_chinese_medicine']"}
cmmlu: {'error': "missing metrics: ['cmmlu-agronomy', 'cmmlu-anatomy', 'cmmlu-arts', 'cmmlu-astronomy', 'cmmlu-business_ethics', 'cmmlu-chinese_civil_service_exam', 'cmmlu-chinese_driving_rule', 'cmmlu-chinese_food_culture', 'cmmlu-chinese_foreign_policy', 'cmmlu-chinese_literature', 'cmmlu-college_actuarial_science', 'cmmlu-college_education', 'cmmlu-college_engineering_hydrology', 'cmmlu-college_law', 'cmmlu-college_mathematics', 'cmmlu-college_medical_statistics', 'cmmlu-college_medicine', 'cmmlu-computer_science', 'cmmlu-computer_security', 'cmmlu-economics', 'cmmlu-elementary_commonsense', 'cmmlu-elementary_mathematics', 'cmmlu-ethnology', 'cmmlu-global_facts', 'cmmlu-high_school_biology', 'cmmlu-high_school_chemistry', 'cmmlu-high_school_geography', 'cmmlu-high_school_physics', 'cmmlu-human_sexuality', 'cmmlu-international_law', 'cmmlu-journalism', 'cmmlu-legal_and_moral_basis', 'cmmlu-machine_learning', 'cmmlu-management', 'cmmlu-marketing', 'cmmlu-marxist_theory', 'cmmlu-modern_chinese', 'cmmlu-philosophy', 'cmmlu-security_study', 'cmmlu-sociology', 'cmmlu-traditional_chinese_medicine', 'cmmlu-virology', 'cmmlu-world_religions']"}
ceval-stem: {'error': "missing metrics: ['ceval-computer_network', 'ceval-operating_system', 'ceval-computer_architecture', 'ceval-college_programming', 'ceval-college_physics', 'ceval-college_chemistry', 'ceval-advanced_mathematics', 'ceval-probability_and_statistics', 'ceval-electrical_engineer', 'ceval-metrology_engineer', 'ceval-high_school_physics', 'ceval-high_school_chemistry', 'ceval-high_school_biology', 'ceval-middle_school_mathematics', 'ceval-middle_school_biology', 'ceval-middle_school_physics', 'ceval-middle_school_chemistry', 'ceval-veterinary_medicine']"}
ceval-social-science: {'error': "missing metrics: ['ceval-business_administration', 'ceval-marxism', 'ceval-mao_zedong_thought', 'ceval-education_science', 'ceval-high_school_politics', 'ceval-high_school_geography', 'ceval-middle_school_politics']"}
ceval-other: {'error': "missing metrics: ['ceval-sports_science', 'ceval-plant_protection', 'ceval-basic_medicine', 'ceval-clinical_medicine', 'ceval-fire_engineer', 'ceval-environmental_impact_assessment_engineer']"}
ceval-hard: {'error': "missing metrics: ['ceval-advanced_mathematics', 'ceval-probability_and_statistics', 'ceval-college_chemistry', 'ceval-college_physics', 'ceval-high_school_chemistry', 'ceval-high_school_physics']"}
ceval: {'error': "missing metrics: ['ceval-computer_network', 'ceval-operating_system', 'ceval-computer_architecture', 'ceval-college_programming', 'ceval-college_physics', 'ceval-college_chemistry', 'ceval-advanced_mathematics', 'ceval-probability_and_statistics', 'ceval-electrical_engineer', 'ceval-metrology_engineer', 'ceval-high_school_physics', 'ceval-high_school_chemistry', 'ceval-high_school_biology', 'ceval-middle_school_mathematics', 'ceval-middle_school_biology', 'ceval-middle_school_physics', 'ceval-middle_school_chemistry', 'ceval-veterinary_medicine', 'ceval-business_administration', 'ceval-marxism', 'ceval-mao_zedong_thought', 'ceval-education_science', 'ceval-high_school_politics', 'ceval-high_school_geography', 'ceval-middle_school_politics', 'ceval-modern_chinese_history', 'ceval-ideological_and_moral_cultivation', 'ceval-logic', 'ceval-law', 'ceval-chinese_language_and_literature', 'ceval-art_studies', 'ceval-professional_tour_guide', 'ceval-legal_professional', 'ceval-high_school_chinese', 'ceval-high_school_history', 'ceval-middle_school_history', 'ceval-sports_science', 'ceval-plant_protection', 'ceval-basic_medicine', 'ceval-clinical_medicine', 'ceval-fire_engineer', 'ceval-environmental_impact_assessment_engineer']"}
GaokaoBench: {'error': "missing metrics: ['GaokaoBench_2010-2022_Math_II_MCQs', 'GaokaoBench_2010-2022_History_MCQs', 'GaokaoBench_2010-2022_Biology_MCQs', 'GaokaoBench_2010-2022_Political_Science_MCQs', 'GaokaoBench_2010-2022_Chemistry_MCQs', 'GaokaoBench_2010-2013_English_MCQs', 'GaokaoBench_2010-2022_Chinese_Modern_Lit', 'GaokaoBench_2010-2022_English_Fill_in_Blanks', 'GaokaoBench_2012-2022_English_Cloze_Test', 'GaokaoBench_2010-2022_Geography_MCQs', 'GaokaoBench_2010-2022_English_Reading_Comp']"}

Other information

No response

luhairong11 commented 6 months ago

The results above are saved in the summary-**.txt file.
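The error entries in the summary are all grouped scores (cmmlu-humanities, ceval-stem, GaokaoBench, and so on). A group score can only be computed when every subset in the group has a result, so one missing subset is enough to produce "missing metrics". The sketch below is a simplified illustration of that check, not OpenCompass's actual summarizer code. The group name "demo-group" is hypothetical; the two accuracies are copied from the summary above.

```python
from statistics import mean


def group_score(results, group_name, subsets):
    """Average a group's subsets, or report which subsets have no result.

    Simplified illustration only; OpenCompass's summarizer may weight
    or name things differently.
    """
    missing = [s for s in subsets if s not in results]
    if missing:
        return {group_name: {"error": f"missing metrics: {missing}"}}
    return {group_name: {"accuracy": mean(results[s] for s in subsets)}}


# Per-subset accuracies that did make it into the summary above.
results = {
    "cmmlu-genetics": 48.29545454545455,
    "cmmlu-logical": 47.15447154471545,
}

print(group_score(results, "demo-group", ["cmmlu-genetics", "cmmlu-logical"]))
print(group_score(results, "demo-group", ["cmmlu-genetics", "cmmlu-arts"]))
```

The second call fails only because cmmlu-arts has no result. This is why the per-subset predictions, not the summary, are the place to look.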

Leymore commented 6 months ago

Please check the contents of the logs/infer directory, especially the logs for the subsets that report errors.
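Alongside the logs, it can help to list which subsets never produced a prediction file. The helper below is a sketch built on assumptions. It assumes predictions are written to <run_dir>/predictions/<model_abbr>/<dataset_abbr>.json, and to <dataset_abbr>_0.json, <dataset_abbr>_1.json, ... when a dataset is split into shards. The function name and the model abbr in the demo are hypothetical, so check the layout against what is actually on disk.

```python
import tempfile
from pathlib import Path


def missing_predictions(run_dir, model_abbr, dataset_abbrs):
    """Return dataset abbrs that have no prediction file under run_dir.

    Assumed layout (verify locally): run_dir/predictions/<model_abbr>/
    holding <abbr>.json, or <abbr>_0.json, <abbr>_1.json, ... for shards.
    """
    pred_dir = Path(run_dir) / "predictions" / model_abbr
    missing = []
    for abbr in dataset_abbrs:
        whole = pred_dir / f"{abbr}.json"
        shards = list(pred_dir.glob(f"{abbr}_[0-9]*.json"))
        if not whole.exists() and not shards:
            missing.append(abbr)
    return missing


# Demo on a throwaway directory; "qwen-7b-chat-hf" is a made-up abbr.
with tempfile.TemporaryDirectory() as tmp:
    pred_dir = Path(tmp) / "predictions" / "qwen-7b-chat-hf"
    pred_dir.mkdir(parents=True)
    (pred_dir / "cmmlu-genetics.json").write_text("{}")
    (pred_dir / "cmmlu-arts_0.json").write_text("{}")
    print(missing_predictions(
        tmp, "qwen-7b-chat-hf",
        ["cmmlu-genetics", "cmmlu-arts", "cmmlu-anatomy"]))
```

The subsets this returns are the ones whose logs/infer output is worth reading first.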

luhairong11 commented 6 months ago

[2024-03-23 17:17:04,808] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 964785 closing signal SIGHUP
Traceback (most recent call last):
  File "/home/miniconda3/envs/opencompass/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/home/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/home/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
    result = agent.run()
  File "/home/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/metrics/api.py", line 123, in wrapper
    result = f(*args, **kwargs)
  File "/home/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py", line 727, in run
    result = self._invoke_run(role)
  File "/home/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py", line 868, in _invoke_run
    time.sleep(monitor_interval)
  File "/home/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 62, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 964588 got signal: 1
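The traceback ends with "got signal: 1". Python's standard signal module can translate that number. On Linux, signal 1 is SIGHUP, which matches the "closing signal SIGHUP" warning at the start of the log. The helper name below is only for illustration.

```python
import signal


def signal_name(signum: int) -> str:
    """Map a POSIX signal number, as printed by torchrun, to its name."""
    return signal.Signals(signum).name


print(signal_name(1))  # SIGHUP on Linux
```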

Could this be a multiprocessing problem? I ran CUDA_VISIBLE_DEVICES=1,2,3,4,5,6 python run.py configs/eval_qwen_7b_chat.py, and I had also changed the --max-workers-per-gpu parameter in run.py to 3.

Leymore commented 6 months ago

This is probably a multiprocessing problem. Try rerunning with python run.py -r latest .....
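As far as I understand run.py, -r latest reuses the most recent timestamped run folder under the work directory. Finished predictions are then picked up again, and only the missing tasks are rerun. Below is a rough sketch of how "latest" can be resolved. It assumes run folders are named with zero-padded timestamps such as 20240323_171704, which sort chronologically as plain strings. This approximates the behaviour and is not the actual run.py code.

```python
import os
import tempfile


def latest_run_dir(work_dir):
    """Return the newest timestamped run folder name under work_dir, or None."""
    runs = [d for d in os.listdir(work_dir)
            if os.path.isdir(os.path.join(work_dir, d))]
    # Zero-padded YYYYMMDD_HHMMSS names sort chronologically as strings.
    return sorted(runs)[-1] if runs else None


with tempfile.TemporaryDirectory() as work_dir:
    for stamp in ["20240322_090000", "20240323_171704", "20240321_235959"]:
        os.mkdir(os.path.join(work_dir, stamp))
    print(latest_run_dir(work_dir))  # 20240323_171704
```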

luhairong11 commented 6 months ago

When I reran python run.py configs/eval_qwen_7b_chat.py, again with --max-workers-per-gpu set to 3 in run.py, I got the same error. I'm not sure whether setting --max-workers-per-gpu to 3 is the cause, so I'll keep experimenting.
  File "/home/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py", line 727, in run
    result = self._invoke_run(role)
  File "/home/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py", line 868, in _invoke_run
    time.sleep(monitor_interval)
  File "/home/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 62, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 993678 got signal: 1

luhairong11 commented 5 months ago

After rerunning python run.py configs/eval_qwen_7b_chat.py with --max-workers-per-gpu set to 1 in run.py, the error no longer occurs.
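For a sense of scale, here is a simplified model of how many inference tasks can run at once. Each visible GPU offers max_workers_per_gpu slots, and a task that needs N GPUs takes one slot on each of N GPUs. This slot model is an approximation, not the runner's actual code. The default of one GPU per task for Qwen-7B-Chat is also an assumption.

```python
def max_concurrent_tasks(visible_gpus: int, max_workers_per_gpu: int,
                         gpus_per_task: int = 1) -> int:
    """Rough upper bound on inference tasks running at the same time.

    Simplified slot model (an assumption, not the runner's code): every
    visible GPU offers max_workers_per_gpu slots, and a task that needs
    gpus_per_task GPUs occupies one slot on each of that many GPUs.
    """
    if gpus_per_task > visible_gpus:
        return 0
    return (visible_gpus * max_workers_per_gpu) // gpus_per_task


# CUDA_VISIBLE_DEVICES=1,2,3,4,5,6 exposes six GPUs.
print(max_concurrent_tasks(6, 3))  # 18 tasks, three per GPU
print(max_concurrent_tasks(6, 1))  # 6 tasks, one per GPU
```

Under this model, going from 3 to 1 worker per GPU cuts the number of torchrun launches that can be in flight together from 18 to 6.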

tonysy commented 5 months ago

Feel free to re-open if needed.

202030481266 commented 4 months ago

The issue still occurs with the following simple inference script:

#! /bin/bash

MKL_SERVICE_FORCE_INTEL=1 python run.py --dataset cmmlu_ppl \
        --hf-path /root/my-internlm2-7b \
        --tokenizer-path /root/my-internlm2-7b/ \
        --tokenizer-kwargs trust_remote_code=True \
        --model-kwargs trust_remote_code=True device_map='auto' \
        --max-seq-len 2048 \
        --debug \
        --num-gpus 2 \
        --batch-size 96 

After spending a lot of time and repeating the experiment several times, I found that the predictions for the same subsets are missing on every run. The workaround was to change the batch size: lowering --batch-size to 64 lets the run complete normally.
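As back-of-the-envelope arithmetic on the flags in the script: with --max-seq-len 2048, the worst-case number of tokens in one padded batch grows linearly with --batch-size. Going from 96 to 64 therefore cuts that bound by a third. This is only an illustration. The thread does not establish that memory is the cause, and a later comment reports the problem even with batch=1.

```python
def max_tokens_per_batch(batch_size: int, max_seq_len: int) -> int:
    """Worst-case tokens in one padded batch: every sequence at max length."""
    return batch_size * max_seq_len


MAX_SEQ_LEN = 2048  # --max-seq-len from the script above

for bs in (96, 64):
    print(f"batch {bs}: up to {max_tokens_per_batch(bs, MAX_SEQ_LEN):,} tokens")
# batch 96: up to 196,608 tokens
# batch 64: up to 131,072 tokens
```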

dh12306 commented 4 months ago
--tokenizer-path

I still get the problem even with batch=1.