Closed THUchenzhou closed 7 months ago
It seems that all the tasks run on the same GPU?
yes
Would you like to provide an example of evaluation with Llama-factory?
I found a new problem: my progress bar keeps getting stuck at 50%. I think this may be what makes my inference so slow. Is there a good solution?
03/14 17:17:45 - OpenCompass - INFO - Loading mmlu_ppl: configs/datasets/mmlu/mmlu_ppl.py
03/14 17:17:45 - OpenCompass - INFO - Loading example: configs/summarizers/example.py
03/14 17:17:45 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
03/14 17:17:45 - OpenCompass - INFO - Partitioned into 4 tasks.
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_professional_law_0] on GPU 1
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_professional_law_1] on GPU 1
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_moral_scenarios,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_miscellaneous,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_professional_psychology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_psychology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_macroeconomics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_elementary_mathematics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_moral_disputes,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_prehistory,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_philosophy,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_biology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_nutrition,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_professional_accounting,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_professional_medicine,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_mathematics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_clinical_knowledge,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_security_studies,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_microeconomics,opencompass.models.huggingface.Hugging
Face_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_world_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_conceptual_physics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_marketing,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_human_aging,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_statistics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_us_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_chemistry,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_sociology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_geography,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_government_and_politics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_college_medicine,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_world_religions,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_virology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_high_school_european_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_logical_fallacies,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-chat-hf-safetensors/lukaemon_mmlu_astronomy] on GPU 1
50%|███████████████████████████████████████████████████████████████ | 2/4 [1:05:15<1:08:05, 2042.90s/it
I followed your tutorial and reconfigured the environment to evaluate Llama and Qwen, and I got the following error:
Terminal:
launch OpenICLInfer[qwen-7b-hf/math_0] on GPU 0
0%| | 0/175 [00:00<?, ?it/s]03/14 21:00:00 - OpenCompass - ERROR - /home/chenzhou/Project/opencompass/opencompass/runners/local.py - _launch - 192 - task OpenICLInfer[qwen-7b-hf/math_0] fail, see
./outputs/default/20240314_205925/logs/infer/qwen-7b-hf/math_0.out
launch OpenICLInfer[qwen-7b-hf/lcsts_0] on GPU 0
1%|▋ | 1/175 [00:13<37:45, 13.02s/it]03/14 21:00:12 - OpenCompass - ERROR - /home/chenzhou/Project/opencompass/opencompass/runners/local.py - _launch - 192 - task OpenICLInfer[qwen-7b-hf/lcsts_0] fail, see
./outputs/default/20240314_205925/logs/infer/qwen-7b-hf/lcsts_0.out
launch OpenICLInfer[qwen-7b-hf/lcsts_1] on GPU 0
1%|█▍ | 2/175 [00:24<34:54, 12.11s/it]
pip install transformers_stream_generator
[2024-03-14 21:00:11,861] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 627902) of binary: /home//anaconda3/envs/openCompass/bin/python3.1
Traceback (most recent call last):
File "/home//anaconda3/envs/openCompass/bin/torchrun", line 8, in Failures:
ImportError: This modeling file requires the following packages that were not found in your environment: transformers_stream_generator. Run pip install transformers_stream_generator
Try this:
pip install transformers_stream_generator
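Before relaunching the run, it may help to confirm that the package is importable from the same Python environment that runs the evaluation. The following is a minimal sketch that uses only the standard library; the helper name `missing_packages` is hypothetical and is not part of OpenCompass.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # transformers_stream_generator is the package named in the ImportError above.
    missing = missing_packages(["transformers_stream_generator"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the interpreter from the `openCompass` conda environment, since a package installed into a different environment would not be seen by the evaluation process.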
Would you like to provide an example of evaluation with Llama-factory?
Thanks. I have finished the evaluation; it took 3:49:37.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [3:49:37<00:00, 3444.48s/it]
03/14 21:07:23 - OpenCompass - INFO - Partitioned into 57 tasks.
mmlu-humanities - naive_average ppl 51.39
mmlu-stem - naive_average ppl 37.66
mmlu-social-science - naive_average ppl 52.41
mmlu-other - naive_average ppl 49.47
mmlu - naive_average ppl 46.59
mmlu-weighted - weighted_average ppl 45.81
The script is:
CUDA_VISIBLE_DEVICES=1 python run.py --datasets mmlu_ppl \
--hf-path /home/data/Llama2/llama-2-7b-hf-safetensors \
--model-kwargs device_map='auto' trust_remote_code=True \
--tokenizer-kwargs padding_side='left' truncation='left' use_fast=False trust_remote_code=True \
--max-out-len 100 \
--max-seq-len 2048 \
--batch-size 8 \
--no-batch-padding \
--num-gpus 1
The script I used in LLaMA-Factory is:
CUDA_VISIBLE_DEVICES=1 python src/evaluate.py \
--model_name_or_path /home/chenzhou/data/Llama2/llama-2-7b-hf-safetensors \
--template vanilla \
--task mmlu \
--split test \
--lang en \
--n_shot 5 \
--batch_size 8 \
--save_dir /home/chenzhou/Project/LLaMA-Factory/evaluation_result_ablation/mmlu-llama-2-7b-vanilla-test
It took 30:17.
Processing subjects: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 57/57 [30:17<00:00, 31.88s/it, world religions
Average: 45.53
STEM: 36.70
Social Sciences: 51.32
Humanities: 42.71
Other: 52.32
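To put the two wall-clock times side by side, here is a small sketch that converts the durations quoted in this comment to seconds and computes their ratio. The helper `to_seconds` is hypothetical, and the two runs may not use identical settings, so the ratio is only a rough comparison.

```python
def to_seconds(clock):
    """Convert a duration written as 'H:MM:SS' or 'MM:SS' to seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


opencompass_s = to_seconds("3:49:37")  # OpenCompass inference time reported above
llama_factory_s = to_seconds("30:17")  # LLaMA-Factory time reported above
ratio = opencompass_s / llama_factory_s

print(f"{opencompass_s} s vs {llama_factory_s} s, about {ratio:.1f}x slower")
```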
As for the time cost, I guess it may be caused by 'Partitioned into 57 tasks': with so many tasks, the model may have to be loaded many times, which takes a significant amount of time. Using a large number of tasks is convenient on a cluster, but it is not recommended when there is only one GPU.
The parameters I used are the default ones from the official tutorial. How do I set it up so that I can reduce the number of tasks?
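Inference is partitioned into 4 tasks, and evaluation is partitioned into 57 tasks. I don't think the 57 tasks are the cause, because it is the inference stage that takes most of the time (3:49:20 for inference versus 03:12 for evaluation in the log below).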
03/14 17:32:26 - OpenCompass - INFO - Loading mmlu_ppl: configs/datasets/mmlu/mmlu_ppl.py
03/14 17:32:26 - OpenCompass - INFO - Loading example: configs/summarizers/example.py
03/14 17:32:27 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
03/14 17:32:27 - OpenCompass - INFO - Partitioned into 4 tasks.
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_law_0] on GPU 1
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_law_1] on GPU 1
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_moral_scenarios,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_miscellaneous,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_psychology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_psychology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_macroeconomics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_elementary_mathematics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_moral_disputes,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_prehistory,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_philosophy,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_biology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_nutrition,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_accounting,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_medicine,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_mathematics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_clinical_knowledge,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_security_studies,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_microeconomics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_world_history,opencom
pass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_conceptual_physics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_marketing,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_human_aging,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_statistics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_us_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_chemistry,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_sociology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_geography,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_government_and_politics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_medicine,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_world_religions,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_virology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_european_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_logical_fallacies,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_astronomy] on GPU 1
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_physics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_electrical_engineering,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_biology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_anatomy,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_human_sexuality,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_formal_logic,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_international_law,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_econometrics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_machine_learning,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_public_relations,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_jurisprudence,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_management,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_physics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_chemistry,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_computer_science,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_mathematics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_abstract_algebra,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_global_facts,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-s
afetensors/lukaemon_mmlu_computer_security,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_medical_genetics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_computer_science,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_business_ethics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_us_foreign_policy] on GPU 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [3:49:20<00:00, 3440.05s/it]
03/14 21:21:47 - OpenCompass - INFO - Partitioned into 57 tasks.
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_biology] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_chemistry] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_mathematics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_computer_science] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_physics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_electrical_engineering] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_astronomy] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_anatomy] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_abstract_algebra] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_machine_learning] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_clinical_knowledge] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_global_facts] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_nutrition] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_management] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_accounting] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_geography] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_international_law] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_marketing] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_moral_scenarios] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_microeconomics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_computer_security] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_law] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_medical_genetics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_psychology] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_jurisprudence] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_world_religions] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_philosophy] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_virology] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_chemistry] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_public_relations] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_macroeconomics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_human_sexuality] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_elementary_mathematics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_physics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_computer_science] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_european_history] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_business_ethics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_moral_disputes] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_statistics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_miscellaneous] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_formal_logic] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_government_and_politics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_security_studies] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_prehistory] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_logical_fallacies] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_biology] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_world_history] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_medicine] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_mathematics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_medicine] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_us_history] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_sociology] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_econometrics] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_psychology] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_human_aging] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_us_foreign_policy] on CPU
launch OpenICLEval[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_conceptual_physics] on CPU
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 57/57 [03:12<00:00, 3.37s/it]
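As a quick check of how the time splits between the two stages, the sketch below takes the two progress-bar totals from the log above (4/4 inference tasks in 3:49:20, 57/57 evaluation tasks in 03:12) and computes the share spent on inference. The variable names are illustrative only.

```python
# Totals read from the two progress bars in the log above.
infer_s = 3 * 3600 + 49 * 60 + 20  # OpenICLInfer, 4 tasks, 3:49:20
eval_s = 3 * 60 + 12               # OpenICLEval, 57 tasks, 03:12

share = infer_s / (infer_s + eval_s)
print(f"inference: {infer_s} s, evaluation: {eval_s} s, inference share: {share:.1%}")
```

On these numbers the 57 evaluation tasks account for only a small fraction of the run, which is consistent with the point that the inference stage is the bottleneck.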
Actually, we use batch_padding = False by default to avoid a performance drop; enabling batch_padding is expected to speed up the evaluation process.
You can also try vllm or lmdeploy to speed up the evaluation process. We will update the batch_padding configuration in the code and documentation soon. Thanks again. Feel free to re-open if needed.
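One way to try this with the command from the report is to drop the `--no-batch-padding` flag. The sketch below only rewrites the command string. It assumes that omitting `--no-batch-padding` is what turns batch padding on, which the flag name suggests but this thread does not confirm. The helper `without_flag` is hypothetical.

```python
import shlex

# Command from the report, unchanged (CUDA_VISIBLE_DEVICES prefix omitted).
original = (
    "python run.py --datasets mmlu_ppl "
    "--hf-path /home/data/Llama2/llama-2-7b-hf-safetensors "
    "--model-kwargs device_map='auto' trust_remote_code=True "
    "--tokenizer-kwargs padding_side='left' truncation='left' "
    "use_fast=False trust_remote_code=True "
    "--max-out-len 100 --max-seq-len 2048 --batch-size 8 "
    "--no-batch-padding --num-gpus 1"
)


def without_flag(command, flag):
    """Return `command` with every occurrence of the boolean `flag` removed."""
    args = [arg for arg in shlex.split(command) if arg != flag]
    return shlex.join(args)


# Assumption: leaving out --no-batch-padding enables batch padding.
print(without_flag(original, "--no-batch-padding"))
```

Because batch padding is described above as a trade-off between speed and a possible performance drop, it seems prudent to compare the resulting scores against the unpadded run.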
Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
{'CUDA available': True, 'CUDA_HOME': '/usr/local/cuda-11.6', 'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0', 'GPU 0,1': 'NVIDIA A100 80GB PCIe', 'GPU 2': 'Quadro P620', 'MMEngine': '0.10.3', 'MUSA available': False, 'NVCC': 'Cuda compilation tools, release 11.6, V11.6.55', 'OpenCV': '4.9.0', 'PyTorch': '1.13.1+cu116', 'PyTorch compiling details': 'PyTorch built with:\n' ' - GCC 9.3\n' ' - C++ Version: 201402\n' ' - Intel(R) Math Kernel Library Version ' '2020.0.0 Product Build 20191122 for Intel(R) 64 ' 'architecture applications\n' ' - Intel(R) MKL-DNN v2.6.0 (Git Hash ' '52b5f107dd9cf10910aaa19cb47f3abf9b349815)\n' ' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n' ' - LAPACK is enabled (usually provided by ' 'MKL)\n' ' - NNPACK is enabled\n' ' - CPU capability usage: AVX2\n' ' - CUDA Runtime 11.6\n' ' - NVCC architecture flags: ' '-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86\n' ' - CuDNN 8.9.3 (built against CUDA 11.8)\n' ' - Built with CuDNN 8.3.2\n' ' - Magma 2.6.1\n' ' - Build settings: BLAS_INFO=mkl, ' 'BUILD_TYPE=Release, CUDA_VERSION=11.6, ' 'CUDNN_VERSION=8.3.2, ' 'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, ' 'CXX_FLAGS= -fabi-version=11 -Wno-deprecated ' '-fvisibility-inlines-hidden -DUSE_PTHREADPOOL ' '-fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM ' '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK ' '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE ' '-DEDGE_PROFILER_USE_KINETO -O2 -fPIC ' '-Wno-narrowing -Wall -Wextra ' '-Werror=return-type -Werror=non-virtual-dtor ' '-Wno-missing-field-initializers ' '-Wno-type-limits -Wno-array-bounds ' '-Wno-unknown-pragmas -Wunused-local-typedefs ' '-Wno-unused-parameter -Wno-unused-function ' '-Wno-unused-result -Wno-strict-overflow ' '-Wno-strict-aliasing ' '-Wno-error=deprecated-declarations ' '-Wno-stringop-overflow 
-Wno-psabi ' '-Wno-error=pedantic -Wno-error=redundant-decls ' '-Wno-error=old-style-cast ' '-fdiagnostics-color=always -faligned-new ' '-Wno-unused-but-set-variable ' '-Wno-maybe-uninitialized -fno-math-errno ' '-fno-trapping-math -Werror=format ' '-Werror=cast-function-type ' '-Wno-stringop-overflow, LAPACK_INFO=mkl, ' 'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, ' 'PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, ' 'USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, ' 'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, ' 'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, ' 'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n', 'Python': '3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) ' '[GCC 12.3.0]', 'TorchVision': '0.14.1+cu116', 'numpy_random_seed': 2147483648, 'opencompass': '0.2.3+3098d78', 'sys.platform': 'linux'}
Reproduces the problem - code/configuration sample
Create a new run.bash as follows:
CUDA_VISIBLE_DEVICES=1 python run.py --datasets mmlu_ppl \
--hf-path /home/data/Llama2/llama-2-7b-hf-safetensors \
--model-kwargs device_map='auto' trust_remote_code=True \
--tokenizer-kwargs padding_side='left' truncation='left' use_fast=False trust_remote_code=True \
--max-out-len 100 \
--max-seq-len 2048 \
--batch-size 8 \
--no-batch-padding \
--num-gpus 1
Reproduces the problem - command or script
Ran run.bash
Reproduces the problem - error message
03/14 17:32:26 - OpenCompass - INFO - Loading mmlu_ppl: configs/datasets/mmlu/mmlu_ppl.py
03/14 17:32:26 - OpenCompass - INFO - Loading example: configs/summarizers/example.py
03/14 17:32:27 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
03/14 17:32:27 - OpenCompass - INFO - Partitioned into 4 tasks.
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_law_0] on GPU 1
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_law_1] on GPU 1
launch OpenICLInfer[opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_moral_scenarios,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_miscellaneous,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_psychology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_psychology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_macroeconomics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_elementary_mathematics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_moral_disputes,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_prehistory,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_philosophy,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_biology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_nutrition,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_accounting,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_professional_medicine,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_mathematics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_clinical_knowledge,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_security_studies,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_microeconomics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_world_history,opencom
pass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_conceptual_physics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_marketing,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_human_aging,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_statistics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_us_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_chemistry,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_sociology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_geography,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_government_and_politics,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_college_medicine,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_world_religions,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_virology,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_high_school_european_history,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_logical_fallacies,opencompass.models.huggingface.HuggingFace_Llama2_llama-2-7b-hf-safetensors/lukaemon_mmlu_astronomy] on GPU 1 50%|███████████████████████████████████████████████████████████████ | 2/4 [1:20:17<1:20:07, 2403.78s/it]
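Other information
Hello, my question is this: when I evaluate with the LLaMA-Factory framework, using the same model and dataset, the total evaluation time is 30 minutes, whereas OpenCompass takes 2 hours 40 minutes, which is many times longer. Is there a way to speed up the evaluation?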