open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0

[Bug] Loading errors with some Hugging Face models #1549

Closed: daidaiershidi closed this issue 3 weeks ago

daidaiershidi commented 1 month ago

Prerequisites

Problem type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': True,
 'CUDA_HOME': '/usr/local/cuda',
 'GCC': 'gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)',
 'GPU 0,1,2,3,4,5,6,7': 'NVIDIA L20',
 'MMEngine': '0.10.4',
 'MUSA available': False,
 'NVCC': 'Cuda compilation tools, release 12.4, V12.4.131',
 'OpenCV': '4.10.0',
 'PyTorch': '2.3.1+cu121',
 'PyTorch compiling details': 'PyTorch built with:\n'
                              '  - GCC 9.3\n'
                              '  - C++ Version: 201703\n'
                              '  - Intel(R) oneAPI Math Kernel Library Version '
                              '2022.2-Product Build 20220804 for Intel(R) 64 '
                              'architecture applications\n'
                              '  - Intel(R) MKL-DNN v3.3.6 (Git Hash '
                              '86e6af5974177e513fd3fee58425e1063e7f1361)\n'
                              '  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
                              '  - LAPACK is enabled (usually provided by '
                              'MKL)\n'
                              '  - NNPACK is enabled\n'
                              '  - CPU capability usage: AVX512\n'
                              '  - CUDA Runtime 12.1\n'
                              '  - NVCC architecture flags: '
                              '-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
                              '  - CuDNN 8.9.2\n'
                              '  - Magma 2.6.1\n'
                              '  - Build settings: BLAS_INFO=mkl, '
                              'BUILD_TYPE=Release, CUDA_VERSION=12.1, '
                              'CUDNN_VERSION=8.9.2, '
                              'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
                              'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
                              '-fabi-version=11 -fvisibility-inlines-hidden '
                              '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
                              '-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
                              '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
                              '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
                              '-O2 -fPIC -Wall -Wextra -Werror=return-type '
                              '-Werror=non-virtual-dtor -Werror=bool-operation '
                              '-Wnarrowing -Wno-missing-field-initializers '
                              '-Wno-type-limits -Wno-array-bounds '
                              '-Wno-unknown-pragmas -Wno-unused-parameter '
                              '-Wno-unused-function -Wno-unused-result '
                              '-Wno-strict-overflow -Wno-strict-aliasing '
                              '-Wno-stringop-overflow -Wsuggest-override '
                              '-Wno-psabi -Wno-error=pedantic '
                              '-Wno-error=old-style-cast -Wno-missing-braces '
                              '-fdiagnostics-color=always -faligned-new '
                              '-Wno-unused-but-set-variable '
                              '-Wno-maybe-uninitialized -fno-math-errno '
                              '-fno-trapping-math -Werror=format '
                              '-Wno-stringop-overflow, LAPACK_INFO=mkl, '
                              'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
                              'PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, '
                              'USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, '
                              'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
                              'USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, '
                              'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, '
                              'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, '
                              'USE_ROCM_KERNEL_ASSERT=OFF, \n',
 'Python': '3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0]',
 'TorchVision': '0.18.1+cu121',
 'numpy_random_seed': 2147483648,
 'opencompass': '0.2.6+a62c613',
 'sys.platform': 'linux'}

Reproduces the problem - code/configuration sample

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python run.py --datasets mmlu_gen \
  --hf-type chat \
  --hf-path WizardLMTeam/WizardLM-13B-V1.2 \
  -a vllm \
  --max-out-len 2 \
  --hf-num-gpus 8 \
  --batch-size 1 \
  --max-workers-per-gpu 1 \
  --max-num-workers 8 \
  -w output/debug-mmlu_gen \
  --generation-kwargs do_sample=False temperature=0.0

Reproduces the problem - command or script

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python run.py --datasets mmlu_gen \
  --hf-type chat \
  --hf-path WizardLMTeam/WizardLM-13B-V1.2 \
  -a vllm \
  --max-out-len 2 \
  --hf-num-gpus 8 \
  --batch-size 1 \
  --max-workers-per-gpu 1 \
  --max-num-workers 8 \
  -w output/debug-mmlu_gen \
  --generation-kwargs do_sample=False temperature=0.0

Reproduces the problem - error message

Problem: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:7! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

Full error log:

09/21 14:02:05 - OpenCompass - INFO - Task [WizardLM-13B-V1.2_hf/lukaemon_mmlu_college_biology_1,WizardLM-13B-V1.2_hf/lukaemon_mmlu_college_chemistry_1,WizardLM-13B-V1.2_hf/lukaemon_mmlu_college_computer_science_2,WizardLM-13B-V1.2_hf/lukaemon_mmlu_college_mathematics_3,WizardLM-13B-V1.2_hf/lukaemon_mmlu_college_physics_4,WizardLM-13B-V1.2_hf/lukaemon_mmlu_electrical_engineering_5,WizardLM-13B-V1.2_hf/lukaemon_mmlu_astronomy_5,WizardLM-13B-V1.2_hf/lukaemon_mmlu_anatomy_5,WizardLM-13B-V1.2_hf/lukaemon_mmlu_abstract_algebra_5,WizardLM-13B-V1.2_hf/lukaemon_mmlu_machine_learning_6,WizardLM-13B-V1.2_hf/lukaemon_mmlu_clinical_knowledge_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_management_0,WizardLM-13B-V1.2_hf/lukaemon_mmlu_nutrition_1,WizardLM-13B-V1.2_hf/lukaemon_mmlu_marketing_1,WizardLM-13B-V1.2_hf/lukaemon_mmlu_professional_accounting_1,WizardLM-13B-V1.2_hf/lukaemon_mmlu_high_school_geography_1,WizardLM-13B-V1.2_hf/lukaemon_mmlu_international_law_1,WizardLM-13B-V1.2_hf/lukaemon_mmlu_moral_scenarios_1,WizardLM-13B-V1.2_hf/lukaemon_mmlu_computer_security_1,WizardLM-13B-V1.2_hf/lukaemon_mmlu_high_school_microeconomics_2,WizardLM-13B-V1.2_hf/lukaemon_mmlu_professional_law_2,WizardLM-13B-V1.2_hf/lukaemon_mmlu_medical_genetics_2,WizardLM-13B-V1.2_hf/lukaemon_mmlu_professional_psychology_3,WizardLM-13B-V1.2_hf/lukaemon_mmlu_jurisprudence_3,WizardLM-13B-V1.2_hf/lukaemon_mmlu_world_religions_4,WizardLM-13B-V1.2_hf/lukaemon_mmlu_philosophy_4,WizardLM-13B-V1.2_hf/lukaemon_mmlu_virology_4,WizardLM-13B-V1.2_hf/lukaemon_mmlu_high_school_chemistry_4,WizardLM-13B-V1.2_hf/lukaemon_mmlu_public_relations_4,WizardLM-13B-V1.2_hf/lukaemon_mmlu_high_school_macroeconomics_5,WizardLM-13B-V1.2_hf/lukaemon_mmlu_human_sexuality_5,WizardLM-13B-V1.2_hf/lukaemon_mmlu_elementary_mathematics_5,WizardLM-13B-V1.2_hf/lukaemon_mmlu_high_school_physics_5,WizardLM-13B-V1.2_hf/lukaemon_mmlu_high_school_computer_science_5,WizardLM-13B-V1.2_hf/lukaemon_mmlu_high_school_european_history_6,WizardLM-13B-V1.2_hf/lukaemon_mmlu_business_ethics_6,WizardLM-13B-V1.2_hf/lukaemon_mmlu_moral_disputes_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_high_school_statistics_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_miscellaneous_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_formal_logic_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_high_school_government_and_politics_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_prehistory_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_security_studies_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_high_school_biology_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_logical_fallacies_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_high_school_world_history_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_professional_medicine_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_high_school_mathematics_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_college_medicine_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_high_school_us_history_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_sociology_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_econometrics_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_high_school_psychology_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_human_aging_7,WizardLM-13B-V1.2_hf/lukaemon_mmlu_conceptual_physics_0]
WARNING 09-21 14:02:06 _custom_ops.py:14] Failed to import from vllm._C with ImportError("/lib64/libc.so.6: version `GLIBC_2.32' not found (required by /home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/vllm/_C.abi3.so)")
/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:540: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:545: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:540: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:545: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:540: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:545: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:540: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:545: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
09/21 14:02:17 - OpenCompass - INFO - using stop words: ['</s>']
09/21 14:02:18 - OpenCompass - INFO - Start inferencing [WizardLM-13B-V1.2_hf/lukaemon_mmlu_college_biology_1]

  0%|          | 0/18 [00:00<?, ?it/s]
100%|██████████| 18/18 [00:00<00:00, 786432.00it/s]
[2024-09-21 14:02:18,641] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...

  0%|          | 0/18 [00:00<?, ?it/s]No chat template is set for this tokenizer, falling back to a default class-level template. This is very error-prone, because models are often trained with templates different from the class default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which point any code depending on them will stop working. We recommend setting a valid chat template before then to ensure that this model continues working without issues.
/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:540: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:545: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(

  0%|          | 0/18 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/data-disk0/asfhalshd/project/opencompass/opencompass/tasks/openicl_infer.py", line 161, in <module>
    inferencer.run()
  File "/mnt/data-disk0/asfhalshd/project/opencompass/opencompass/tasks/openicl_infer.py", line 89, in run
    self._inference()
  File "/mnt/data-disk0/asfhalshd/project/opencompass/opencompass/tasks/openicl_infer.py", line 128, in _inference
    inferencer.inference(retriever,
  File "/mnt/data-disk0/asfhalshd/project/opencompass/opencompass/openicl/icl_inferencer/icl_gen_inferencer.py", line 152, in inference
    results = self.model.generate_from_template(
  File "/mnt/data-disk0/asfhalshd/project/opencompass/opencompass/models/base.py", line 165, in generate_from_template
    return self.generate(inputs, max_out_len=max_out_len, **kwargs)
  File "/mnt/data-disk0/asfhalshd/project/opencompass/opencompass/models/huggingface_above_v4_33.py", line 287, in generate
    outputs = self.model.generate(**tokens, **generation_kwargs)
  File "/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/utils.py", line 1914, in generate
    result = self._sample(
  File "/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/utils.py", line 2651, in _sample
    outputs = self(
  File "/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1190, in forward
    logits = [F.linear(hidden_states, lm_head_slices[i]) for i in range(self.config.pretraining_tp)]
  File "/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1190, in <listcomp>
    logits = [F.linear(hidden_states, lm_head_slices[i]) for i in range(self.config.pretraining_tp)]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:7! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
E0921 14:02:24.405000 139672313554752 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 71628) of binary: /home/asfhalshd/miniconda3/envs/opencompass/bin/python
Traceback (most recent call last):
  File "/home/asfhalshd/miniconda3/envs/opencompass/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/mnt/data-disk0/asfhalshd/project/opencompass/opencompass/tasks/openicl_infer.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-09-21_14:02:24
  host      : iZuf687klu435an8wah10tZ
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 71628)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Other information

1. Have you made any modifications to the code or config files? No
2. What do you think might be the cause? An error when loading the model weights
tonysy commented 4 weeks ago

Please ensure that this model can be loaded with transformers successfully.
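
For reference, a minimal standalone check along these lines can confirm that (a sketch using the standard transformers API; the model path is taken from the report, and `device_map="auto"` mirrors the multi-GPU sharding used in the failing run):

```python
# Minimal sanity check: load the model outside OpenCompass and generate a
# few tokens. device_map="auto" shards the weights across all visible GPUs.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "WizardLMTeam/WizardLM-13B-V1.2"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, device_map="auto", torch_dtype="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))
```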

daidaiershidi commented 4 weeks ago

> Please ensure that this model can be loaded with transformers successfully.

Loading with transformers works fine. On OpenCompass, it also loads fine when I switch to using only 6 GPUs.

tonysy commented 3 weeks ago

`--hf-num-gpus 8` sets the number of GPUs used for tensor parallelism of a single model instance. For a 13B model, `--hf-num-gpus 2` should suffice. If you want data parallelism instead, use the `--max-num-workers` parameter.
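
For example, this suggestion corresponds to something like the following variant of the command above (illustrative only; all flags other than the two parallelism settings kept as in the report):

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python run.py --datasets mmlu_gen \
  --hf-type chat \
  --hf-path WizardLMTeam/WizardLM-13B-V1.2 \
  -a vllm \
  --max-out-len 2 \
  --hf-num-gpus 2 \
  --batch-size 1 \
  --max-workers-per-gpu 1 \
  --max-num-workers 4 \
  -w output/debug-mmlu_gen \
  --generation-kwargs do_sample=False temperature=0.0

With 8 visible GPUs and 2 GPUs per model instance, up to 4 workers can then run in data parallel.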

tonysy commented 3 weeks ago

The error

  File "/home/asfhalshd/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1190, in <listcomp>
    logits = [F.linear(hidden_states, lm_head_slices[i]) for i in range(self.config.pretraining_tp)]
  RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:7! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

appears to be unrelated to OpenCompass; it pertains to Transformers.
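
For context, the failing line slices `lm_head` into `config.pretraining_tp` chunks and multiplies each chunk separately; when the model is sharded across GPUs with `device_map="auto"`, `hidden_states` and the individual slices can end up on different devices. A hedged workaround sketch (not an official fix, and untested here) is to force `pretraining_tp=1` so that branch is skipped:

```python
# Workaround sketch (assumption: overriding pretraining_tp is acceptable for
# evaluation): a value > 1 in the checkpoint's config triggers the per-slice
# F.linear branch seen in the traceback; 1 uses a single plain lm_head matmul.
from transformers import AutoConfig, AutoModelForCausalLM

path = "WizardLMTeam/WizardLM-13B-V1.2"  # model path from the report
config = AutoConfig.from_pretrained(path)
config.pretraining_tp = 1
model = AutoModelForCausalLM.from_pretrained(path, config=config, device_map="auto")
```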

tonysy commented 3 weeks ago

Feel free to re-open if needed.