mindspore-lab / mindnlp

Easy-to-use and high-performance NLP and LLM framework based on MindSpore, compatible with models and datasets of 🤗Huggingface.
https://mindnlp.cqu.ai/
Apache License 2.0

Qwen2.5-coder-14B single-node multi-card inference error #1847

Open qhzhuang opened 4 days ago

qhzhuang commented 4 days ago

**Describe the bug (Mandatory)**
Qwen2.5-coder-14B: single-node, multi-card inference fails with an error. Reproduction script:

```python
# Imports reconstructed so the snippet is self-contained (the original
# report omits them). `init()` is assumed to be mindspore.communication.init,
# which sets up the HCCL world group for multi-card execution.
import mindspore as ms
from mindspore.communication import init
from mindnlp.transformers import AutoModelForCausalLM, AutoTokenizer

init()

REPO_ID = "/home/mseco/qhzhuang/qwen32B"

model = AutoModelForCausalLM.from_pretrained(
    REPO_ID, ms_dtype=ms.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
print("--------- Init Model --------")

conversation = []
message = "who are you"
conversation.append({"role": "user", "content": message})
input_ids = tokenizer.apply_chat_template(
    conversation, add_generation_prompt=True, return_tensors="ms"
)

print(input_ids)
print(f"model type: {type(model)}")
sample_output = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.95,
    top_k=50,
    temperature=0.7,
    repetition_penalty=1.0,
    num_beams=1,
)
response = sample_output[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
print("---- Init Model Finished ----")
```


Launch command: `mpirun --bind-to numa -n 2 python qwen_infer_distributed.py`
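For context, the failure below is raised inside `from_pretrained` itself, before any collective communication happens, so one way to narrow it down is to try the same load on a single card without `device_map="auto"`. A minimal sketch of that check (an illustration added here, not part of the original report; the filename is hypothetical):

```python
# qwen_infer_single.py (hypothetical name): load the same checkpoint on a
# single card without device_map="auto" to see whether the ValueError
# still occurs. No init()/mpirun is needed for this variant.
import mindspore as ms
from mindnlp.transformers import AutoModelForCausalLM

REPO_ID = "/home/mseco/qhzhuang/qwen32B"

model = AutoModelForCausalLM.from_pretrained(REPO_ID, ms_dtype=ms.float16)
print("single-card load OK:", type(model))
```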

**Expected behavior (Mandatory)**
Inference completes and the generated text is printed, with no errors.
**Logs / Screenshots (Mandatory)**
```
[WARNING] DISTRIBUTED(1027085,ffff8ae49020,python):2024-11-25-15:26:22.431.358 [mindspore/ccsrc/distributed/collective/collective_manager.cc:384] CreateCommunicationGroup] End initialize communication group on the device side: hccl_world_group
Qwen2ForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`.`PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
Sliding Window Attention is enabled but not implemented for `eager`; unexpected results may be encountered.
Traceback (most recent call last):
  File "/home/mseco/qhzhuang/qwen_infer_distributed.py", line 12, in <module>
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/transformers/models/auto/auto_factory.py", line 510, in from_pretrained
    return model_class.from_pretrained(
  File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/transformers/modeling_utils.py", line 3126, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py", line 666, in __init__
    self.model = Qwen2Model(config)
  File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py", line 472, in __init__
    [Qwen2DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py", line 472, in <listcomp>
    [Qwen2DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py", line 368, in __init__
    self.self_attn = QWEN2_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
  File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py", line 268, in __init__
    self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
  File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/core/nn/modules/linear.py", line 48, in __init__
    self.reset_parameters()
  File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/core/nn/modules/linear.py", line 56, in reset_parameters
    fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
  File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/core/nn/init.py", line 335, in _calculate_fan_in_and_fan_out
    raise ValueError(
ValueError: Fan in and fan out can not be computed for tensor with fewer than 2 dimensions
--------------------------------------------------------------------------
```
**Additional context (Optional)**
The same error occurs in a Jupyter notebook on a Huawei Cloud 910B environment:
```
Qwen2ForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`.`PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
Sliding Window Attention is enabled but not implemented for `eager`; unexpected results may be encountered.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 2
      1 REPO_ID = "./Qwen2.5-Coder-14B"
----> 2 model = AutoModelForCausalLM.from_pretrained(REPO_ID, ms_dtype=ms.float16, device_map="auto")
      3 tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

File ~/.local/lib/python3.10/site-packages/mindnlp/transformers/models/auto/auto_factory.py:510, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    508 if type(config) in cls._model_mapping.keys():
    509     model_class = _get_model_class(config, cls._model_mapping)
--> 510     return model_class.from_pretrained(
    511         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    512     )
    513 raise ValueError(
    514     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    515     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    516 )

File ~/.local/lib/python3.10/site-packages/mindnlp/transformers/modeling_utils.py:3126, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3123 model_kwargs.pop('mirror', None)
   3124 with ContextManagers(init_contexts):
   3125     # Let's make sure we don't run the init function of buffer modules
-> 3126     model = cls(config, *model_args, **model_kwargs)
   3127 # make sure we use the model's config since the __init__ call might have copied it
   3128 config = model.config

File ~/.local/lib/python3.10/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py:666, in Qwen2ForCausalLM.__init__(self, config)
    664 def __init__(self, config):
    665     super().__init__(config)
--> 666     self.model = Qwen2Model(config)
    667     self.vocab_size = config.vocab_size
    668     self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

File ~/.local/lib/python3.10/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py:472, in Qwen2Model.__init__(self, config)
    468 self.vocab_size = config.vocab_size
    470 self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
    471 self.layers = nn.ModuleList(
--> 472     [Qwen2DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
    473 )
    474 self._attn_implementation = config._attn_implementation
    475 self.norm = Qwen2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)

File ~/.local/lib/python3.10/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py:472, in <listcomp>(.0)
    468 self.vocab_size = config.vocab_size
    470 self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
    471 self.layers = nn.ModuleList(
--> 472     [Qwen2DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
    473 )
    474 self._attn_implementation = config._attn_implementation
    475 self.norm = Qwen2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)

File ~/.local/lib/python3.10/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py:368, in Qwen2DecoderLayer.__init__(self, config, layer_idx)
    363 if config.sliding_window and config._attn_implementation != "flash_attention_2":
    364     logger.warning_once(
    365         f"Sliding Window Attention is enabled but not implemented for `{config._attn_implementation}`; "
    366         "unexpected results may be encountered."
    367     )
--> 368 self.self_attn = QWEN2_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
    370 self.mlp = Qwen2MLP(config)
    371 self.input_layernorm = Qwen2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)

File ~/.local/lib/python3.10/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py:268, in Qwen2Attention.__init__(self, config, layer_idx)
    263 if (self.head_dim * self.num_heads) != self.hidden_size:
    264     raise ValueError(
    265         f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
    266         f" and `num_heads`: {self.num_heads})."
    267     )
--> 268 self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
    269 self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
    270 self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)

File ~/.local/lib/python3.10/site-packages/mindnlp/core/nn/modules/linear.py:48, in Linear.__init__(self, in_features, out_features, bias, dtype)
     45 else:
     46     self.register_parameter('bias', None)
---> 48 self.reset_parameters()

File ~/.local/lib/python3.10/site-packages/mindnlp/core/nn/modules/linear.py:56, in Linear.reset_parameters(self)
     54 init.kaiming_uniform_(self.weight, a=math.sqrt(5))
     55 if self.bias is not None:
---> 56     fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
     57     bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
     58     init.uniform_(self.bias, -bound, bound)

File ~/.local/lib/python3.10/site-packages/mindnlp/core/nn/init.py:335, in _calculate_fan_in_and_fan_out(tensor)
    333 dimensions = tensor.ndim
    334 if dimensions < 2:
--> 335     raise ValueError(
    336         "Fan in and fan out can not be computed for tensor with fewer than 2 dimensions"
    337     )
    339 num_input_fmaps = tensor.shape[1]
    340 num_output_fmaps = tensor.shape[0]

ValueError: Fan in and fan out can not be computed for tensor with fewer than 2 dimensions
```
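Both tracebacks end at the same check: `mindnlp.core.nn.init._calculate_fan_in_and_fan_out` rejects any tensor with fewer than two dimensions, and under `device_map="auto"` a `Linear` weight apparently reaches `reset_parameters()` without its expected 2-D shape. A minimal sketch of that failure mode, with the logic copied from the traceback source above (the 0-d tensor is an illustrative stand-in for whatever placeholder weight triggers the error, which is an assumption):

```python
# Stand-alone sketch of the failing check, mirroring the
# _calculate_fan_in_and_fan_out logic shown in the traceback.
import mindspore as ms

def calculate_fan_in_and_fan_out(tensor):
    if tensor.ndim < 2:
        raise ValueError(
            "Fan in and fan out can not be computed for tensor with fewer than 2 dimensions"
        )
    # For a 2-D Linear weight of shape (out_features, in_features):
    return tensor.shape[1], tensor.shape[0]  # (fan_in, fan_out)

weight_2d = ms.ops.ones((128, 64))
print(calculate_fan_in_and_fan_out(weight_2d))  # (64, 128): normal case

weight_0d = ms.Tensor(0.0)  # 0-dimensional placeholder-like tensor
calculate_fan_in_and_fan_out(weight_0d)  # raises the ValueError above
```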
qhzhuang commented 4 days ago

To add the mindnlp version: it was installed from source, and the latest commit at install time was 5ce1a8367e25594c209a8dde3d00e10facd0d452 (Author: nate.river <lvyufeng@cqu.edu.cn>, Date: Fri Nov 22 17:27:02 2024 +0800).