**Describe the bug / 问题描述 (Mandatory / 必填)**
A clear and concise description of what the bug is.
Qwen2.5-Coder-14B: single-node, multi-card inference fails with an error.
**Hardware Environment (Ascend/GPU/CPU) / 硬件环境:**
Please delete the backend not involved / 请删除不涉及的后端:
The error occurs on both Ascend 910A and 910B.
**Software Environment / 软件环境 (Mandatory / 必填):**
-- MindSpore version (e.g., 1.7.0.Bxxx) :
-- Python version (e.g., Python 3.7.5) :
-- OS platform and distribution (e.g., Linux Ubuntu 16.04):
-- GCC/Compiler version (if compiled from source):
MindSpore 2.4, Python 3.9, Ubuntu 20.04
Launch command: mpirun --bind-to numa -n 2 python qwen_infer_distributed.py
**Expected behavior / 预期结果 (Mandatory / 必填)**
A clear and concise description of what you expected to happen.
Inference runs without errors and prints the generated result.
**Screenshots/ 日志 / 截图 (Mandatory / 必填)**
If applicable, add screenshots to help explain your problem.
[WARNING] DISTRIBUTED(1027085,ffff8ae49020,python):2024-11-25-15:26:22.431.358 [mindspore/ccsrc/distributed/collective/collective_manager.cc:384] CreateCommunicationGroup] End initialize communication group on the device side: hccl_world_group
Qwen2ForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`.`PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
- If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.
Sliding Window Attention is enabled but not implemented for `eager`; unexpected results may be encountered.
Traceback (most recent call last):
File "/home/mseco/qhzhuang/qwen_infer_distributed.py", line 12, in <module>
model = AutoModelForCausalLM.from_pretrained(
File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/transformers/models/auto/auto_factory.py", line 510, in from_pretrained
return model_class.from_pretrained(
File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/transformers/modeling_utils.py", line 3126, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py", line 666, in __init__
self.model = Qwen2Model(config)
File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py", line 472, in __init__
[Qwen2DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py", line 472, in <listcomp>
[Qwen2DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py", line 368, in __init__
self.self_attn = QWEN2_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py", line 268, in __init__
self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/core/nn/modules/linear.py", line 48, in __init__
self.reset_parameters()
File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/core/nn/modules/linear.py", line 56, in reset_parameters
fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
File "/home/mseco/miniconda3/envs/ms24/lib/python3.9/site-packages/mindnlp/core/nn/init.py", line 335, in _calculate_fan_in_and_fan_out
raise ValueError(
ValueError: Fan in and fan out can not be computed for tensor with fewer than 2 dimensions
--------------------------------------------------------------------------
**Additional context / 备注 (Optional / 选填)**
Add any other context about the problem here.
Error raised in a Jupyter notebook on a Huawei Cloud 910B environment:
Qwen2ForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`.`PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
- If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.
Sliding Window Attention is enabled but not implemented for `eager`; unexpected results may be encountered.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[2], line 2
1 REPO_ID = "./Qwen2.5-Coder-14B"
----> 2 model = AutoModelForCausalLM.from_pretrained(REPO_ID, ms_dtype=ms.float16, device_map="auto")
3 tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
File ~/.local/lib/python3.10/site-packages/mindnlp/transformers/models/auto/auto_factory.py:510, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
508 if type(config) in cls._model_mapping.keys():
509 model_class = _get_model_class(config, cls._model_mapping)
--> 510 return model_class.from_pretrained(
511 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
512 )
513 raise ValueError(
514 f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
515 f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
516 )
File ~/.local/lib/python3.10/site-packages/mindnlp/transformers/modeling_utils.py:3126, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3123 model_kwargs.pop('mirror', None)
3124 with ContextManagers(init_contexts):
3125 # Let's make sure we don't run the init function of buffer modules
-> 3126 model = cls(config, *model_args, **model_kwargs)
3127 # make sure we use the model's config since the __init__ call might have copied it
3128 config = model.config
File ~/.local/lib/python3.10/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py:666, in Qwen2ForCausalLM.__init__(self, config)
664 def __init__(self, config):
665 super().__init__(config)
--> 666 self.model = Qwen2Model(config)
667 self.vocab_size = config.vocab_size
668 self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
File ~/.local/lib/python3.10/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py:472, in Qwen2Model.__init__(self, config)
468 self.vocab_size = config.vocab_size
470 self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
471 self.layers = nn.ModuleList(
--> 472 [Qwen2DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
473 )
474 self._attn_implementation = config._attn_implementation
475 self.norm = Qwen2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
File ~/.local/lib/python3.10/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py:472, in <listcomp>(.0)
468 self.vocab_size = config.vocab_size
470 self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
471 self.layers = nn.ModuleList(
--> 472 [Qwen2DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
473 )
474 self._attn_implementation = config._attn_implementation
475 self.norm = Qwen2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
File ~/.local/lib/python3.10/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py:368, in Qwen2DecoderLayer.__init__(self, config, layer_idx)
363 if config.sliding_window and config._attn_implementation != "flash_attention_2":
364 logger.warning_once(
365 f"Sliding Window Attention is enabled but not implemented for `{config._attn_implementation}`; "
366 "unexpected results may be encountered."
367 )
--> 368 self.self_attn = QWEN2_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
370 self.mlp = Qwen2MLP(config)
371 self.input_layernorm = Qwen2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
File ~/.local/lib/python3.10/site-packages/mindnlp/transformers/models/qwen2/modeling_qwen2.py:268, in Qwen2Attention.__init__(self, config, layer_idx)
263 if (self.head_dim * self.num_heads) != self.hidden_size:
264 raise ValueError(
265 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
266 f" and `num_heads`: {self.num_heads})."
267 )
--> 268 self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
269 self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
270 self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
File ~/.local/lib/python3.10/site-packages/mindnlp/core/nn/modules/linear.py:48, in Linear.__init__(self, in_features, out_features, bias, dtype)
45 else:
46 self.register_parameter('bias', None)
---> 48 self.reset_parameters()
File ~/.local/lib/python3.10/site-packages/mindnlp/core/nn/modules/linear.py:56, in Linear.reset_parameters(self)
54 init.kaiming_uniform_(self.weight, a=math.sqrt(5))
55 if self.bias is not None:
---> 56 fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
57 bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
58 init.uniform_(self.bias, -bound, bound)
File ~/.local/lib/python3.10/site-packages/mindnlp/core/nn/init.py:335, in _calculate_fan_in_and_fan_out(tensor)
333 dimensions = tensor.ndim
334 if dimensions < 2:
--> 335 raise ValueError(
336 "Fan in and fan out can not be computed for tensor with fewer than 2 dimensions"
337 )
339 num_input_fmaps = tensor.shape[1]
340 num_output_fmaps = tensor.shape[0]
ValueError: Fan in and fan out can not be computed for tensor with fewer than 2 dimensions
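
The check that raises here can be reproduced without MindNLP. The sketch below is a hypothetical stand-in, not the library's code: it takes a plain shape tuple instead of a tensor, and the receptive-field scaling for extra dimensions is an assumption borrowed from the PyTorch function of the same name, since the traceback only shows the first lines of the function. It illustrates that the error is triggered whenever the weight handed to `reset_parameters` has fewer than two dimensions. The report does not establish why the `nn.Linear` weight ends up in that state.

```python
import math


def calculate_fan_in_and_fan_out(shape):
    """Hypothetical mirror of the check in the traceback, using a shape tuple."""
    dimensions = len(shape)
    if dimensions < 2:
        raise ValueError(
            "Fan in and fan out can not be computed for tensor with fewer than 2 dimensions"
        )
    num_input_fmaps = shape[1]
    num_output_fmaps = shape[0]
    # Assumption: trailing dimensions scale both values, as in PyTorch.
    receptive_field_size = math.prod(shape[2:])
    return num_input_fmaps * receptive_field_size, num_output_fmaps * receptive_field_size


def bias_bound(weight_shape):
    """Bias bound as computed in the reset_parameters lines quoted in the traceback."""
    fan_in, _ = calculate_fan_in_and_fan_out(weight_shape)
    return 1 / math.sqrt(fan_in) if fan_in > 0 else 0


# A regular Linear weight is 2-D: (out_features, in_features).
print(calculate_fan_in_and_fan_out((8, 4)))  # prints (4, 8)

# A 0-D or 1-D weight reproduces the message from the report.
for bad_shape in [(), (8,)]:
    try:
        calculate_fan_in_and_fan_out(bad_shape)
    except ValueError as exc:
        print(bad_shape, "->", exc)
```

If this reading is right, printing the shape of `self.weight` just before `reset_parameters` is called would confirm whether the weight is an empty or placeholder tensor at construction time.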
Reproduction script (the import lines are not shown in the report; the three below are assumed):

    import mindspore as ms
    from mindspore.communication import init
    from mindnlp.transformers import AutoModelForCausalLM, AutoTokenizer

    # Initialize the communication group for multi-card inference.
    init()

    REPO_ID = "/home/mseco/qhzhuang/qwen32B"
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, ms_dtype=ms.float16, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    print("--------- Init Model --------")

    conversation = []
    message = "who are you"
    conversation.append({"role": "user", "content": message})
    input_ids = tokenizer.apply_chat_template(conversation, add_generation_prompt=True, return_tensors="ms")
    print(input_ids)
    print(f"model type: {type(model)}")

    sample_output = model.generate(
        input_ids, max_new_tokens=100, do_sample=True, top_p=0.95, top_k=50,
        temperature=0.7, repetition_penalty=1.0, num_beams=1,
    )
    # Keep only the newly generated tokens by dropping the prompt prefix.
    response = sample_output[0][input_ids.shape[-1]:]
    print(tokenizer.decode(response, skip_special_tokens=True))
    print("---- Init Model Finished ----")
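
The slice `sample_output[0][input_ids.shape[-1]:]` in the script drops the prompt before decoding. A minimal sketch of that step with plain Python lists follows. The token ids and the `strip_prompt` helper are hypothetical, and it assumes, as the slice implies, that `generate` returns the prompt tokens followed by the new ones.

```python
def strip_prompt(generated_ids, prompt_len):
    """Return only the tokens that follow the prompt (hypothetical helper)."""
    return generated_ids[prompt_len:]


# Hypothetical token ids, batch size 1.
prompt_ids = [[101, 102, 103]]           # stands in for input_ids
sample_output = [[101, 102, 103, 7, 8]]  # prompt followed by two new tokens

prompt_len = len(prompt_ids[0])          # plays the role of input_ids.shape[-1]
response = strip_prompt(sample_output[0], prompt_len)
print(response)  # prints [7, 8]
```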