luchangli03 / export_llama_to_onnx

export llama to onnx
MIT License

Qwen conversion error: RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 28 but got size 4 for tensor number 1 in the list. #17

Open yanxiao1930 opened 4 months ago

yanxiao1930 commented 4 months ago

```
python export_qwen2_1.5.py -m /media/yanxiao/机械硬盘1/LLM/Qwen2-7B-Instruct -o ./

WARNING:root:*** Note: please apply modications to model before conversion:
modication 1: in Qwen2ForCausalLM.forward
    hidden_states = outputs[0]
    hidden_states = hidden_states[:, -1:, :]  # <<--
    logits = self.lm_head(hidden_states)
modication 2: in Qwen2Model.forward
    '''
    if attention_mask is not None and self._attn_implementation == "flash_attention_2" and use_cache:
        is_padding_right = attention_mask[:, -1].sum().item() != batch_size
        if is_padding_right:
            raise ValueError(
                "You are attempting to perform batched generation with padding_side='right'"
                " this may lead to unexpected behaviour for Flash Attention version of Qwen2. Make sure to "
                " call tokenizer.padding_side = 'left' before tokenizing the input. "
            )

    if self._attn_implementation == "flash_attention_2":
        # 2d mask is passed through the layers
        attention_mask = attention_mask if (attention_mask is not None and 0 in attention_mask) else None
    elif self._attn_implementation == "sdpa" and not output_attentions:
        # output_attentions=True can not be supported when using SDPA, and we fall back on
        # the manual implementation that requires a 4D causal mask in all cases.
        attention_mask = _prepare_4d_causal_attention_mask_for_sdpa(
            attention_mask,
            (batch_size, seq_length),
            inputs_embeds,
            past_key_values_length,
        )
    else:
        # 4d mask is passed through the layers
        attention_mask = _prepare_4d_causal_attention_mask(
            attention_mask,
            (batch_size, seq_length),
            inputs_embeds,
            past_key_values_length,
            sliding_window=self.config.sliding_window,
        )
    '''
modication 3: in Qwen2RotaryEmbedding.forward
    return (self.cos_cached.to(dtype=x.dtype), self.sin_cached.to(dtype=x.dtype))
    # return (
    #     self.cos_cached[:seq_len].to(dtype=x.dtype),
    #     self.sin_cached[:seq_len].to(dtype=x.dtype),
    # )
```

```
begin load model from /media/yanxiao/机械硬盘1/LLM/Qwen2-7B-Instruct
Loading checkpoint shards: 100%|██████████████████| 4/4 [00:11<00:00, 2.96s/it]
finish load model from /media/yanxiao/机械硬盘1/LLM/Qwen2-7B-Instruct
config: Qwen2Config {
  "_name_or_path": "/media/yanxiao/\u673a\u68b0\u786c\u76d81/LLM/Qwen2-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.42.3",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
```

```
begin export qwen
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
/home/yanxiao/anaconda3/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py:1109: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.max() != 0:
/home/yanxiao/anaconda3/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py:128: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
Traceback (most recent call last):
  File "/media/yanxiao/win_software/Python_code/project/export_llama_to_onnx/export_qwen2_1.5.py", line 194, in <module>
    export_qwen(args)
  File "/media/yanxiao/win_software/Python_code/project/export_llama_to_onnx/export_qwen2_1.5.py", line 127, in export_qwen
    export_qwen_to_single_onnx(model, config, dtype, args, "qwen_onnx")
  File "/media/yanxiao/win_software/Python_code/project/export_llama_to_onnx/export_qwen2_1.5.py", line 95, in export_qwen_to_single_onnx
    torch.onnx.export(
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/onnx/utils.py", line 1612, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/onnx/utils.py", line 1134, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/onnx/utils.py", line 1010, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/onnx/utils.py", line 914, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/jit/_trace.py", line 1310, in _get_trace_graph
    outs = ONNXTracedModule(
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/jit/_trace.py", line 138, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/jit/_trace.py", line 129, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1522, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/media/yanxiao/win_software/Python_code/project/export_llama_to_onnx/export_qwen2_1.5.py", line 23, in forward
    outputs = self.model(
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1522, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1221, in forward
    outputs = self.model(
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1522, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1023, in forward
    layer_outputs = decoder_layer(
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1522, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 763, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1522, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 667, in forward
    key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
  File "/home/yanxiao/anaconda3/lib/python3.11/site-packages/transformers/cache_utils.py", line 363, in update
    self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 28 but got size 4 for tensor number 1 in the list.
```
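The 28 and 4 in the error line up with `num_attention_heads` and `num_key_value_heads` in the config above, which suggests the dummy `past_key_values` fed into the trace were built with 28 heads while the model's freshly computed `key_states` have 4: Qwen2-7B uses grouped-query attention, so the KV cache carries only `num_key_value_heads` heads. Below is a minimal sketch of dummy `past_key_values` with the shape the cache update expects, assuming the exporter passes them as per-layer (key, value) tuples; all names are illustrative, not taken from export_qwen2_1.5.py:

```python
import torch

# Illustrative sketch only, not code from export_qwen2_1.5.py.
# With grouped-query attention the cache has num_key_value_heads (4)
# heads, not num_attention_heads (28).
num_layers = 28        # config.num_hidden_layers
num_kv_heads = 4       # config.num_key_value_heads  <-- must be 4, not 28
head_dim = 3584 // 28  # config.hidden_size // config.num_attention_heads = 128
past_seq_len = 8       # arbitrary dummy cache length for tracing
dtype = torch.float16

# One (key, value) pair per decoder layer, each shaped
# [batch, num_kv_heads, past_seq_len, head_dim].
past_key_values = tuple(
    (
        torch.zeros(1, num_kv_heads, past_seq_len, head_dim, dtype=dtype),
        torch.zeros(1, num_kv_heads, past_seq_len, head_dim, dtype=dtype),
    )
    for _ in range(num_layers)
)
```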

YancyMA commented 3 months ago

I'm hitting the same problem. Have you solved it yet?

hammoudhasan commented 3 months ago

This might be an issue in the HuggingFace transformers library itself - I'm hitting the same error in a different setting.

yanxiao1930 commented 3 months ago

> I'm hitting the same problem. Have you solved it yet?

Not solved yet.

luchangli03 commented 3 months ago

First, you need to apply the modifications described in the code's notes (the warning printed above) to the Qwen model definition inside transformers.
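For readers unsure what those notes ask for, modification 1 amounts to the following change inside `Qwen2ForCausalLM.forward`. This fragment is reconstructed from the warning text, not copied from any particular transformers release, so the exact surrounding code may differ in your version:

```python
# Inside Qwen2ForCausalLM.forward, just before the lm_head call:
hidden_states = outputs[0]
hidden_states = hidden_states[:, -1:, :]  # <<-- added line: keep only the last
                                          # position so the exported graph emits
                                          # logits for a single token per step
logits = self.lm_head(hidden_states)
```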

ivanstepanovftw commented 1 month ago

Some of these modifications are outdated. For example, I cannot find the code referenced by "modication 2: in Qwen2Model.forward" anymore, and "modication 3: in Qwen2RotaryEmbedding.forward" is already applied in the current upstream code.
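A quick way to check which variant your installed transformers actually ships (a sketch assuming a standard pip/conda install; `inspect.getsource` is the stock library call):

```python
import inspect
import transformers.models.qwen2.modeling_qwen2 as qwen2_modeling

# Print where the installed modeling file lives and what
# Qwen2RotaryEmbedding.forward currently looks like, to compare
# against the snippets in the export script's warning.
print(qwen2_modeling.__file__)
print(inspect.getsource(qwen2_modeling.Qwen2RotaryEmbedding.forward))
```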