huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Using this command (optimum-cli export onnx --model Qwen1.5-0.5B-Chat --task text-generation Qwen1.5-0.5B-Chat_onnx/) to perform the ONNX conversion, I found that the tensor type of the model becomes int64. How can this be solved? #30827

Closed JameslaoA closed 1 week ago

JameslaoA commented 1 month ago

System Info

transformers version: 4.38.1
platform: ubuntu 22.04
python version: 3.10.14
optimum version: 1.19.2

Who can help?

@ArthurZucker and @younesbelkada

Information

Tasks

Reproduction

1. Reference conversion command documentation: https://huggingface.co/docs/transformers/v4.40.1/zh/serialization
2. Download the model files offline (https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat/tree/main)
3. Execute the conversion command: optimum-cli export onnx --model Qwen1.5-0.5B-Chat --task text-generation Qwen1.5-0.5B-Chat_onnx/
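For reference, the same export can also be driven from Python through Optimum's exporter API instead of the CLI. A minimal sketch, assuming the model directory from step 2 is in the current working directory:

```python
# Minimal sketch of the same export via Optimum's Python API
# (equivalent to the optimum-cli invocation in step 3 above).
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="Qwen1.5-0.5B-Chat",   # local directory downloaded in step 2
    output="Qwen1.5-0.5B-Chat_onnx/",         # where model.onnx will be written
    task="text-generation",                   # same task as the CLI flag
)
```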

The conversion results are as follows:

(mypy3.10_qnn) zhengjr@ubuntu-ThinkStation-P3-Tower:~$ optimum-cli export onnx --model Qwen1.5-0.5B-Chat --task text-generation Qwen1.5-0.5B-Chat_onnx/
2024-05-15 19:42:07.726433: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA. To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-15 19:42:07.916257: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-15 19:42:07.997974: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-15 19:42:08.545959: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-05-15 19:42:08.546100: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-05-15 19:42:08.546104: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Framework not specified. Using pt to export the model.
The task text-generation was manually specified, and past key values will not be reused in the decoding. If needed, please pass --task text-generation-with-past to export using the past key values.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using the export variant default. Available variants are:

Exporting submodel 1/1: Qwen2ForCausalLM
Using framework PyTorch: 1.13.1
Overriding 1 configuration item(s)

Validating ONNX model Qwen1.5-0.5B-Chat_onnx/model.onnx...
-[✓] ONNX model output names match reference model (logits)

Expected behavior

I expect the input and output tensors of the converted ONNX model to be fp16 again.

younesbelkada commented 1 month ago

cc @fxmarty @michaelbenayoun for optimum

JameslaoA commented 1 month ago

@younesbelkada thanks for your help. @fxmarty @michaelbenayoun please help confirm the issue, thanks.

JameslaoA commented 1 month ago

@fxmarty @michaelbenayoun can you help confirm the issue?

Thanks.

JameslaoA commented 1 month ago

@younesbelkada can you help contact @fxmarty @michaelbenayoun to look into this issue?

Thanks.

michaelbenayoun commented 1 month ago

Hi @JameslaoA, what's the issue exactly? Once the conversion is done, the model seems to run fine and its logits match the original model. The inputs should be int64, no? And are you sure the outputs are int64? The logits appear to be computed correctly.
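(For context, int64 inputs are expected: Transformers tokenizers emit int64 token IDs by default, and the ONNX export keeps that dtype for the graph inputs. A minimal sketch of a quick check, assuming the Qwen/Qwen1.5-0.5B-Chat tokenizer can be downloaded from the Hub:)

```python
# Quick check that tokenizer outputs are int64 by default,
# which is the dtype the exported ONNX graph declares for its inputs.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat")
encoded = tokenizer("Hello, world!", return_tensors="np")
print(encoded["input_ids"].dtype)       # int64
print(encoded["attention_mask"].dtype)  # int64
```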

JameslaoA commented 1 month ago

Hi @michaelbenayoun, thank you for your response.

After the optimum-cli conversion, opening the resulting model.onnx with the netron.app tool shows that the inputs are int64 and the outputs are fp32; please see the screenshot below. [screenshot]
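The same dtypes can also be confirmed without netron. A minimal sketch that prints the declared input/output element types with the onnx package, assuming the export directory from the command above:

```python
# Print the declared element type of every graph input and output,
# mirroring what netron shows (int64 inputs, float32 logits).
import onnx
from onnx import TensorProto

model = onnx.load("Qwen1.5-0.5B-Chat_onnx/model.onnx")

def dtype_name(value_info):
    return TensorProto.DataType.Name(value_info.type.tensor_type.elem_type)

for inp in model.graph.input:
    print("input ", inp.name, dtype_name(inp))
for out in model.graph.output:
    print("output", out.name, dtype_name(out))
```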

I used the following script to convert the ONNX model to a QNN lib, but the compilation failed because int64 is not supported. [screenshots]

Could you please help me confirm whether the problem is that the converted input/output format is not supported, or whether there is another reason? I have also raised the question with Qualcomm and requested their help.

Thanks.

JameslaoA commented 1 month ago

Hello @michaelbenayoun, I expect the converted ONNX model inputs to be int8/int16/int32 rather than int64.

Can you help me resolve it?

Thanks.

michaelbenayoun commented 1 month ago

So it seems that everything is fine on the export side. But you need to have int32 inputs instead of int64. I think this script could help you.
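(The linked script is not reproduced here. As an illustration only, a minimal sketch of one common approach: re-declare each int64 graph input as int32 and cast it back to int64 inside the graph, so the rest of the network is untouched. The file names follow the export command above; whether int32 inputs are sufficient for the QNN toolchain is a question for Qualcomm's converter.)

```python
# Illustrative sketch (not the linked script): expose int32 inputs on an
# exported ONNX model by renaming each int64 graph input and inserting a
# Cast back to int64 for the existing nodes.
import onnx
from onnx import TensorProto, helper

model = onnx.load("Qwen1.5-0.5B-Chat_onnx/model.onnx")
graph = model.graph

cast_nodes = []
for inp in graph.input:
    if inp.type.tensor_type.elem_type == TensorProto.INT64:
        original_name = inp.name
        # The model now takes this input as int32 under a new name...
        inp.name = original_name + "_int32"
        inp.type.tensor_type.elem_type = TensorProto.INT32
        # ...and a Cast node restores the int64 tensor the existing graph expects.
        cast_nodes.append(
            helper.make_node(
                "Cast",
                inputs=[inp.name],
                outputs=[original_name],
                to=TensorProto.INT64,
                name=f"Cast_{original_name}_to_int64",
            )
        )

# Cast nodes must run before any consumer of the original inputs,
# so prepend them to the node list.
for node in reversed(cast_nodes):
    graph.node.insert(0, node)

# Weights are written to a side file to stay under protobuf's 2 GB limit.
onnx.save(model, "Qwen1.5-0.5B-Chat_onnx/model_int32_inputs.onnx",
          save_as_external_data=True)
```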

JameslaoA commented 1 month ago

Hi @michaelbenayoun thanks for your help,I'll try to verify that.

Thanks.

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.