Framework not specified. Using pt to export the model.
=====Exporting IR=====
Loading checkpoint shards: 0%| | 0/7 [00:00<?, ?it/s]
Loading checkpoint shards: 14%|█▍ | 1/7 [00:26<02:36, 26.10s/it]
Loading checkpoint shards: 29%|██▊ | 2/7 [00:48<02:00, 24.16s/it]
Loading checkpoint shards: 43%|████▎ | 3/7 [01:13<01:38, 24.57s/it]
Loading checkpoint shards: 57%|█████▋ | 4/7 [01:41<01:17, 25.85s/it]
Loading checkpoint shards: 71%|███████▏ | 5/7 [01:53<00:41, 20.73s/it]
Loading checkpoint shards: 86%|████████▌ | 6/7 [02:23<00:23, 23.81s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [02:28<00:00, 17.72s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [02:28<00:00, 21.20s/it]
Using framework PyTorch: 2.2.2+cpu
WARNING:root:Cannot apply model.to_bettertransformer because of the exception:
The model type chatglm is not yet supported to be used with BetterTransformer. Feel free to open an issue at https://github.com/huggingface/optimum/issues if you would like this model type to be supported. Currently supported models are: dict_keys(['albert', 'bark', 'bart', 'bert', 'bert-generation', 'blenderbot', 'bloom', 'camembert', 'blip-2', 'clip', 'codegen', 'data2vec-text', 'deit', 'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2', 'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm', 'm2m_100', 'marian', 'markuplm', 'mbart', 'opt', 'pegasus', 'rembert', 'prophetnet', 'roberta', 'roc_bert', 'roformer', 'splinter', 'tapas', 't5', 'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2', 'xlm-roberta', 'yolos']).. Usage model with stateful=True may be non-effective if model does not contain torch.functional.scaled_dot_product_attention
Overriding 1 configuration item(s)
use_cache -> True
/home/wanglaiqi/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py:821: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if (attention_mask is not None and not attention_mask.all()) or (past_key_values and seq_length != 1):
/home/wanglaiqi/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py:687: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_length:
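The two TracerWarnings above come from data-dependent Python `if` checks on tensors in modeling_chatglm.py; torch.jit.trace only records the branch taken for the example input. A minimal, self-contained sketch of the same behaviour (illustrative, not taken from the log):

```python
import torch

def f(x):
    # `x.sum() > 0` is a tensor; the `if` converts it to a Python bool, which
    # torch.jit.trace cannot record -- it emits the same TracerWarning and
    # freezes whichever branch the example input happened to take.
    if x.sum() > 0:
        return x + 1
    return x - 1

traced = torch.jit.trace(f, torch.ones(3))  # traces only the `x + 1` branch
print(traced(torch.full((3,), -5.0)))       # still x + 1: tensor([-4., -4., -4.])
```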
Export model to OpenVINO directly failed with:
Couldn't get TorchScript module by tracing. With exception:
The size of tensor a (18) must match the size of tensor b (32) at non-singleton dimension 2
Please check correctness of provided 'example_input'. You can also provide TorchScript module that you obtained yourself, please refer to PyTorch documentation: https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html..
Model will be exported to ONNX
[ WARNING ] Making stateful models is not supported when exporting to ONNX as an intermediate step. A stateless model will be exported instead. It may result in sub-optimal inference performance. Provide a model that can be converted to OpenVINO without fallback to ONNX conversion path.
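One way to sidestep the 'example_input' mismatch reported above is to convert the model with inputs built by its own tokenizer, so input_ids and attention_mask share the same sequence length. This is only a sketch under assumptions (the model id, prompt, and whether chatglm3-6b-32k converts cleanly at all are not confirmed by this log):

```python
import torch
import openvino as ov
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/chatglm3-6b-32k"  # assumed id; substitute the local checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.float32)
model.eval()

enc = tokenizer("An example prompt for tracing", return_tensors="pt")
# input_ids and attention_mask have identical sequence lengths by construction,
# so get_masks() can broadcast them without the 18-vs-32 size error.
example_input = {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]}

ov_model = ov.convert_model(model, example_input=example_input)
ov.save_model(ov_model, "chatglm3-6b-32k.xml")
```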
Using framework PyTorch: 2.2.2+cpu
Overriding 1 configuration item(s)
use_cache -> True
Traceback (most recent call last):
File "/data/wanglaiqi/miniconda3/envs/openvino/lib/python3.11/site-packages/openvino/frontend/pytorch/ts_decoder.py", line 41, in init
pt_module = self._get_scripted_model(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/wanglaiqi/miniconda3/envs/openvino/lib/python3.11/site-packages/openvino/frontend/pytorch/ts_decoder.py", line 134, in _get_scripted_model
scripted = torch.jit.trace(
^^^^^^^^^^^^^^^^
File "/data/wanglaiqi/miniconda3/envs/openvino/lib/python3.11/site-packages/torch/jit/_trace.py", line 806, in trace
return trace_module(
^^^^^^^^^^^^^
File "/data/wanglaiqi/miniconda3/envs/openvino/lib/python3.11/site-packages/torch/jit/_trace.py", line 1074, in trace_module
module._c._create_method_from_trace(
File "/data/wanglaiqi/miniconda3/envs/openvino/lib/python3.11/site-packages/nncf/torch/dynamic_graph/wrappers.py", line 146, in wrapped
return module_call(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/wanglaiqi/miniconda3/envs/openvino/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/wanglaiqi/miniconda3/envs/openvino/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/wanglaiqi/miniconda3/envs/openvino/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
result = self.forward(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/wanglaiqi/miniconda3/envs/openvino/lib/python3.11/site-packages/optimum/exporters/openvino/convert.py", line 366, in ts_patched_forward
outputs = patched_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/wanglaiqi/miniconda3/envs/openvino/lib/python3.11/site-packages/optimum/exporters/onnx/model_patcher.py", line 152, in patched_forward
outputs = self.orig_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanglaiqi/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 940, in forward
transformer_outputs = self.transformer(
^^^^^^^^^^^^^^^^^
File "/data/wanglaiqi/miniconda3/envs/openvino/lib/python3.11/site-packages/nncf/torch/dynamic_graph/wrappers.py", line 146, in wrapped
return module_call(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/wanglaiqi/miniconda3/envs/openvino/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/wanglaiqi/miniconda3/envs/openvino/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/wanglaiqi/miniconda3/envs/openvino/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
result = self.forward(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanglaiqi/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 822, in forward
full_attention_mask = self.get_masks(input_ids, past_key_values, padding_mask=attention_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanglaiqi/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 691, in get_masks
full_attention_mask = full_attention_mask * padding_mask.unsqueeze(1)
RuntimeError: The size of tensor a (18) must match the size of tensor b (32) at non-singleton dimension 2
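The failing line reduces to a tensor-shape problem: the causal mask built from input_ids (sequence length 18) cannot be broadcast against the padding mask (length 32). A standalone reproduction of just the failing multiply, with shapes taken from the error message and the logic paraphrased from get_masks (not the original code verbatim):

```python
import torch

batch, seq_len, mask_len = 1, 18, 32  # 18 traced tokens vs. a 32-token attention_mask

full_attention_mask = torch.ones(batch, seq_len, seq_len).tril_()  # (1, 18, 18) causal mask
padding_mask = torch.ones(batch, mask_len)                         # (1, 32) attention_mask

# Same broadcast as modeling_chatglm.py line 691:
# RuntimeError: The size of tensor a (18) must match the size of tensor b (32)
# at non-singleton dimension 2
full_attention_mask = full_attention_mask * padding_mask.unsqueeze(1)
```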
During handling of the above exception, another exception occurred:
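For reference, the high-level export path being exercised here is the optimum-intel one; a hedged sketch of it follows (the model id and output directory are assumptions, and whether this chatglm checkpoint converts cleanly on this optimum/OpenVINO combination is exactly what the log above is probing):

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "THUDM/chatglm3-6b-32k"  # assumed id; substitute the local checkpoint path

# export=True converts the PyTorch checkpoint to OpenVINO IR on load;
# trust_remote_code=True is needed because chatglm ships custom modeling code.
ov_model = OVModelForCausalLM.from_pretrained(model_id, export=True, trust_remote_code=True)

ov_model.save_pretrained("chatglm3-6b-32k-ov")
AutoTokenizer.from_pretrained(model_id, trust_remote_code=True).save_pretrained("chatglm3-6b-32k-ov")
```

The `optimum-cli export openvino` command drives the same conversion from the shell, so the same failure would be expected there until the mask mismatch is resolved.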