Closed: xiaobai52HZ closed this issue 11 months ago
Please clarify which model you are using and how it was exported.
It looks like your error is caused by an If node. The ONNX models exported by this project all have the unsqueeze optimized away, so there should be no If nodes left.
It's a llama2 model, converted directly with the trtexec tool. How could there be no If node? I can still see If nodes in my output.
I also tried using transformers' optimum to do the HF-to-ONNX conversion; converting that ONNX to TRT gives the same error.
The llama2 source code hasn't been modified yet; I'll optimize it.
Wow, thank you 😊👍👍👍👍 amazing 😳😳😳 looking forward to it 😳😳 Will this require changing the llama source code or the ONNX conversion code 😳?
Change the llama source code; this If is introduced by a squeeze operator.
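The mechanism the maintainer describes can be sketched in plain Python (illustrative shape arithmetic only; `exported_if_squeeze` is a hypothetical stand-in, not the project's code): when a squeeze targets a dimension whose size is dynamic, torch.onnx guards the Squeeze with an If node whose then/else branches produce tensors of different rank, which is exactly what TensorRT's IIfConditionalOutputLayer rejects.

```python
# Sketch (assumed semantics) of the If node the exporter emits around a
# squeeze on a dynamic dimension. Shapes are modeled as plain tuples.
def exported_if_squeeze(shape):
    # then-branch: dim 1 equals 1 -> Squeeze removes it (rank drops by one)
    if shape[1] == 1:
        return shape[:1] + shape[2:]
    # else-branch: dim 1 differs from 1 -> Identity keeps the tensor as-is
    return shape

print(exported_if_squeeze((1, 1, 7, 128)))  # (1, 7, 128)  -- rank 3
print(exported_if_squeeze((1, 3, 7, 128)))  # (1, 3, 7, 128) -- rank 4
```

The two branch outputs having shapes like [1,-1,128] and [1,1,-1,128] matches the shape mismatch reported in this issue's error message.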
Oh, got it. Roughly when will the update be ready 😳? Can't wait to try it 😳 Did you modify the source for the other models in your project too? A friend of mine used your chatglm conversion and didn't hit this problem.
Already fixed. Yes, the others had all been modified; llama just hadn't been until now.
Great, thanks a lot, I'll give it a try.
OK. If this solves the problem, please reply here or leave a ✨.
Sure. My environment has some issues today; I'll try tomorrow. Already starred ✨.
[10/09/2023-08:13:15] [I] [TRT] ----------------------------------------------------------------
[10/09/2023-08:14:00] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/09/2023-08:14:00] [E] [TRT] ModelImporter.cpp:771: While parsing node number 82 [Squeeze -> "/blocks_.0/self_attn/Squeeze_output_0"]:
[10/09/2023-08:14:00] [E] [TRT] ModelImporter.cpp:772: --- Begin node ---
[10/09/2023-08:14:00] [E] [TRT] ModelImporter.cpp:773: input: "/blocks_.0/self_attn/rotary_emb/Cast_output_0" output: "/blocks_.0/self_attn/Squeeze_output_0" name: "/blocks_.0/self_attn/Squeeze" op_type: "Squeeze"
[10/09/2023-08:14:00] [E] [TRT] ModelImporter.cpp:774: --- End node ---
[10/09/2023-08:14:00] [E] [TRT] ModelImporter.cpp:777: ERROR: builtin_op_importers.cpp:4773 In function importSqueeze: [8] Assertion failed: !isDynamic(shape) && "Cannot infer squeeze dimensions from a dynamic shape! Please re-export your model with the Squeeze axes input set."
[10/09/2023-08:14:00] [E] Failed to parse onnx file
[10/09/2023-08:14:01] [I] Finished parsing network model. Parse time: 45.9346
[10/09/2023-08:14:01] [E] Parsing model failed
[10/09/2023-08:14:01] [E] Failed to create engine from model or file.
[10/09/2023-08:14:01] [E] Engine set up failed
The above problem appears when converting with trtexec. The dimension info of the four inputs (and outputs) is:
input_ids: seq_len
attention_mask: 1, 1, seq_len, seq_len
position_ids: seq_len, 3
past_key_values: 32, 2, 1, 32, history_len, 128
token_id: 1
presents: 32, 2, 1, 32, Concatpresents_dim_4, 128
In the config.pbtxt file it shows up like this:
"input": [
  { "name": "past_key_values", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [32, 2, 1, 32, -1, 128], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false },
  { "name": "position_ids", "data_type": "TYPE_INT64", "format": "FORMAT_NONE", "dims": [-1, 3], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false },
  { "name": "attention_mask", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [1, 1, -1, -1], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false },
  { "name": "input_ids", "data_type": "TYPE_INT64", "format": "FORMAT_NONE", "dims": [-1], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }
]
position_ids still looks wrong, doesn't it?
I checked chatglm, and its position_ids dimension info is "dims": [-1].
I changed position_ids to a dynamic dimension and still get the same problem:
input_ids: seq_len
attention_mask: 1, 1, seq_len, target_seq_len
position_ids: 1, seq_len
past_key_values: 32, 2, 1, 32, history_len, 128
token_id: 1
presents: 32, 2, 1, 32, Concatpresents_dim_4, 128
"name": "position_ids", "data_type": "TYPE_INT64", "format": "FORMAT_NONE", "dims": [1, -1],
[10/09/2023-08:57:35] [E] [TRT] ModelImporter.cpp:771: While parsing node number 82 [Squeeze -> "/blocks_.0/self_attn/Squeeze_output_0"]:
[10/09/2023-08:57:35] [E] [TRT] ModelImporter.cpp:772: --- Begin node ---
[10/09/2023-08:57:35] [E] [TRT] ModelImporter.cpp:773: input: "/blocks_.0/self_attn/rotary_emb/Cast_output_0" output: "/blocks_.0/self_attn/Squeeze_output_0" name: "/blocks_.0/self_attn/Squeeze" op_type: "Squeeze"
[10/09/2023-08:57:35] [E] [TRT] ModelImporter.cpp:774: --- End node ---
[10/09/2023-08:57:35] [E] [TRT] ModelImporter.cpp:777: ERROR: builtin_op_importers.cpp:4773 In function importSqueeze: [8] Assertion failed: !isDynamic(shape) && "Cannot infer squeeze dimensions from a dynamic shape! Please re-export your model with the Squeeze axes input set."
[10/09/2023-08:57:35] [E] Failed to parse onnx file
[10/09/2023-08:57:36] [I] Finished parsing network model. Parse time: 45.6445
[10/09/2023-08:57:36] [E] Parsing model failed
[10/09/2023-08:57:36] [E] Failed to create engine from model or file.
[10/09/2023-08:57:36] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=./model.onnx --saveEngine=./trt/model.plan --optShapes=input_ids:1,attention_mask:1x1x1x2,position_ids:1x1,past_key_values:32x2x1x32x1x128 --minShapes=input_ids:1,attention_mask:1x1x1x1,position_ids:1x1,past_key_values:32x2x1x32x0x128 --maxShapes=input_ids:128,attention_mask:1x1x128x256,position_ids:1x128,past_key_values:32x2x1x32x128x128 --device=1 --fp16
Assertion failed: !isDynamic(shape) && "Cannot infer squeeze dimensions from a dynamic shape! Please re-export your model with the Squeeze axes input set."
This is no longer a model-export problem: TRT cannot handle a Squeeze with a dynamic shape.
The shape of position_ids is fine; it doesn't need to be changed.
But after I changed it this way, the ONNX model can still run inference. Is the change a problem?
Since this squeeze operator isn't supported, do I need to modify the source to replace every squeeze with reshape, or only replace the squeeze on position_ids with reshape?
position_ids having dims [-1, 3] is strange, isn't it? The 3 is the seq_len length.
Any idea how I can change this so it converts to TRT? 😭
I haven't tested TRT, so please try the change yourself: replace these 2 lines with reshape or view and see.
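A minimal sketch of the suggested fix, under the assumption that the two offending lines squeeze the rotary-embedding cos/sin tensors: a reshape (or view) to an explicit target has a rank fixed at export time, so no If/Squeeze pair gets emitted. Pure-Python shape arithmetic stands in for tensor ops here; `reshape_shape` and the shapes are illustrative, not the project's actual code.

```python
import math

def squeeze_all_shape(shape):
    # axis-less squeeze: drops every size-1 dim, so output rank
    # depends on runtime values -- this is what TRT cannot infer
    return tuple(d for d in shape if d != 1)

def reshape_shape(shape, target):
    # reshape to an explicit target: output rank is known statically
    assert math.prod(shape) == math.prod(target), "element count must match"
    return tuple(target)

cos_shape = (1, 1, 7, 128)  # e.g. [1, 1, seq_len, head_dim]
# old: cos.squeeze(...)       -> rank varies with runtime dims
# new: cos.reshape(seq_len, head_dim) -> always rank 2
print(reshape_shape(cos_shape, (7, 128)))  # (7, 128)
```

In the real modeling code this would be `cos.reshape(...)` or `cos.view(...)` with dimensions read from `cos.shape`, which keeps the exported graph free of data-dependent rank changes.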
OK, I'll try.
Could you take another look when you have time? After changing https://github.com/wangzhaode/llm-export/blob/55154487f32f423546057b5c60f1dbed63077eee/llm_models/Llama-2-7b-chat-ms/modeling_llama.py#L183C5-L184C42, that error message is gone, but a new one appears. Do you have any ideas for fixing it?
It works after the change. Does this conversion also suit non-chat llama models and llama variants?
Yes, it suits any model with the basic LLM structure. Also, could you submit your change as a PR?
I've submitted it; please check whether there are any problems.
One more question: since TensorRT doesn't support int64, how do I make the ONNX export use int32?
You can change it in the code: turn on verbose during export to see where the int64 appears, then change those dtypes to int32.
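One way to approach that audit, sketched with stdlib Python only (`downcast_i64` is a hypothetical helper, not part of the project): token ids and position ids easily fit in int32, so downcasting with a range check is safe; in the real export script this would be `.to(torch.int32)` on the tensors that verbose mode flags as int64.

```python
# Range-checked int64 -> int32 downcast, as plain Python.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def downcast_i64(values):
    """Return the values as int32-safe ints, failing loudly on overflow."""
    for v in values:
        if not INT32_MIN <= v <= INT32_MAX:
            raise OverflowError(f"{v} does not fit in int32")
    return [int(v) for v in values]

print(downcast_i64([0, 1, 4096]))  # [0, 1, 4096] -- safe for position ids
```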
Got it 😊
Have you ever merged a PEFT model (p-tuning) with a base pretrained model (llama)? That is, merging them into a single model. Any ideas you could share? 😭 I fine-tuned llama with PEFT p-tuning myself, but I can't merge it, because llama has no prompt-embedding layer.
root@1d434e9d1113:/models/onnx# trtexec --onnx=./llm.onnx --saveEngine=./trt/model.plan --optShapes=input_ids:1x1,attention_mask:1x1x1x1,position_ids:1x1,past_key_values:32x2x1x32x1x128 --minShapes=input_ids:1x1,attention_mask:1x1x1x1,position_ids:1x1,past_key_values:32x2x1x32x1x128 --maxShapes=input_ids:128x1,attention_mask:1x1x128x128,position_ids:128x1,past_key_values:32x2x1x32x128x128
&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=./llm.onnx --saveEngine=./trt/model.plan --optShapes=input_ids:1x1,attention_mask:1x1x1x1,position_ids:1x1,past_key_values:32x2x1x32x1x128 --minShapes=input_ids:1x1,attention_mask:1x1x1x1,position_ids:1x1,past_key_values:32x2x1x32x1x128 --maxShapes=input_ids:128x1,attention_mask:1x1x128x128,position_ids:128x1,past_key_values:32x2x1x32x128x128
[09/27/2023-03:32:55] [I] === Model Options ===
[09/27/2023-03:32:55] [I] Format: ONNX
[09/27/2023-03:32:55] [I] Model: ./llm.onnx
[09/27/2023-03:32:55] [I] Output:
[09/27/2023-03:32:55] [I] === Build Options ===
[09/27/2023-03:32:55] [I] Max batch: explicit batch
[09/27/2023-03:32:55] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[09/27/2023-03:32:55] [I] minTiming: 1
[09/27/2023-03:32:55] [I] avgTiming: 8
[09/27/2023-03:32:55] [I] Precision: FP32
[09/27/2023-03:32:55] [I] LayerPrecisions:
[09/27/2023-03:32:55] [I] Layer Device Types:
[09/27/2023-03:32:55] [I] Calibration:
[09/27/2023-03:32:55] [I] Refit: Disabled
[09/27/2023-03:32:55] [I] Version Compatible: Disabled
[09/27/2023-03:32:55] [I] TensorRT runtime: full
[09/27/2023-03:32:55] [I] Lean DLL Path:
[09/27/2023-03:32:55] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[09/27/2023-03:32:55] [I] Exclude Lean Runtime: Disabled
[09/27/2023-03:32:55] [I] Sparsity: Disabled
[09/27/2023-03:32:55] [I] Safe mode: Disabled
[09/27/2023-03:32:55] [I] Build DLA standalone loadable: Disabled
[09/27/2023-03:32:55] [I] Allow GPU fallback for DLA: Disabled
[09/27/2023-03:32:55] [I] DirectIO mode: Disabled
[09/27/2023-03:32:55] [I] Restricted mode: Disabled
[09/27/2023-03:32:55] [I] Skip inference: Disabled
[09/27/2023-03:32:55] [I] Save engine: ./trt/model.plan
[09/27/2023-03:32:55] [I] Load engine:
[09/27/2023-03:32:55] [I] Profiling verbosity: 0
[09/27/2023-03:32:55] [I] Tactic sources: Using default tactic sources
[09/27/2023-03:32:55] [I] timingCacheMode: local
[09/27/2023-03:32:55] [I] timingCacheFile:
[09/27/2023-03:32:55] [I] Heuristic: Disabled
[09/27/2023-03:32:55] [I] Preview Features: Use default preview flags.
[09/27/2023-03:32:55] [I] MaxAuxStreams: -1
[09/27/2023-03:32:55] [I] BuilderOptimizationLevel: -1
[09/27/2023-03:32:55] [I] Input(s)s format: fp32:CHW
[09/27/2023-03:32:55] [I] Output(s)s format: fp32:CHW
[09/27/2023-03:32:55] [I] Input build shape: input_ids=1x1+1x1+128x1
[09/27/2023-03:32:55] [I] Input build shape: attention_mask=1x1x1x1+1x1x1x1+1x1x128x128
[09/27/2023-03:32:55] [I] Input build shape: position_ids=1x1+1x1+128x1
[09/27/2023-03:32:55] [I] Input build shape: past_key_values=32x2x1x32x1x128+32x2x1x32x1x128+32x2x1x32x128x128
[09/27/2023-03:32:55] [I] Input calibration shapes: model
[09/27/2023-03:32:55] [I] === System Options ===
[09/27/2023-03:32:55] [I] Device: 0
[09/27/2023-03:32:55] [I] DLACore:
[09/27/2023-03:32:55] [I] Plugins:
[09/27/2023-03:32:55] [I] setPluginsToSerialize:
[09/27/2023-03:32:55] [I] dynamicPlugins:
[09/27/2023-03:32:55] [I] ignoreParsedPluginLibs: 0
[09/27/2023-03:32:55] [I]
[09/27/2023-03:32:55] [I] === Inference Options ===
[09/27/2023-03:32:55] [I] Batch: Explicit
[09/27/2023-03:32:55] [I] Input inference shape: past_key_values=32x2x1x32x1x128
[09/27/2023-03:32:55] [I] Input inference shape: position_ids=1x1
[09/27/2023-03:32:55] [I] Input inference shape: attention_mask=1x1x1x1
[09/27/2023-03:32:55] [I] Input inference shape: input_ids=1x1
[09/27/2023-03:32:55] [I] Iterations: 10
[09/27/2023-03:32:55] [I] Duration: 3s (+ 200ms warm up)
[09/27/2023-03:32:55] [I] Sleep time: 0ms
[09/27/2023-03:32:55] [I] Idle time: 0ms
[09/27/2023-03:32:55] [I] Inference Streams: 1
[09/27/2023-03:32:55] [I] ExposeDMA: Disabled
[09/27/2023-03:32:55] [I] Data transfers: Enabled
[09/27/2023-03:32:55] [I] Spin-wait: Disabled
[09/27/2023-03:32:55] [I] Multithreading: Disabled
[09/27/2023-03:32:55] [I] CUDA Graph: Disabled
[09/27/2023-03:32:55] [I] Separate profiling: Disabled
[09/27/2023-03:32:55] [I] Time Deserialize: Disabled
[09/27/2023-03:32:55] [I] Time Refit: Disabled
[09/27/2023-03:32:55] [I] NVTX verbosity: 0
[09/27/2023-03:32:55] [I] Persistent Cache Ratio: 0
[09/27/2023-03:32:55] [I] Inputs:
[09/27/2023-03:32:55] [I] === Reporting Options ===
[09/27/2023-03:32:55] [I] Verbose: Disabled
[09/27/2023-03:32:55] [I] Averages: 10 inferences
[09/27/2023-03:32:55] [I] Percentiles: 90,95,99
[09/27/2023-03:32:55] [I] Dump refittable layers: Disabled
[09/27/2023-03:32:55] [I] Dump output: Disabled
[09/27/2023-03:32:55] [I] Profile: Disabled
[09/27/2023-03:32:55] [I] Export timing to JSON file:
[09/27/2023-03:32:55] [I] Export output to JSON file:
[09/27/2023-03:32:55] [I] Export profile to JSON file:
[09/27/2023-03:32:55] [I]
[09/27/2023-03:32:57] [I] === Device Information ===
[09/27/2023-03:32:57] [I] Selected Device: NVIDIA A100-SXM4-40GB
[09/27/2023-03:32:57] [I] Compute Capability: 8.0
[09/27/2023-03:32:57] [I] SMs: 108
[09/27/2023-03:32:57] [I] Device Global Memory: 40339 MiB
[09/27/2023-03:32:57] [I] Shared Memory per SM: 164 KiB
[09/27/2023-03:32:57] [I] Memory Bus Width: 5120 bits (ECC enabled)
[09/27/2023-03:32:57] [I] Application Compute Clock Rate: 1.41 GHz
[09/27/2023-03:32:57] [I] Application Memory Clock Rate: 1.215 GHz
[09/27/2023-03:32:57] [I]
[09/27/2023-03:32:57] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[09/27/2023-03:32:57] [I]
[09/27/2023-03:32:57] [I] TensorRT version: 8.6.1
[09/27/2023-03:32:57] [I] Loading standard plugins
[09/27/2023-03:32:57] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 20, GPU 427 (MiB)
[09/27/2023-03:33:03] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1657, GPU +310, now: CPU 1753, GPU 737 (MiB)
[09/27/2023-03:33:03] [I] Start parsing network model.
[09/27/2023-03:33:03] [I] [TRT] ----------------------------------------------------------------
[09/27/2023-03:33:03] [I] [TRT] Input filename: ./llm.onnx
[09/27/2023-03:33:03] [I] [TRT] ONNX IR version: 0.0.8
[09/27/2023-03:33:03] [I] [TRT] Opset version: 15
[09/27/2023-03:33:03] [I] [TRT] Producer name: pytorch
[09/27/2023-03:33:03] [I] [TRT] Producer version: 2.0.1
[09/27/2023-03:33:03] [I] [TRT] Domain:
[09/27/2023-03:33:03] [I] [TRT] Model version: 0
[09/27/2023-03:33:03] [I] [TRT] Doc string:
[09/27/2023-03:33:03] [I] [TRT] ----------------------------------------------------------------
[09/27/2023-03:33:19] [E] Error[4]: /blocks_.0/self_attn/If_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape. Shapes are [1,-1,128] and [1,1,-1,128].
[09/27/2023-03:33:19] [E] [TRT] ModelImporter.cpp:771: While parsing node number 87 [If -> "/blocks_.0/self_attn/If_output_0"]:
[09/27/2023-03:33:19] [E] [TRT] ModelImporter.cpp:772: --- Begin node ---
[09/27/2023-03:33:19] [E] [TRT] ModelImporter.cpp:773:
input: "/blocks_.0/self_attn/Equal_output_0"
output: "/blocks_.0/self_attn/If_output_0"
name: "/blocks_.0/self_attn/If"
op_type: "If"
attribute {
  name: "then_branch"
  g {
    node {
      output: "/blocks_.0/self_attn/Constant_12_output_0"
      name: "/blocks_.0/self_attn/Constant_12"
      op_type: "Constant"
      attribute {
        name: "value"
        t { dims: 1 data_type: 7 name: "/blocks_.0/self_attn/Constant_12_attr::value" raw_data: "\001\000\000\000\000\000\000\000" }
        type: TENSOR
      }
    }
    node {
      input: "/blocks_.0/self_attn/rotary_emb/Cast_output_0"
      input: "/blocks_.0/self_attn/Constant_12_output_0"
      output: "/blocks_.0/self_attn/Squeeze_output_0"
      name: "/blocks_.0/self_attn/Squeeze"
      op_type: "Squeeze"
    }
    name: "torch_jit1"
    output {
      name: "/blocks_.0/self_attn/Squeeze_output_0"
      type { tensor_type { elem_type: 1 shape {
        dim { dim_param: "Squeeze/blocks_.0/self_attn/Squeeze_output_0_dim_0" }
        dim { dim_param: "Squeeze/blocks_.0/self_attn/Squeeze_output_0_dim_1" }
        dim { dim_param: "Squeeze/blocks_.0/self_attn/Squeeze_output_0_dim_2" }
      } } }
    }
  }
  type: GRAPH
}
attribute {
  name: "else_branch"
  g {
    node {
      input: "/blocks_.0/self_attn/rotary_emb/Cast_output_0"
      output: "/blocks_.0/self_attn/Identity_output_0"
      name: "/blocks_.0/self_attn/Identity"
      op_type: "Identity"
    }
    name: "torch_jit2"
    output {
      name: "/blocks_.0/self_attn/Identity_output_0"
      type { tensor_type { elem_type: 1 shape {
        dim { dim_param: "Squeeze/blocks_.0/self_attn/Squeeze_output_0_dim_0" }
        dim { dim_param: "Identity/blocks_.0/self_attn/Identity_output_0_dim_1" }
        dim { dim_param: "Squeeze/blocks_.0/self_attn/Squeeze_output_0_dim_1" }
        dim { dim_param: "Squeeze/blocks_.0/self_attn/Squeeze_output_0_dim_2" }
      } } }
    }
  }
  type: GRAPH
}
[09/27/2023-03:33:19] [E] [TRT] ModelImporter.cpp:774: --- End node ---
[09/27/2023-03:33:19] [E] [TRT] ModelImporter.cpp:777: ERROR: ModelImporter.cpp:195 In function parseGraph: [6] Invalid Node - /blocks_.0/self_attn/If
/blocks_.0/self_attn/If_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape. Shapes are [1,-1,128] and [1,1,-1,128].
[09/27/2023-03:33:19] [E] Failed to parse onnx file
[09/27/2023-03:33:19] [I] Finished parsing network model. Parse time: 15.8847
[09/27/2023-03:33:19] [E] Parsing model failed
[09/27/2023-03:33:19] [E] Failed to create engine from model or file.
[09/27/2023-03:33:19] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=./llm.onnx --saveEngine=./trt/model.plan --optShapes=input_ids:1x1,attention_mask:1x1x1x1,position_ids:1x1,past_key_values:32x2x1x32x1x128 --minShapes=input_ids:1x1,attention_mask:1x1x1x1,position_ids:1x1,past_key_values:32x2x1x32x1x128 --maxShapes=input_ids:128x1,attention_mask:1x1x128x128,position_ids:128x1,past_key_values:32x2x1x32x128x128