wangzhaode / llm-export

llm-export can export LLM models to ONNX.
Apache License 2.0
190 stars 21 forks

Thank you very much for your contributions, which have been very helpful to me. I have a question: converting the ONNX model to TRT fails with "/blocks_.0/self_attn/If_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape. Shapes are [1,-1,128] and [1,1,-1,128]". How can we solve it? #1

Closed xiaobai52HZ closed 11 months ago

xiaobai52HZ commented 11 months ago

root@1d434e9d1113:/models/onnx# trtexec --onnx=./llm.onnx --saveEngine=./trt/model.plan --optShapes=input_ids:1x1,attention_mask:1x1x1x1,position_ids:1x1,past_key_values:32x2x1x32x1x128 --minShapes=input_ids:1x1,attention_mask:1x1x1x1,position_ids:1x1,past_key_values:32x2x1x32x1x128 --maxShapes=input_ids:128x1,attention_mask:1x1x128x128,position_ids:128x1,past_key_values:32x2x1x32x128x128 &&&& RUNNING TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=./llm.onnx --saveEngine=./trt/model.plan --optShapes=input_ids:1x1,attention_mask:1x1x1x1,position_ids:1x1,past_key_values:32x2x1x32x1x128 --minShapes=input_ids:1x1,attention_mask:1x1x1x1,position_ids:1x1,past_key_values:32x2x1x32x1x128 --maxShapes=input_ids:128x1,attention_mask:1x1x128x128,position_ids:128x1,past_key_values:32x2x1x32x128x128 [09/27/2023-03:32:55] [I] === Model Options === [09/27/2023-03:32:55] [I] Format: ONNX [09/27/2023-03:32:55] [I] Model: ./llm.onnx [09/27/2023-03:32:55] [I] Output: [09/27/2023-03:32:55] [I] === Build Options === [09/27/2023-03:32:55] [I] Max batch: explicit batch [09/27/2023-03:32:55] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default [09/27/2023-03:32:55] [I] minTiming: 1 [09/27/2023-03:32:55] [I] avgTiming: 8 [09/27/2023-03:32:55] [I] Precision: FP32 [09/27/2023-03:32:55] [I] LayerPrecisions: [09/27/2023-03:32:55] [I] Layer Device Types: [09/27/2023-03:32:55] [I] Calibration: [09/27/2023-03:32:55] [I] Refit: Disabled [09/27/2023-03:32:55] [I] Version Compatible: Disabled [09/27/2023-03:32:55] [I] TensorRT runtime: full [09/27/2023-03:32:55] [I] Lean DLL Path: [09/27/2023-03:32:55] [I] Tempfile Controls: { in_memory: allow, temporary: allow } [09/27/2023-03:32:55] [I] Exclude Lean Runtime: Disabled [09/27/2023-03:32:55] [I] Sparsity: Disabled [09/27/2023-03:32:55] [I] Safe mode: Disabled [09/27/2023-03:32:55] [I] Build DLA standalone loadable: Disabled [09/27/2023-03:32:55] [I] Allow GPU fallback for DLA: Disabled 
[09/27/2023-03:32:55] [I] DirectIO mode: Disabled [09/27/2023-03:32:55] [I] Restricted mode: Disabled [09/27/2023-03:32:55] [I] Skip inference: Disabled [09/27/2023-03:32:55] [I] Save engine: ./trt/model.plan [09/27/2023-03:32:55] [I] Load engine: [09/27/2023-03:32:55] [I] Profiling verbosity: 0 [09/27/2023-03:32:55] [I] Tactic sources: Using default tactic sources [09/27/2023-03:32:55] [I] timingCacheMode: local [09/27/2023-03:32:55] [I] timingCacheFile: [09/27/2023-03:32:55] [I] Heuristic: Disabled [09/27/2023-03:32:55] [I] Preview Features: Use default preview flags. [09/27/2023-03:32:55] [I] MaxAuxStreams: -1 [09/27/2023-03:32:55] [I] BuilderOptimizationLevel: -1 [09/27/2023-03:32:55] [I] Input(s)s format: fp32:CHW [09/27/2023-03:32:55] [I] Output(s)s format: fp32:CHW [09/27/2023-03:32:55] [I] Input build shape: input_ids=1x1+1x1+128x1 [09/27/2023-03:32:55] [I] Input build shape: attention_mask=1x1x1x1+1x1x1x1+1x1x128x128 [09/27/2023-03:32:55] [I] Input build shape: position_ids=1x1+1x1+128x1 [09/27/2023-03:32:55] [I] Input build shape: past_key_values=32x2x1x32x1x128+32x2x1x32x1x128+32x2x1x32x128x128 [09/27/2023-03:32:55] [I] Input calibration shapes: model [09/27/2023-03:32:55] [I] === System Options === [09/27/2023-03:32:55] [I] Device: 0 [09/27/2023-03:32:55] [I] DLACore: [09/27/2023-03:32:55] [I] Plugins: [09/27/2023-03:32:55] [I] setPluginsToSerialize: [09/27/2023-03:32:55] [I] dynamicPlugins: [09/27/2023-03:32:55] [I] ignoreParsedPluginLibs: 0 [09/27/2023-03:32:55] [I] [09/27/2023-03:32:55] [I] === Inference Options === [09/27/2023-03:32:55] [I] Batch: Explicit [09/27/2023-03:32:55] [I] Input inference shape: past_key_values=32x2x1x32x1x128 [09/27/2023-03:32:55] [I] Input inference shape: position_ids=1x1 [09/27/2023-03:32:55] [I] Input inference shape: attention_mask=1x1x1x1 [09/27/2023-03:32:55] [I] Input inference shape: inputids=1x1 [09/27/2023-03:32:55] [I] Iterations: 10 [09/27/2023-03:32:55] [I] Duration: 3s (+ 200ms warm up) [09/27/2023-03:32:55] 
[I] Sleep time: 0ms [09/27/2023-03:32:55] [I] Idle time: 0ms [09/27/2023-03:32:55] [I] Inference Streams: 1 [09/27/2023-03:32:55] [I] ExposeDMA: Disabled [09/27/2023-03:32:55] [I] Data transfers: Enabled [09/27/2023-03:32:55] [I] Spin-wait: Disabled [09/27/2023-03:32:55] [I] Multithreading: Disabled [09/27/2023-03:32:55] [I] CUDA Graph: Disabled [09/27/2023-03:32:55] [I] Separate profiling: Disabled [09/27/2023-03:32:55] [I] Time Deserialize: Disabled [09/27/2023-03:32:55] [I] Time Refit: Disabled [09/27/2023-03:32:55] [I] NVTX verbosity: 0 [09/27/2023-03:32:55] [I] Persistent Cache Ratio: 0 [09/27/2023-03:32:55] [I] Inputs: [09/27/2023-03:32:55] [I] === Reporting Options === [09/27/2023-03:32:55] [I] Verbose: Disabled [09/27/2023-03:32:55] [I] Averages: 10 inferences [09/27/2023-03:32:55] [I] Percentiles: 90,95,99 [09/27/2023-03:32:55] [I] Dump refittable layers:Disabled [09/27/2023-03:32:55] [I] Dump output: Disabled [09/27/2023-03:32:55] [I] Profile: Disabled [09/27/2023-03:32:55] [I] Export timing to JSON file: [09/27/2023-03:32:55] [I] Export output to JSON file: [09/27/2023-03:32:55] [I] Export profile to JSON file: [09/27/2023-03:32:55] [I] [09/27/2023-03:32:57] [I] === Device Information === [09/27/2023-03:32:57] [I] Selected Device: NVIDIA A100-SXM4-40GB [09/27/2023-03:32:57] [I] Compute Capability: 8.0 [09/27/2023-03:32:57] [I] SMs: 108 [09/27/2023-03:32:57] [I] Device Global Memory: 40339 MiB [09/27/2023-03:32:57] [I] Shared Memory per SM: 164 KiB [09/27/2023-03:32:57] [I] Memory Bus Width: 5120 bits (ECC enabled) [09/27/2023-03:32:57] [I] Application Compute Clock Rate: 1.41 GHz [09/27/2023-03:32:57] [I] Application Memory Clock Rate: 1.215 GHz [09/27/2023-03:32:57] [I] [09/27/2023-03:32:57] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at. 
[09/27/2023-03:32:57] [I] [09/27/2023-03:32:57] [I] TensorRT version: 8.6.1 [09/27/2023-03:32:57] [I] Loading standard plugins [09/27/2023-03:32:57] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 20, GPU 427 (MiB) [09/27/2023-03:33:03] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1657, GPU +310, now: CPU 1753, GPU 737 (MiB) [09/27/2023-03:33:03] [I] Start parsing network model. [09/27/2023-03:33:03] [I] [TRT] ---------------------------------------------------------------- [09/27/2023-03:33:03] [I] [TRT] Input filename: ./llm.onnx [09/27/2023-03:33:03] [I] [TRT] ONNX IR version: 0.0.8 [09/27/2023-03:33:03] [I] [TRT] Opset version: 15 [09/27/2023-03:33:03] [I] [TRT] Producer name: pytorch [09/27/2023-03:33:03] [I] [TRT] Producer version: 2.0.1 [09/27/2023-03:33:03] [I] [TRT] Domain: [09/27/2023-03:33:03] [I] [TRT] Model version: 0 [09/27/2023-03:33:03] [I] [TRT] Doc string: [09/27/2023-03:33:03] [I] [TRT] ---------------------------------------------------------------- t[09/27/2023-03:33:19] [E] Error[4]: /blocks.0/self_attn/IfOutputLayer: IIfConditionalOutputLayer inputs must have the same shape. Shapes are [1,-1,128] and [1,1,-1,128]. 
[09/27/2023-03:33:19] [E] [TRT] ModelImporter.cpp:771: While parsing node number 87 [If -> "/blocks.0/self_attn/If_output0"]: [09/27/2023-03:33:19] [E] [TRT] ModelImporter.cpp:772: --- Begin node --- [09/27/2023-03:33:19] [E] [TRT] ModelImporter.cpp:773: input: "/blocks.0/self_attn/Equal_output0" output: "/blocks.0/self_attn/If_output0" name: "/blocks.0/self_attn/If" op_type: "If" attribute { name: "thenbranch" g { node { output: "/blocks.0/self_attn/Constant_12_output0" name: "/blocks.0/self_attn/Constant_12" op_type: "Constant" attribute { name: "value" t { dims: 1 datatype: 7 name: "/blocks.0/self_attn/Constant_12_attr::value" rawdata: "\001\000\000\000\000\000\000\000" } type: TENSOR } } node { input: "/blocks.0/self_attn/rotary_emb/Cast_output0" input: "/blocks.0/self_attn/Constant_12_output0" output: "/blocks.0/self_attn/Squeeze_output0" name: "/blocks.0/self_attn/Squeeze" op_type: "Squeeze" } name: "torchjit1" output { name: "/blocks.0/self_attn/Squeeze_output_0" type { tensor_type { elem_type: 1 shape { dim { dimparam: "Squeeze/blocks.0/self_attn/Squeeze_output_0_dim_0" } dim { dimparam: "Squeeze/blocks.0/self_attn/Squeeze_output_0_dim_1" } dim { dimparam: "Squeeze/blocks.0/self_attn/Squeeze_output_0_dim_2" } } } } } } type: GRAPH } attribute { name: "elsebranch" g { node { input: "/blocks.0/self_attn/rotary_emb/Cast_output0" output: "/blocks.0/self_attn/Identity_output0" name: "/blocks.0/self_attn/Identity" op_type: "Identity" } name: "torchjit2" output { name: "/blocks.0/self_attn/Identity_output_0" type { tensor_type { elem_type: 1 shape { dim { dimparam: "Squeeze/blocks.0/self_attn/Squeeze_output_0_dim_0" } dim { dimparam: "Identity/blocks.0/self_attn/Identity_output_0_dim_1" } dim { dimparam: "Squeeze/blocks.0/self_attn/Squeeze_output_0_dim_1" } dim { dimparam: "Squeeze/blocks.0/self_attn/Squeeze_output_0_dim_2" } } } } } } type: GRAPH }

[09/27/2023-03:33:19] [E] [TRT] ModelImporter.cpp:774: --- End node --- [09/27/2023-03:33:19] [E] [TRT] ModelImporter.cpp:777: ERROR: ModelImporter.cpp:195 In function parseGraph: [6] Invalid Node - /blocks_.0/selfattn/If /blocks.0/self_attn/If_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape. Shapes are [1,-1,128] and [1,1,-1,128]. [09/27/2023-03:33:19] [E] Failed to parse onnx file [09/27/2023-03:33:19] [I] Finished parsing network model. Parse time: 15.8847 [09/27/2023-03:33:19] [E] Parsing model failed [09/27/2023-03:33:19] [E] Failed to create engine from model or file. [09/27/2023-03:33:19] [E] Engine set up failed &&&& FAILED TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=./llm.onnx --saveEngine=./trt/model.plan --optShapes=input_ids:1x1,attention_mask:1x1x1x1,position_ids:1x1,past_key_values:32x2x1x32x1x128 --minShapes=input_ids:1x1,attention_mask:1x1x1x1,position_ids:1x1,past_key_values:32x2x1x32x1x128 --maxShapes=input_ids:128x1,attention_mask:1x1x128x128,position_ids:128x1,past_key_values:32x2x1x32x128x128

wangzhaode commented 11 months ago

Please specify which model you are using and how you exported it.

wangzhaode commented 11 months ago

It looks like your error is caused by an If node. The ONNX models exported by this project already have the unsqueeze optimized away, so there should be no If nodes left.

xiaobai52HZ commented 11 months ago

It's the llama2 model, converted directly with the trtexec tool. How could there be no If node? I can still see an If node in my output.


xiaobai52HZ commented 11 months ago

I also tried using transformers' optimum to convert from HF to ONNX, and converting that ONNX to TRT gives the same error.


wangzhaode commented 11 months ago

The llama2 source code hadn't been modified; I'll optimize it.

xiaobai52HZ commented 11 months ago

Wow, thank you 😊👍 That's amazing 😳 Looking forward to it 😳 Do I need to modify the llama source code or the ONNX conversion code 😳?


wangzhaode commented 11 months ago

The llama2 source code hadn't been modified; I'll optimize it.

Modify the llama source code; this If is introduced by the squeeze operator.
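For context on why a squeeze turns into an If: when the squeezed dimension is dynamic in the exported graph, the exporter has to branch on whether that dimension equals 1, and the two branches of the resulting If can disagree on rank (here [1,-1,128] vs [1,1,-1,128]). A minimal sketch (my own repro assumption, not the project's actual code) of the squeeze-vs-reshape equivalence that makes the branch unnecessary:

```python
import torch

x = torch.randn(1, 1, 7, 128)

# squeeze(1): on a dynamic dim the ONNX exporter must emit an If
# ("is this dim == 1?"), whose branches differ in rank
y_squeeze = x.squeeze(1)

# reshape with the target shape spelled out: a single static Reshape node
y_reshape = x.reshape(x.shape[0], x.shape[2], x.shape[3])

assert torch.equal(y_squeeze, y_reshape)
```

The two forms produce identical tensors whenever the squeezed dimension really is 1, which is exactly the case the model relies on.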

wangzhaode commented 11 months ago

Fixed:

https://github.com/wangzhaode/llm-export/commit/6be75c3a99b12416bac35775a4c9e0aaeab438d8

Update the code to the latest and try again.

xiaobai52HZ commented 11 months ago

Oh, I see. Roughly when will the update be ready? 😳 I can't wait to try it 😳 Did you modify the source code of the other models in your project as well? A friend of mine used your chatglm conversion and didn't have this problem.


wangzhaode commented 11 months ago

It's already changed. Yes, the others were all modified; llama just hadn't been modified before.


xiaobai52HZ commented 11 months ago

Great, thanks a lot, I'll try it.


wangzhaode commented 11 months ago

OK. If this solves the problem, please reply here or leave a ✨.

xiaobai52HZ commented 11 months ago

OK. My environment has some issues today, so I'll try tomorrow. Already starred ✨.

xiaobai52HZ commented 11 months ago

[10/09/2023-08:13:15] [I] [TRT] ---------------------------------------------------------------- [10/09/2023-08:14:00] [W] [TRT] onnx2trtutils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32. [10/09/2023-08:14:00] [E] [TRT] ModelImporter.cpp:771: While parsing node number 82 [Squeeze -> "/blocks.0/self_attn/Squeeze_output0"]: [10/09/2023-08:14:00] [E] [TRT] ModelImporter.cpp:772: --- Begin node --- [10/09/2023-08:14:00] [E] [TRT] ModelImporter.cpp:773: input: "/blocks.0/self_attn/rotary_emb/Cast_output0" output: "/blocks.0/self_attn/Squeeze_output0" name: "/blocks.0/self_attn/Squeeze" op_type: "Squeeze"

[10/09/2023-08:14:00] [E] [TRT] ModelImporter.cpp:774: --- End node --- [10/09/2023-08:14:00] [E] [TRT] ModelImporter.cpp:777: ERROR: builtin_op_importers.cpp:4773 In function importSqueeze: [8] Assertion failed: !isDynamic(shape) && "Cannot infer squeeze dimensions from a dynamic shape! Please re-export your model with the Squeeze axes input set." [10/09/2023-08:14:00] [E] Failed to parse onnx file [10/09/2023-08:14:01] [I] Finished parsing network model. Parse time: 45.9346 [10/09/2023-08:14:01] [E] Parsing model failed [10/09/2023-08:14:01] [E] Failed to create engine from model or file. [10/09/2023-08:14:01] [E] Engine set up failed

The above problem appeared during trtexec conversion. The dimension info I see for the inputs and outputs is:

input_ids: [seq_len]
attention_mask: [1, 1, seq_len, seq_len]
position_ids: [seq_len, 3]
past_key_values: [32, 2, 1, 32, history_len, 128]
token_id: [1]
presents: [32, 2, 1, 32, Concatpresents_dim_4, 128]

In the config.pbtxt file it shows like this: "input": [{ "name": "past_key_values", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [32, 2, 1, 32, -1, 128], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "position_ids", "data_type": "TYPE_INT64", "format": "FORMAT_NONE", "dims": [-1, 3], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "attention_mask", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [1, 1, -1, -1], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "input_ids", "data_type": "TYPE_INT64", "format": "FORMAT_NONE", "dims": [-1], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }] — position_ids is still wrong, right?

I checked chatglm's position_ids dimension info and it is "dims": [-1].

xiaobai52HZ commented 11 months ago


I changed position_ids to a dynamic dimension but still hit the same problem:

input_ids: [seq_len]
attention_mask: [1, 1, seq_len, target_seq_len]
position_ids: [1, seq_len]
past_key_values: [32, 2, 1, 32, history_len, 128]
token_id: [1]
presents: [32, 2, 1, 32, Concatpresents_dim_4, 128]


"name": "position_ids", "data_type": "TYPE_INT64", "format": "FORMAT_NONE", "dims": [1, -1],

[10/09/2023-08:57:35] [E] [TRT] ModelImporter.cpp:771: While parsing node number 82 [Squeeze -> "/blocks_.0/self_attn/Squeeze_output0"]: [10/09/2023-08:57:35] [E] [TRT] ModelImporter.cpp:772: --- Begin node --- [10/09/2023-08:57:35] [E] [TRT] ModelImporter.cpp:773: input: "/blocks.0/self_attn/rotary_emb/Cast_output0" output: "/blocks.0/self_attn/Squeeze_output0" name: "/blocks.0/self_attn/Squeeze" op_type: "Squeeze"

[10/09/2023-08:57:35] [E] [TRT] ModelImporter.cpp:774: --- End node --- [10/09/2023-08:57:35] [E] [TRT] ModelImporter.cpp:777: ERROR: builtin_op_importers.cpp:4773 In function importSqueeze: [8] Assertion failed: !isDynamic(shape) && "Cannot infer squeeze dimensions from a dynamic shape! Please re-export your model with the Squeeze axes input set." [10/09/2023-08:57:35] [E] Failed to parse onnx file [10/09/2023-08:57:36] [I] Finished parsing network model. Parse time: 45.6445 [10/09/2023-08:57:36] [E] Parsing model failed [10/09/2023-08:57:36] [E] Failed to create engine from model or file. [10/09/2023-08:57:36] [E] Engine set up failed &&&& FAILED TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=./model.onnx --saveEngine=./trt/model.plan --optShapes=input_ids:1,attention_mask:1x1x1x2,position_ids:1x1,past_key_values:32x2x1x32x1x128 --minShapes=input_ids:1,attention_mask:1x1x1x1,position_ids:1x1,past_key_values:32x2x1x32x0x128 --maxShapes=input_ids:128,attention_mask:1x1x128x256,position_ids:1x128,past_key_values:32x2x1x32x128x128 --device=1 --fp16

wangzhaode commented 11 months ago

Assertion failed: !isDynamic(shape) && "Cannot infer squeeze dimensions from a dynamic shape! Please re-export your model with the Squeeze axes input set."

This is no longer a problem with the exported model; TRT does not support Squeeze on dynamic shapes.

wangzhaode commented 11 months ago

The shape of position_ids is fine; it doesn't need to be changed.

xiaobai52HZ commented 11 months ago

(screenshot) After I changed it like this, ONNX can still run inference. Is there any problem with this change?

xiaobai52HZ commented 11 months ago

Since the squeeze operator isn't supported, do I need to change the source code and replace every squeeze with reshape, or only the squeeze on position_ids?

xiaobai52HZ commented 11 months ago

The position_ids dims are [-1, 3], which seems strange, right? The 3 is the seq_len.


xiaobai52HZ commented 11 months ago

How can I change this so it converts to TRT? 😭


wangzhaode commented 11 months ago

I haven't tested TRT, so please try the change yourself: try replacing these 2 lines with reshape or view:

https://github.com/wangzhaode/llm-export/blob/55154487f32f423546057b5c60f1dbed63077eee/llm_models/Llama-2-7b-chat-ms/modeling_llama.py#L183C5-L184C42
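In HuggingFace-style llama rotary embeddings, lines at that spot are typically of the form `cos = cos.squeeze(1).squeeze(0)` (an assumption on my part; check the linked file for the exact code). A reshape-based equivalent that keeps the exported graph free of axes-less Squeeze nodes might look like:

```python
import torch

# stand-in for the cached rotary cos table: [1, 1, seq_len, head_dim]
cos = torch.randn(1, 1, 7, 128)

# squeeze-based form (roughly what the linked lines do; the exact
# code is an assumption):
cos_squeezed = cos.squeeze(1).squeeze(0)          # -> [seq_len, head_dim]

# reshape-based replacement with the target shape made explicit:
cos_reshaped = cos.reshape(cos.shape[2], cos.shape[3])

assert torch.equal(cos_squeezed, cos_reshaped)
```

The same substitution would apply to the matching `sin` line.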

xiaobai52HZ commented 11 months ago

OK, I'll try it.


xiaobai52HZ commented 11 months ago

When you have time, could you take another look at what this might be? After changing https://github.com/wangzhaode/llm-export/blob/55154487f32f423546057b5c60f1dbed63077eee/llm_models/Llama-2-7b-chat-ms/modeling_llama.py#L183C5-L184C42 the previous error is gone, but a new one appeared (screenshot). Do you have any ideas for fixing it?

xiaobai52HZ commented 11 months ago

After the change it works now. Does this conversion also work for non-chat llama models and llama variants?

wangzhaode commented 11 months ago

After the change it works now. Does this conversion also work for non-chat llama models and llama variants?

Yes, any model with the basic LLM structure works. Also, could you submit your change as a PR?

xiaobai52HZ commented 11 months ago

I've submitted it; please check whether there are any problems.

xiaobai52HZ commented 11 months ago

One more question: since TensorRT doesn't support int64, how do I get int32 when converting to ONNX?

wangzhaode commented 11 months ago

One more question: since TensorRT doesn't support int64, how do I get int32 when converting to ONNX?

You can change it in the code: enable verbose during export to see where the int64 comes from, then change those dtypes to int32.

xiaobai52HZ commented 11 months ago

Got it 😊


xiaobai52HZ commented 11 months ago

Have you ever merged a peft model (p-tuning) with a pretrained model (llama) into a single model? Any ideas you could share? 😭 I fine-tuned llama with peft's p-tuning myself, but I can't merge them because llama has no prompt-embedding layer.
