open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Grounding DINO fine-tuned with mmdetection cannot be accelerated with TensorRT #11342

Open xiyangyang99 opened 6 months ago

xiyangyang99 commented 6 months ago

After fine-tuning Grounding DINO on my own local dataset with mmdetection, local (PyTorch) inference takes 200-300 ms per image. After exporting to ONNX, inference takes about 5 s per image on CPU and about 2 s per image on GPU. When converting the ONNX model to TensorRT, the official trtexec tool fails to build an --fp16 engine. Does mmdetection plan to add TensorRT acceleration for Grounding DINO?
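For context, a minimal sketch of the kind of FP16 build being attempted here, written with the TensorRT 8.6 Python API instead of trtexec so parser and builder errors are easier to see; the ONNX path is a placeholder, not a file from this issue:

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.VERBOSE)          # verbose log helps pinpoint the failing layer
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open("grounding_dino.onnx", "rb") as f:     # placeholder path
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise SystemExit("ONNX parse failed")
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)
    config.set_flag(trt.BuilderFlag.FP16)            # same effect as trtexec --fp16
    engine = builder.build_serialized_network(network, config)
    assert engine is not None, "engine build failed"
    with open("grounding_dino_fp16.engine", "wb") as f:
        f.write(engine)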

hhaAndroid commented 6 months ago

This needs support from mmdeploy. Their tentative plan is January, but it is not certain it can be finished in January. It would be great if someone from the community is willing to support it; feel free to open a PR against mmdet.

xiyangyang99 commented 6 months ago

I have a working example of converting Grounding DINO to TensorRT, with 40 ms inference on an RTX 3090. However, the ONNX exported from mmdetection's Grounding DINO currently cannot be converted to an engine, whereas the official Grounding DINO model can. Could we join mmdeploy as co-developers? Our team is working on TensorRT acceleration for Grounding DINO.

hhaAndroid commented 6 months ago

If you are interested in supporting this, that would be great. You could lead the work and talk to the mmdeploy folks whenever you run into problems.

wxz1996 commented 6 months ago

Could you share the Python script you use to convert the Grounding DINO ONNX to TRT? I keep getting errors on my side.

xiyangyang99 commented 6 months ago

I converted it with trtexec, not with a script.

wxz1996 commented 6 months ago

When you exported from PyTorch to ONNX, did you export the whole Grounding DINO model?

xiyangyang99 commented 6 months ago

Yes.

wxz1996 commented 6 months ago

The model I converted still has some problems, and I am not sure whether the ONNX export itself is wrong. I tested with https://github.com/wenyi5608/GroundingDINO.git. For ONNX to TRT I used: trtexec --onnx=./weights/groundingdino_swint_ogc.onnx --saveEngine=./weights/groundingdino_swint_ogc.trt --best --workspace=1024. Could you share your ONNX export script?

xiyangyang99 commented 6 months ago

What is your environment? I am using TensorRT 8.6.1.6 and CUDA 11.7.

wxz1996 commented 6 months ago

I am also on 8.6.1.6, with CUDA 12.3. What command did you use to convert to TRT?

xiyangyang99 commented 6 months ago

I converted it with exactly the command you used: trtexec --onnx=<your onnx> --saveEngine=<your engine> --fp16

xiyangyang99 commented 6 months ago

I did manage to build the engine, but when I wrote the Python inference script, the engine deserialization hit a bug. I am still looking into how to run inference on the engine file.
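For reference, a minimal deserialization sketch for TensorRT 8.6; a common cause of deserialization failures is forgetting to register the built-in plugins or mixing TensorRT versions between build and runtime. The engine path is a placeholder:

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    trt.init_libnvinfer_plugins(logger, "")          # register built-in plugins first
    with open("grounding_dino.engine", "rb") as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
    assert engine is not None, "deserialization failed: check TensorRT/CUDA version match"
    context = engine.create_execution_context()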

xiyangyang99 commented 6 months ago

This is the result of my conversion (screenshot: 6854ae310c30df378d801c3cfef8a858).

wxz1996 commented 6 months ago

The engine I converted directly does not align with the ONNX output. Does yours?

xiyangyang99 commented 6 months ago

Do you have a Python-side inference script? Are you using dynamic or static inputs? My inference code is not finished yet, so I cannot tell.

wxz1996 commented 6 months ago

@QzYER export_trt.zip This is my inference script; it covers both ONNX and TRT. I use static inputs (I could not get dynamic inputs to work), and at the moment the FP32 TRT output does not align with the ONNX output.
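A minimal sketch of how such an ONNX-vs-TRT mismatch is usually quantified, assuming feeds is the same input dict given to the TRT engine and trt_outs are the outputs returned by the shared script:

    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession("grounding_dino.onnx",
                                providers=["CUDAExecutionProvider"])
    onnx_outs = sess.run(None, feeds)                # feeds: same inputs given to the engine
    for meta, o, t in zip(sess.get_outputs(), onnx_outs, trt_outs):
        d = np.abs(o.astype(np.float32) - t.astype(np.float32))
        print(f"{meta.name}: max abs diff {d.max():.6f}, mean {d.mean():.6f}")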

xiyangyang99 commented 6 months ago

Let me take a look first.

xiyangyang99 commented 6 months ago

Did you test whether FP16 aligns? I just tested on my side and the accuracy does not align either. When you test with different text prompts, do the inputs input_ids, attention_mask, position_ids, etc. correspond correctly?
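One quick way to confirm the two runs really receive identical input_ids / attention_mask / position_ids is to fingerprint every input array before feeding each backend; a small sketch (onnx_feeds and trt_feeds are the two input dicts):

    import hashlib
    import numpy as np

    def fingerprint(feeds):
        # hash every input tensor so the ONNX and TRT runs can be compared key by key
        return {k: hashlib.md5(np.ascontiguousarray(v).tobytes()).hexdigest()
                for k, v in feeds.items()}

    print(fingerprint(onnx_feeds))
    print(fingerprint(trt_feeds))   # the two dicts should be identical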

xiyangyang99 commented 6 months ago

I just tested my TRT inference and the output looks like this; the FP16 output accuracy does not align either (screenshot: c6b32bebdcec98bae0d454c0518c397).

wxz1996 commented 6 months ago

They all correspond; you can see in my script that the ONNX and TRT inputs are exactly the same, and neither FP32 nor FP16 aligns.

xiyangyang99 commented 6 months ago

I looked at your inference code; mine is the same, and the outputs cannot be aligned.

This is what my output looks like.

xiyangyang99 commented 6 months ago

If PyTorch/ONNX cannot be aligned with TensorRT, the only option is to compare the output tensor of a given PyTorch layer against the output of the corresponding ONNX node, first locate which part loses precision, and then replace the offending operator. I have already reported this to the TensorRT team; let's see what they say.
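A sketch of the layer-by-layer localization described above: expose a suspect intermediate tensor as an extra ONNX graph output, then diff it against the matching PyTorch activation. The node name below is hypothetical:

    import numpy as np
    import onnx
    import onnxruntime as ort

    model = onnx.load("grounding_dino.onnx")
    suspect = "/decoder/layers.0/cross_attn/MatMul_output_0"   # hypothetical node output name
    vi = onnx.ValueInfoProto()
    vi.name = suspect
    model.graph.output.append(vi)                              # expose it as a graph output
    onnx.save(model, "grounding_dino_debug.onnx")

    sess = ort.InferenceSession("grounding_dino_debug.onnx")
    *_, intermediate = sess.run(None, feeds)                   # feeds: same inputs as before
    print(intermediate.shape, np.abs(intermediate).max())      # diff this against the PyTorch layer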

Baboom-l commented 6 months ago

When exporting with mmdeploy you need to fix the image size; the text inputs can stay dynamic. With that setup, FP32 Grounding DINO accuracy can be aligned.
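In TensorRT terms, "fixed image size, dynamic text" corresponds to an optimization profile where the image input has min = opt = max and only the text inputs vary. A sketch reusing the builder/config from the earlier build example; the input names and shapes are assumptions taken from this discussion, not from a verified mmdeploy config:

    profile = builder.create_optimization_profile()
    # image input pinned to one shape: min = opt = max
    profile.set_shape("input", (1, 3, 800, 1333), (1, 3, 800, 1333), (1, 3, 800, 1333))
    # text inputs left dynamic in sequence length (assumed names)
    profile.set_shape("input_ids", (1, 1), (1, 16), (1, 256))
    # ...repeat set_shape for the remaining text inputs (attention_mask, position_ids, ...)
    config.add_optimization_profile(profile)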

xiyangyang99 commented 6 months ago

Here is my conversion process. How did you fix the input image size and keep the text as a dynamic input when exporting Grounding DINO?

(yolo8) bowen@bowen-MS-7D20:/media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/mmdeploy$ python /media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/mmdeploy/tools/deploy.py /media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/mmdeploy/configs/mmdet/detection/detection_tensorrt-fp16_dynamic-320x320-1344x1344.py /media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/mmdeploy/mmdetection/configs/grounding_dino/grounding_dino_swin-b_finetune_16xb2_1x_coco.py /media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/grounding_dino_deploy/weights.pth /media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/mmdeploy/mmdetection/demo/demo.jpg --work-dir mmdeploy_model/groundingdino --device cuda --dump-info
01/09 03:06:03 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
01/09 03:06:03 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
01/09 03:06:04 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
01/09 03:06:05 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
01/09 03:06:05 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
Downloading tokenizer_config.json: 100%|██████████| 28.0/28.0 [00:00<00:00, 2.01kB/s]
Downloading vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 509kB/s]
Downloading tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 714kB/s]
Downloading tokenizer_config.json: 100%|██████████| 28.0/28.0 [00:00<00:00, 12.8kB/s]
Loads checkpoint by local backend from path: /media/bowen/6202c499-4f0a-4280-af7e-d2ab4b6c74dd/home/bowen/grounding_dino_deploy/weights.pth
The model and loaded state dict do not match exactly

size mismatch for backbone.patch_embed.projection.weight: copying a param with shape torch.Size([96, 3, 4, 4]) from checkpoint, the shape in current model is torch.Size([128, 3, 4, 4]). size mismatch for backbone.patch_embed.projection.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for backbone.patch_embed.norm.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for backbone.patch_embed.norm.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for backbone.stages.0.blocks.0.norm1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for backbone.stages.0.blocks.0.norm1.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for backbone.stages.0.blocks.0.attn.w_msa.relative_position_bias_table: copying a param with shape torch.Size([169, 3]) from checkpoint, the shape in current model is torch.Size([529, 4]). size mismatch for backbone.stages.0.blocks.0.attn.w_msa.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]). size mismatch for backbone.stages.0.blocks.0.attn.w_msa.qkv.weight: copying a param with shape torch.Size([288, 96]) from checkpoint, the shape in current model is torch.Size([384, 128]). size mismatch for backbone.stages.0.blocks.0.attn.w_msa.qkv.bias: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for backbone.stages.0.blocks.0.attn.w_msa.proj.weight: copying a param with shape torch.Size([96, 96]) from checkpoint, the shape in current model is torch.Size([128, 128]). size mismatch for backbone.stages.0.blocks.0.attn.w_msa.proj.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for backbone.stages.0.blocks.0.norm2.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for backbone.stages.0.blocks.0.norm2.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for backbone.stages.0.blocks.0.ffn.layers.0.0.weight: copying a param with shape torch.Size([384, 96]) from checkpoint, the shape in current model is torch.Size([512, 128]). size mismatch for backbone.stages.0.blocks.0.ffn.layers.0.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.0.blocks.0.ffn.layers.1.weight: copying a param with shape torch.Size([96, 384]) from checkpoint, the shape in current model is torch.Size([128, 512]). size mismatch for backbone.stages.0.blocks.0.ffn.layers.1.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for backbone.stages.0.blocks.1.norm1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for backbone.stages.0.blocks.1.norm1.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). 
size mismatch for backbone.stages.0.blocks.1.attn.w_msa.relative_position_bias_table: copying a param with shape torch.Size([169, 3]) from checkpoint, the shape in current model is torch.Size([529, 4]). size mismatch for backbone.stages.0.blocks.1.attn.w_msa.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]). size mismatch for backbone.stages.0.blocks.1.attn.w_msa.qkv.weight: copying a param with shape torch.Size([288, 96]) from checkpoint, the shape in current model is torch.Size([384, 128]). size mismatch for backbone.stages.0.blocks.1.attn.w_msa.qkv.bias: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for backbone.stages.0.blocks.1.attn.w_msa.proj.weight: copying a param with shape torch.Size([96, 96]) from checkpoint, the shape in current model is torch.Size([128, 128]). size mismatch for backbone.stages.0.blocks.1.attn.w_msa.proj.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for backbone.stages.0.blocks.1.norm2.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for backbone.stages.0.blocks.1.norm2.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for backbone.stages.0.blocks.1.ffn.layers.0.0.weight: copying a param with shape torch.Size([384, 96]) from checkpoint, the shape in current model is torch.Size([512, 128]). size mismatch for backbone.stages.0.blocks.1.ffn.layers.0.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.0.blocks.1.ffn.layers.1.weight: copying a param with shape torch.Size([96, 384]) from checkpoint, the shape in current model is torch.Size([128, 512]). size mismatch for backbone.stages.0.blocks.1.ffn.layers.1.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for backbone.stages.0.downsample.norm.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.0.downsample.norm.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.0.downsample.reduction.weight: copying a param with shape torch.Size([192, 384]) from checkpoint, the shape in current model is torch.Size([256, 512]). size mismatch for backbone.stages.1.blocks.0.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for backbone.stages.1.blocks.0.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for backbone.stages.1.blocks.0.attn.w_msa.relative_position_bias_table: copying a param with shape torch.Size([169, 6]) from checkpoint, the shape in current model is torch.Size([529, 8]). size mismatch for backbone.stages.1.blocks.0.attn.w_msa.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]). 
size mismatch for backbone.stages.1.blocks.0.attn.w_msa.qkv.weight: copying a param with shape torch.Size([576, 192]) from checkpoint, the shape in current model is torch.Size([768, 256]). size mismatch for backbone.stages.1.blocks.0.attn.w_msa.qkv.bias: copying a param with shape torch.Size([576]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for backbone.stages.1.blocks.0.attn.w_msa.proj.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([256, 256]). size mismatch for backbone.stages.1.blocks.0.attn.w_msa.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for backbone.stages.1.blocks.0.norm2.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for backbone.stages.1.blocks.0.norm2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for backbone.stages.1.blocks.0.ffn.layers.0.0.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([1024, 256]). size mismatch for backbone.stages.1.blocks.0.ffn.layers.0.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.1.blocks.0.ffn.layers.1.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([256, 1024]). size mismatch for backbone.stages.1.blocks.0.ffn.layers.1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for backbone.stages.1.blocks.1.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for backbone.stages.1.blocks.1.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for backbone.stages.1.blocks.1.attn.w_msa.relative_position_bias_table: copying a param with shape torch.Size([169, 6]) from checkpoint, the shape in current model is torch.Size([529, 8]). size mismatch for backbone.stages.1.blocks.1.attn.w_msa.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]). size mismatch for backbone.stages.1.blocks.1.attn.w_msa.qkv.weight: copying a param with shape torch.Size([576, 192]) from checkpoint, the shape in current model is torch.Size([768, 256]). size mismatch for backbone.stages.1.blocks.1.attn.w_msa.qkv.bias: copying a param with shape torch.Size([576]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for backbone.stages.1.blocks.1.attn.w_msa.proj.weight: copying a param with shape torch.Size([192, 192]) from checkpoint, the shape in current model is torch.Size([256, 256]). size mismatch for backbone.stages.1.blocks.1.attn.w_msa.proj.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for backbone.stages.1.blocks.1.norm2.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]). 
size mismatch for backbone.stages.1.blocks.1.norm2.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for backbone.stages.1.blocks.1.ffn.layers.0.0.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([1024, 256]). size mismatch for backbone.stages.1.blocks.1.ffn.layers.0.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.1.blocks.1.ffn.layers.1.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([256, 1024]). size mismatch for backbone.stages.1.blocks.1.ffn.layers.1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for backbone.stages.1.downsample.norm.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.1.downsample.norm.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.1.downsample.reduction.weight: copying a param with shape torch.Size([384, 768]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for backbone.stages.2.blocks.0.norm1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.0.norm1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.0.attn.w_msa.relative_position_bias_table: copying a param with shape torch.Size([169, 12]) from checkpoint, the shape in current model is torch.Size([529, 16]). size mismatch for backbone.stages.2.blocks.0.attn.w_msa.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]). size mismatch for backbone.stages.2.blocks.0.attn.w_msa.qkv.weight: copying a param with shape torch.Size([1152, 384]) from checkpoint, the shape in current model is torch.Size([1536, 512]). size mismatch for backbone.stages.2.blocks.0.attn.w_msa.qkv.bias: copying a param with shape torch.Size([1152]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for backbone.stages.2.blocks.0.attn.w_msa.proj.weight: copying a param with shape torch.Size([384, 384]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for backbone.stages.2.blocks.0.attn.w_msa.proj.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.0.norm2.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.0.norm2.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.0.ffn.layers.0.0.weight: copying a param with shape torch.Size([1536, 384]) from checkpoint, the shape in current model is torch.Size([2048, 512]). 
size mismatch for backbone.stages.2.blocks.0.ffn.layers.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for backbone.stages.2.blocks.0.ffn.layers.1.weight: copying a param with shape torch.Size([384, 1536]) from checkpoint, the shape in current model is torch.Size([512, 2048]). size mismatch for backbone.stages.2.blocks.0.ffn.layers.1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.1.norm1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.1.norm1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.1.attn.w_msa.relative_position_bias_table: copying a param with shape torch.Size([169, 12]) from checkpoint, the shape in current model is torch.Size([529, 16]). size mismatch for backbone.stages.2.blocks.1.attn.w_msa.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]). size mismatch for backbone.stages.2.blocks.1.attn.w_msa.qkv.weight: copying a param with shape torch.Size([1152, 384]) from checkpoint, the shape in current model is torch.Size([1536, 512]). size mismatch for backbone.stages.2.blocks.1.attn.w_msa.qkv.bias: copying a param with shape torch.Size([1152]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for backbone.stages.2.blocks.1.attn.w_msa.proj.weight: copying a param with shape torch.Size([384, 384]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for backbone.stages.2.blocks.1.attn.w_msa.proj.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.1.norm2.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.1.norm2.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.1.ffn.layers.0.0.weight: copying a param with shape torch.Size([1536, 384]) from checkpoint, the shape in current model is torch.Size([2048, 512]). size mismatch for backbone.stages.2.blocks.1.ffn.layers.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for backbone.stages.2.blocks.1.ffn.layers.1.weight: copying a param with shape torch.Size([384, 1536]) from checkpoint, the shape in current model is torch.Size([512, 2048]). size mismatch for backbone.stages.2.blocks.1.ffn.layers.1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.2.norm1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.2.norm1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). 
size mismatch for backbone.stages.2.blocks.2.attn.w_msa.relative_position_bias_table: copying a param with shape torch.Size([169, 12]) from checkpoint, the shape in current model is torch.Size([529, 16]). size mismatch for backbone.stages.2.blocks.2.attn.w_msa.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]). size mismatch for backbone.stages.2.blocks.2.attn.w_msa.qkv.weight: copying a param with shape torch.Size([1152, 384]) from checkpoint, the shape in current model is torch.Size([1536, 512]). size mismatch for backbone.stages.2.blocks.2.attn.w_msa.qkv.bias: copying a param with shape torch.Size([1152]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for backbone.stages.2.blocks.2.attn.w_msa.proj.weight: copying a param with shape torch.Size([384, 384]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for backbone.stages.2.blocks.2.attn.w_msa.proj.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.2.norm2.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.2.norm2.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.2.ffn.layers.0.0.weight: copying a param with shape torch.Size([1536, 384]) from checkpoint, the shape in current model is torch.Size([2048, 512]). size mismatch for backbone.stages.2.blocks.2.ffn.layers.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for backbone.stages.2.blocks.2.ffn.layers.1.weight: copying a param with shape torch.Size([384, 1536]) from checkpoint, the shape in current model is torch.Size([512, 2048]). size mismatch for backbone.stages.2.blocks.2.ffn.layers.1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.3.norm1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.3.norm1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.3.attn.w_msa.relative_position_bias_table: copying a param with shape torch.Size([169, 12]) from checkpoint, the shape in current model is torch.Size([529, 16]). size mismatch for backbone.stages.2.blocks.3.attn.w_msa.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]). size mismatch for backbone.stages.2.blocks.3.attn.w_msa.qkv.weight: copying a param with shape torch.Size([1152, 384]) from checkpoint, the shape in current model is torch.Size([1536, 512]). size mismatch for backbone.stages.2.blocks.3.attn.w_msa.qkv.bias: copying a param with shape torch.Size([1152]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for backbone.stages.2.blocks.3.attn.w_msa.proj.weight: copying a param with shape torch.Size([384, 384]) from checkpoint, the shape in current model is torch.Size([512, 512]). 
size mismatch for backbone.stages.2.blocks.3.attn.w_msa.proj.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.3.norm2.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.3.norm2.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.3.ffn.layers.0.0.weight: copying a param with shape torch.Size([1536, 384]) from checkpoint, the shape in current model is torch.Size([2048, 512]). size mismatch for backbone.stages.2.blocks.3.ffn.layers.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for backbone.stages.2.blocks.3.ffn.layers.1.weight: copying a param with shape torch.Size([384, 1536]) from checkpoint, the shape in current model is torch.Size([512, 2048]). size mismatch for backbone.stages.2.blocks.3.ffn.layers.1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.4.norm1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.4.norm1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.4.attn.w_msa.relative_position_bias_table: copying a param with shape torch.Size([169, 12]) from checkpoint, the shape in current model is torch.Size([529, 16]). size mismatch for backbone.stages.2.blocks.4.attn.w_msa.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]). size mismatch for backbone.stages.2.blocks.4.attn.w_msa.qkv.weight: copying a param with shape torch.Size([1152, 384]) from checkpoint, the shape in current model is torch.Size([1536, 512]). size mismatch for backbone.stages.2.blocks.4.attn.w_msa.qkv.bias: copying a param with shape torch.Size([1152]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for backbone.stages.2.blocks.4.attn.w_msa.proj.weight: copying a param with shape torch.Size([384, 384]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for backbone.stages.2.blocks.4.attn.w_msa.proj.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.4.norm2.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.4.norm2.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.4.ffn.layers.0.0.weight: copying a param with shape torch.Size([1536, 384]) from checkpoint, the shape in current model is torch.Size([2048, 512]). size mismatch for backbone.stages.2.blocks.4.ffn.layers.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]). 
size mismatch for backbone.stages.2.blocks.4.ffn.layers.1.weight: copying a param with shape torch.Size([384, 1536]) from checkpoint, the shape in current model is torch.Size([512, 2048]). size mismatch for backbone.stages.2.blocks.4.ffn.layers.1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.5.norm1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.5.norm1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.5.attn.w_msa.relative_position_bias_table: copying a param with shape torch.Size([169, 12]) from checkpoint, the shape in current model is torch.Size([529, 16]). size mismatch for backbone.stages.2.blocks.5.attn.w_msa.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]). size mismatch for backbone.stages.2.blocks.5.attn.w_msa.qkv.weight: copying a param with shape torch.Size([1152, 384]) from checkpoint, the shape in current model is torch.Size([1536, 512]). size mismatch for backbone.stages.2.blocks.5.attn.w_msa.qkv.bias: copying a param with shape torch.Size([1152]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for backbone.stages.2.blocks.5.attn.w_msa.proj.weight: copying a param with shape torch.Size([384, 384]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for backbone.stages.2.blocks.5.attn.w_msa.proj.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.5.norm2.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.5.norm2.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.blocks.5.ffn.layers.0.0.weight: copying a param with shape torch.Size([1536, 384]) from checkpoint, the shape in current model is torch.Size([2048, 512]). size mismatch for backbone.stages.2.blocks.5.ffn.layers.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for backbone.stages.2.blocks.5.ffn.layers.1.weight: copying a param with shape torch.Size([384, 1536]) from checkpoint, the shape in current model is torch.Size([512, 2048]). size mismatch for backbone.stages.2.blocks.5.ffn.layers.1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.stages.2.downsample.norm.weight: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for backbone.stages.2.downsample.norm.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for backbone.stages.2.downsample.reduction.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]). 
size mismatch for backbone.stages.3.blocks.0.norm1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.3.blocks.0.norm1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.3.blocks.0.attn.w_msa.relative_position_bias_table: copying a param with shape torch.Size([169, 24]) from checkpoint, the shape in current model is torch.Size([529, 32]). size mismatch for backbone.stages.3.blocks.0.attn.w_msa.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]). size mismatch for backbone.stages.3.blocks.0.attn.w_msa.qkv.weight: copying a param with shape torch.Size([2304, 768]) from checkpoint, the shape in current model is torch.Size([3072, 1024]). size mismatch for backbone.stages.3.blocks.0.attn.w_msa.qkv.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for backbone.stages.3.blocks.0.attn.w_msa.proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for backbone.stages.3.blocks.0.attn.w_msa.proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.3.blocks.0.norm2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.3.blocks.0.norm2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.3.blocks.0.ffn.layers.0.0.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for backbone.stages.3.blocks.0.ffn.layers.0.0.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for backbone.stages.3.blocks.0.ffn.layers.1.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for backbone.stages.3.blocks.0.ffn.layers.1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.3.blocks.1.norm1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.3.blocks.1.norm1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.3.blocks.1.attn.w_msa.relative_position_bias_table: copying a param with shape torch.Size([169, 24]) from checkpoint, the shape in current model is torch.Size([529, 32]). size mismatch for backbone.stages.3.blocks.1.attn.w_msa.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]). size mismatch for backbone.stages.3.blocks.1.attn.w_msa.qkv.weight: copying a param with shape torch.Size([2304, 768]) from checkpoint, the shape in current model is torch.Size([3072, 1024]). 
size mismatch for backbone.stages.3.blocks.1.attn.w_msa.qkv.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for backbone.stages.3.blocks.1.attn.w_msa.proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for backbone.stages.3.blocks.1.attn.w_msa.proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.3.blocks.1.norm2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.3.blocks.1.norm2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.stages.3.blocks.1.ffn.layers.0.0.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for backbone.stages.3.blocks.1.ffn.layers.0.0.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for backbone.stages.3.blocks.1.ffn.layers.1.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for backbone.stages.3.blocks.1.ffn.layers.1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.norm1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for backbone.norm1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for backbone.norm2.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.norm2.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for backbone.norm3.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for backbone.norm3.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for neck.convs.0.conv.weight: copying a param with shape torch.Size([256, 192, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]). size mismatch for neck.convs.1.conv.weight: copying a param with shape torch.Size([256, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]). size mismatch for neck.convs.2.conv.weight: copying a param with shape torch.Size([256, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]). size mismatch for neck.extra_convs.0.conv.weight: copying a param with shape torch.Size([256, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 3, 3]). 
missing keys in source state_dict: backbone.stages.2.blocks.6.norm1.weight, backbone.stages.2.blocks.6.norm1.bias, backbone.stages.2.blocks.6.attn.w_msa.relative_position_bias_table, backbone.stages.2.blocks.6.attn.w_msa.relative_position_index, backbone.stages.2.blocks.6.attn.w_msa.qkv.weight, backbone.stages.2.blocks.6.attn.w_msa.qkv.bias, backbone.stages.2.blocks.6.attn.w_msa.proj.weight, backbone.stages.2.blocks.6.attn.w_msa.proj.bias, backbone.stages.2.blocks.6.norm2.weight, backbone.stages.2.blocks.6.norm2.bias, backbone.stages.2.blocks.6.ffn.layers.0.0.weight, backbone.stages.2.blocks.6.ffn.layers.0.0.bias, backbone.stages.2.blocks.6.ffn.layers.1.weight, backbone.stages.2.blocks.6.ffn.layers.1.bias, backbone.stages.2.blocks.7.norm1.weight, backbone.stages.2.blocks.7.norm1.bias, backbone.stages.2.blocks.7.attn.w_msa.relative_position_bias_table, backbone.stages.2.blocks.7.attn.w_msa.relative_position_index, backbone.stages.2.blocks.7.attn.w_msa.qkv.weight, backbone.stages.2.blocks.7.attn.w_msa.qkv.bias, backbone.stages.2.blocks.7.attn.w_msa.proj.weight, backbone.stages.2.blocks.7.attn.w_msa.proj.bias, backbone.stages.2.blocks.7.norm2.weight, backbone.stages.2.blocks.7.norm2.bias, backbone.stages.2.blocks.7.ffn.layers.0.0.weight, backbone.stages.2.blocks.7.ffn.layers.0.0.bias, backbone.stages.2.blocks.7.ffn.layers.1.weight, backbone.stages.2.blocks.7.ffn.layers.1.bias, backbone.stages.2.blocks.8.norm1.weight, backbone.stages.2.blocks.8.norm1.bias, backbone.stages.2.blocks.8.attn.w_msa.relative_position_bias_table, backbone.stages.2.blocks.8.attn.w_msa.relative_position_index, backbone.stages.2.blocks.8.attn.w_msa.qkv.weight, backbone.stages.2.blocks.8.attn.w_msa.qkv.bias, backbone.stages.2.blocks.8.attn.w_msa.proj.weight, backbone.stages.2.blocks.8.attn.w_msa.proj.bias, backbone.stages.2.blocks.8.norm2.weight, backbone.stages.2.blocks.8.norm2.bias, backbone.stages.2.blocks.8.ffn.layers.0.0.weight, backbone.stages.2.blocks.8.ffn.layers.0.0.bias, backbone.stages.2.blocks.8.ffn.layers.1.weight, backbone.stages.2.blocks.8.ffn.layers.1.bias, backbone.stages.2.blocks.9.norm1.weight, backbone.stages.2.blocks.9.norm1.bias, backbone.stages.2.blocks.9.attn.w_msa.relative_position_bias_table, backbone.stages.2.blocks.9.attn.w_msa.relative_position_index, backbone.stages.2.blocks.9.attn.w_msa.qkv.weight, backbone.stages.2.blocks.9.attn.w_msa.qkv.bias, backbone.stages.2.blocks.9.attn.w_msa.proj.weight, backbone.stages.2.blocks.9.attn.w_msa.proj.bias, backbone.stages.2.blocks.9.norm2.weight, backbone.stages.2.blocks.9.norm2.bias, backbone.stages.2.blocks.9.ffn.layers.0.0.weight, backbone.stages.2.blocks.9.ffn.layers.0.0.bias, backbone.stages.2.blocks.9.ffn.layers.1.weight, backbone.stages.2.blocks.9.ffn.layers.1.bias, backbone.stages.2.blocks.10.norm1.weight, backbone.stages.2.blocks.10.norm1.bias, backbone.stages.2.blocks.10.attn.w_msa.relative_position_bias_table, backbone.stages.2.blocks.10.attn.w_msa.relative_position_index, backbone.stages.2.blocks.10.attn.w_msa.qkv.weight, backbone.stages.2.blocks.10.attn.w_msa.qkv.bias, backbone.stages.2.blocks.10.attn.w_msa.proj.weight, backbone.stages.2.blocks.10.attn.w_msa.proj.bias, backbone.stages.2.blocks.10.norm2.weight, backbone.stages.2.blocks.10.norm2.bias, backbone.stages.2.blocks.10.ffn.layers.0.0.weight, backbone.stages.2.blocks.10.ffn.layers.0.0.bias, backbone.stages.2.blocks.10.ffn.layers.1.weight, backbone.stages.2.blocks.10.ffn.layers.1.bias, backbone.stages.2.blocks.11.norm1.weight, backbone.stages.2.blocks.11.norm1.bias, 
backbone.stages.2.blocks.11.attn.w_msa.relative_position_bias_table, backbone.stages.2.blocks.11.attn.w_msa.relative_position_index, backbone.stages.2.blocks.11.attn.w_msa.qkv.weight, backbone.stages.2.blocks.11.attn.w_msa.qkv.bias, backbone.stages.2.blocks.11.attn.w_msa.proj.weight, backbone.stages.2.blocks.11.attn.w_msa.proj.bias, backbone.stages.2.blocks.11.norm2.weight, backbone.stages.2.blocks.11.norm2.bias, backbone.stages.2.blocks.11.ffn.layers.0.0.weight, backbone.stages.2.blocks.11.ffn.layers.0.0.bias, backbone.stages.2.blocks.11.ffn.layers.1.weight, backbone.stages.2.blocks.11.ffn.layers.1.bias, backbone.stages.2.blocks.12.norm1.weight, backbone.stages.2.blocks.12.norm1.bias, backbone.stages.2.blocks.12.attn.w_msa.relative_position_bias_table, backbone.stages.2.blocks.12.attn.w_msa.relative_position_index, backbone.stages.2.blocks.12.attn.w_msa.qkv.weight, backbone.stages.2.blocks.12.attn.w_msa.qkv.bias, backbone.stages.2.blocks.12.attn.w_msa.proj.weight, backbone.stages.2.blocks.12.attn.w_msa.proj.bias, backbone.stages.2.blocks.12.norm2.weight, backbone.stages.2.blocks.12.norm2.bias, backbone.stages.2.blocks.12.ffn.layers.0.0.weight, backbone.stages.2.blocks.12.ffn.layers.0.0.bias, backbone.stages.2.blocks.12.ffn.layers.1.weight, backbone.stages.2.blocks.12.ffn.layers.1.bias, backbone.stages.2.blocks.13.norm1.weight, backbone.stages.2.blocks.13.norm1.bias, backbone.stages.2.blocks.13.attn.w_msa.relative_position_bias_table, backbone.stages.2.blocks.13.attn.w_msa.relative_position_index, backbone.stages.2.blocks.13.attn.w_msa.qkv.weight, backbone.stages.2.blocks.13.attn.w_msa.qkv.bias, backbone.stages.2.blocks.13.attn.w_msa.proj.weight, backbone.stages.2.blocks.13.attn.w_msa.proj.bias, backbone.stages.2.blocks.13.norm2.weight, backbone.stages.2.blocks.13.norm2.bias, backbone.stages.2.blocks.13.ffn.layers.0.0.weight, backbone.stages.2.blocks.13.ffn.layers.0.0.bias, backbone.stages.2.blocks.13.ffn.layers.1.weight, backbone.stages.2.blocks.13.ffn.layers.1.bias, backbone.stages.2.blocks.14.norm1.weight, backbone.stages.2.blocks.14.norm1.bias, backbone.stages.2.blocks.14.attn.w_msa.relative_position_bias_table, backbone.stages.2.blocks.14.attn.w_msa.relative_position_index, backbone.stages.2.blocks.14.attn.w_msa.qkv.weight, backbone.stages.2.blocks.14.attn.w_msa.qkv.bias, backbone.stages.2.blocks.14.attn.w_msa.proj.weight, backbone.stages.2.blocks.14.attn.w_msa.proj.bias, backbone.stages.2.blocks.14.norm2.weight, backbone.stages.2.blocks.14.norm2.bias, backbone.stages.2.blocks.14.ffn.layers.0.0.weight, backbone.stages.2.blocks.14.ffn.layers.0.0.bias, backbone.stages.2.blocks.14.ffn.layers.1.weight, backbone.stages.2.blocks.14.ffn.layers.1.bias, backbone.stages.2.blocks.15.norm1.weight, backbone.stages.2.blocks.15.norm1.bias, backbone.stages.2.blocks.15.attn.w_msa.relative_position_bias_table, backbone.stages.2.blocks.15.attn.w_msa.relative_position_index, backbone.stages.2.blocks.15.attn.w_msa.qkv.weight, backbone.stages.2.blocks.15.attn.w_msa.qkv.bias, backbone.stages.2.blocks.15.attn.w_msa.proj.weight, backbone.stages.2.blocks.15.attn.w_msa.proj.bias, backbone.stages.2.blocks.15.norm2.weight, backbone.stages.2.blocks.15.norm2.bias, backbone.stages.2.blocks.15.ffn.layers.0.0.weight, backbone.stages.2.blocks.15.ffn.layers.0.0.bias, backbone.stages.2.blocks.15.ffn.layers.1.weight, backbone.stages.2.blocks.15.ffn.layers.1.bias, backbone.stages.2.blocks.16.norm1.weight, backbone.stages.2.blocks.16.norm1.bias, backbone.stages.2.blocks.16.attn.w_msa.relative_position_bias_table, 
backbone.stages.2.blocks.16.attn.w_msa.relative_position_index, backbone.stages.2.blocks.16.attn.w_msa.qkv.weight, backbone.stages.2.blocks.16.attn.w_msa.qkv.bias, backbone.stages.2.blocks.16.attn.w_msa.proj.weight, backbone.stages.2.blocks.16.attn.w_msa.proj.bias, backbone.stages.2.blocks.16.norm2.weight, backbone.stages.2.blocks.16.norm2.bias, backbone.stages.2.blocks.16.ffn.layers.0.0.weight, backbone.stages.2.blocks.16.ffn.layers.0.0.bias, backbone.stages.2.blocks.16.ffn.layers.1.weight, backbone.stages.2.blocks.16.ffn.layers.1.bias, backbone.stages.2.blocks.17.norm1.weight, backbone.stages.2.blocks.17.norm1.bias, backbone.stages.2.blocks.17.attn.w_msa.relative_position_bias_table, backbone.stages.2.blocks.17.attn.w_msa.relative_position_index, backbone.stages.2.blocks.17.attn.w_msa.qkv.weight, backbone.stages.2.blocks.17.attn.w_msa.qkv.bias, backbone.stages.2.blocks.17.attn.w_msa.proj.weight, backbone.stages.2.blocks.17.attn.w_msa.proj.bias, backbone.stages.2.blocks.17.norm2.weight, backbone.stages.2.blocks.17.norm2.bias, backbone.stages.2.blocks.17.ffn.layers.0.0.weight, backbone.stages.2.blocks.17.ffn.layers.0.0.bias, backbone.stages.2.blocks.17.ffn.layers.1.weight, backbone.stages.2.blocks.17.ffn.layers.1.bias

01/09 03:06:13 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future. 01/09 03:06:13 - mmengine - INFO - Export PyTorch model to ONNX: mmdeploy_model/groundingdino/end2end.onnx. 01/09 03:06:13 - mmengine - WARNING - Can not find torch.nn.functional._scaled_dot_product_attention, function rewrite will not be applied 01/09 03:06:13 - mmengine - WARNING - Can not find mmdet.models.utils.transformer.PatchMerging.forward, function rewrite will not be applied /home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/core/optimizers/function_marker.py:160: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! ys_shape = tuple(int(s) for s in ys.shape) /home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdet/models/layers/transformer/utils.py:167: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! output_h = math.ceil(input_h / stride_h) /home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdet/models/layers/transformer/utils.py:168: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! output_w = math.ceil(input_w / stride_w) /home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdet/models/layers/transformer/utils.py:169: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! pad_h = max((output_h - 1) stride_h + /home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdet/models/layers/transformer/utils.py:171: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! pad_w = max((output_w - 1) stride_w + /home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdet/models/layers/transformer/utils.py:177: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if pad_h > 0 or pad_w > 0: /home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/models/backbones.py:189: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! 
assert L == H W, 'input feature has wrong size' /home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/models/backbones.py:147: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! B = int(windows.shape[0] / (H W / window_size / window_size)) /home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmcv/cnn/bricks/wrappers.py:167: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if x.numel() == 0 and obsolete_torch_version(TORCH_VERSION, (1, 5)): /home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdet/models/layers/transformer/utils.py:414: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert L == H * W, 'input feature has wrong size' ============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 ============= verbose: False, log level: Level.ERROR ======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Process Process-2: Traceback (most recent call last): File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap self.run() File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/multiprocessing/process.py", line 108, in run self._target(*self._args, self._kwargs) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 107, in call ret = func(*args, *kwargs) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/apis/pytorch2onnx.py", line 98, in torch2onnx export( File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap return self.call_function(funcname, args, kwargs) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function return self.call_function_local(func_name, *args, kwargs) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local return pipe_caller(*args, *kwargs) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 107, in call ret = func(args, kwargs) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/apis/onnx/export.py", line 138, in export torch.onnx.export( File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/torch/onnx/utils.py", line 506, in export _export( File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/torch/onnx/utils.py", line 1548, in _export graph, params_dict, torch_out = _model_to_graph( File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/apis/onnx/optimizer.py", line 27, in model_to_graphcustom_optimizer graph, params_dict, torch_out = ctx.origin_func(*args, kwargs) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/torch/onnx/utils.py", line 1113, in _model_to_graph graph, params, torch_out, module = _create_jit_graph(model, args) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/torch/onnx/utils.py", line 989, in _create_jit_graph graph, torch_out = _trace_and_get_graph_from_model(model, args) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/torch/onnx/utils.py", line 893, in _trace_and_get_graph_from_model trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph( File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/torch/jit/_trace.py", line 1268, in _get_trace_graph outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, *kwargs) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/torch/jit/_trace.py", line 127, in forward graph, out = torch._C._create_graph_by_tracing( File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/torch/jit/_trace.py", line 118, in wrapper outs.append(self.inner(trace_inputs)) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1488, in _slow_forward result = self.forward(*input, *kwargs) File 
"/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/apis/onnx/export.py", line 123, in wrapper return forward(arg, kwargs) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/models/detectors/base_detr.py", line 89, in detection_transformerforward return predict_impl(self, batch_inputs, data_samples, rescale) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/core/optimizers/function_marker.py", line 266, in g rets = f(*args, **kwargs) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/models/detectors/base_detr.py", line 22, in predict_impl head_inputs_dict = self.forward_transformer(img_feats, data_samples) File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdet/models/detectors/grounding_dino.py", line 303, in forward_transformer encoder_inputs_dict, decoder_inputs_dict = self.pre_transformer( File "/home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdet/models/detectors/deformable_detr.py", line 152, in pre_transformer assert batch_data_samples is not None AssertionError 01/09 03:06:19 - mmengine - ERROR - /home/bowen/anaconda3/envs/yolo8/lib/python3.8/site-packages/mmdeploy/apis/core/pipeline_manager.py - pop_mp_output - 80 - mmdeploy.apis.pytorch2onnx.torch2onnx with Call id: 0 failed. exit.

Baboom-l commented 6 months ago

I used mmdeploy; static and dynamic shapes just need to be written into the deploy config files.

Something like this:

onnx_config = dict(
    type='onnx',
    export_params=True,
    keep_initializers_as_inputs=False,
    opset_version=11,
    save_file='end2end.onnx',
    input_names=['image', 'embedded', 'masks', 'position_ids',
                 'text_token_mask', 'positive_maps'],
    output_names=['boxes', 'logits'],
    input_shape=None,
    dynamic_axes={
        'embedded': {0: 'batch', 1: 'num_tokens'},
        'masks': {0: 'batch', 1: 'num_tokens', 2: 'num_tokens'},
        'position_ids': {0: 'batch', 1: 'num_tokens'},
        'text_token_mask': {0: 'batch', 1: 'num_tokens'},
        'positive_maps': {0: 'batch', 1: 'class_nums', 2: 'token_maps'},
        'boxes': {0: 'batch', 1: 'num_queries'},
        'logits': {0: 'batch', 1: 'num_queries'},
    },
    optimize=True)

codebase_config = dict(
    type='mmdet',
    task='ObjectDetection',
    model_type='end2end',
    post_processing=dict(
        score_threshold=0.05,
        confidence_threshold=0.005,  # for YOLOv3
        iou_threshold=0.5,
        max_output_boxes_per_class=200,
        pre_top_k=5000,
        keep_top_k=100,
        background_label_id=-1,
    ))

backend_config = dict(
    type='tensorrt',
    common_config=dict(fp16_mode=True, max_workspace_size=1 << 31),
    model_inputs=[
        dict(
            input_shapes=dict(
                embedded=dict(
                    min_shape=[1, 2, 768],
                    opt_shape=[1, 10, 768],
                    max_shape=[1, 256, 768]),
                masks=dict(
                    min_shape=[1, 2, 2],
                    opt_shape=[1, 10, 10],
                    max_shape=[1, 256, 256]),
                hidden=dict(
                    min_shape=[1, 2, 768],
                    opt_shape=[1, 10, 768],
                    max_shape=[1, 256, 768]),
                position_ids=dict(
                    min_shape=[1, 2],
                    opt_shape=[1, 10],
                    max_shape=[1, 256]),
                text_token_mask=dict(
                    min_shape=[1, 2],
                    opt_shape=[1, 10],
                    max_shape=[1, 256]),
                positive_maps=dict(
                    min_shape=[1, 2, 256],
                    opt_shape=[1, 10, 256],
                    max_shape=[1, 256, 256])))
    ])
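
For reference, a minimal sketch of how such a deploy config is typically consumed, assuming mmdeploy's torch2onnx helper; all file names below are placeholders, and the Grounding DINO conversion still needs the rewrites discussed in the next comments:

# Hedged sketch: paths/configs are hypothetical, not files from this thread.
from mmdeploy.apis import torch2onnx

torch2onnx(
    img='demo.jpg',                                   # any sample image used for tracing
    work_dir='mmdeploy_model/groundingdino',
    save_file='end2end.onnx',
    deploy_cfg='grounding_dino_tensorrt_deploy.py',   # file containing the dicts above
    model_cfg='grounding_dino_swin-t_finetune.py',    # mmdetection model config
    model_checkpoint='finetuned.pth',
    device='cuda:0')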

Baboom-l commented 6 months ago

But converting GD requires rewriting some functions in the mmdeploy library.

Baboom-l commented 6 months ago

Both BaseBackendModel and torch2onnx need slight modifications.

Baboom-l commented 6 months ago

The tokenizer cannot be converted to ONNX; it has to be split out of the model.
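
To illustrate the split, a minimal sketch assuming a standard BERT tokenizer from transformers; the prompt and tensor names here are illustrative, not the exact graph inputs:

from transformers import AutoTokenizer
import numpy as np

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
caption = 'the running dog .'  # hypothetical prompt

# Tokenization stays in plain Python on the CPU ...
tokens = tokenizer(caption, return_tensors='np')
input_ids = tokens['input_ids'].astype(np.int64)
attention_mask = tokens['attention_mask'].astype(np.int64)
token_type_ids = tokens['token_type_ids'].astype(np.int64)
# ... and only these integer tensors are fed to the exported graph, so the
# ONNX/TensorRT model starts at the text embedding rather than at the raw string.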

Baboom-l commented 6 months ago

@wxz1996 How is the accuracy of your 40 ms TensorRT engine? On my side, a TensorRT SwinB fp32 engine takes roughly 110 ms on an A100 and the accuracy matches, but neither fp16 nor int8 accuracy matches.

wxz1996 commented 6 months ago

@wxz1996 How is the accuracy of your 40 ms TensorRT engine? On my side, a TensorRT SwinB fp32 engine takes roughly 110 ms on an A100 and the accuracy matches, but neither fp16 nor int8 accuracy matches.

Neither my fp32 nor my fp16 results match...

xiyangyang99 commented 6 months ago

@wxz1996 How is the accuracy of your 40 ms TensorRT engine? On my side, a TensorRT SwinB fp32 engine takes roughly 110 ms on an A100 and the accuracy matches, but neither fp16 nor int8 accuracy matches.

I just tested acceleration with torch.compile: inference is about 250-260 ms on a 3060 and about 140 ms on a 3090. The TensorRT results are wrong and the accuracy cannot be aligned, with inference taking over 1 second.

wxz1996 commented 6 months ago

torch.compile

Why does adding torch.compile make no difference for me? Did you see any improvement?

xiyangyang99 commented 6 months ago

These are the inference results on a 3060 with and without torch.compile; with torch.compile it is a few tens of milliseconds faster. The commenter above said to split GD's text part out, so that means splitting the original model into two ONNX models? Then accelerating the transformer part with TRT, and also accelerating the BERT part the same way, is that the idea?


Baboom-l commented 6 months ago

The BERT model can go into the graph, but the tokenizer cannot; for convenience I simply pulled the BERT model out as well.

xiyangyang99 commented 6 months ago

Brother, you should write a blog post about this. I'd happily pay for it.


wxz1996 commented 6 months ago

Brother, you should write a blog post about this. I'd happily pay for it.

+1

shuchang0714 commented 6 months ago

Brother, you should write a blog post about this. I'd happily pay for it.

On my side, after splitting the tokenizer out of G-DINO, the dynamic-shape fp32 engine matches the original accuracy; Swin-T runs at about 170 ms on an A100, roughly 30% faster than torch inference. But fp16 accuracy does not match, and I'm currently debugging it with polygraphy. Happy to discuss once there's progress.

1. Environment: TensorRT 8.6.1.6, CUDA 11.7

2. PyTorch to ONNX

Reference: https://github.com/wenyi5608/GroundingDINO/blob/main/demo/export_openvino.py
Dynamic inputs: every dynamic dimension of each input and output must be declared, with indices starting from 0 (a worked inference sketch follows step 4 below):

dynamic_axes = {
    "input_ids": {0: "batch_size", 1: "seq_len"},
    "attention_mask": {0: "batch_size", 1: "seq_len"},
    "position_ids": {0: "batch_size", 1: "seq_len"},
    "token_type_ids": {0: "batch_size", 1: "seq_len"},
    "text_token_mask": {0: "batch_size", 1: "seq_len", 2: "seq_len"},
    "img": {0: "batch_size", 2: "height", 3: "width"},
    "logits": {0: "batch_size"},
    "boxes": {0: "batch_size"},
}
opset_version: 16

3. ONNX to TensorRT

./trtexec --onnx=/root/GroundingDINO/grounded.onnx --saveEngine=grounded.trt --minShapes=img:1x3x800x1200,input_ids:1x1,attention_mask:1x1,position_ids:1x1,token_type_ids:1x1,text_token_mask:1x1x1 --optShapes=img:1x3x800x1200,input_ids:1x6,attention_mask:1x6,position_ids:1x6,token_type_ids:1x6,text_token_mask:1x6x6 --maxShapes=img:1x3x800x1200,input_ids:1x25,attention_mask:1x25,position_ids:1x25,token_type_ids:1x25,text_token_mask:1x25x25

4. TensorRT inference: inference_trt.zip
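
A minimal sketch of driving the exported model with these input names, shown with ONNX Runtime for clarity (the TensorRT bindings take the same tensors). The dtypes and the position_ids / text_token_mask construction are assumptions to be checked against the actual exported graph, since the real model derives them from its special-token logic:

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokens = tokenizer("the running dog .", return_tensors="np")   # hypothetical prompt
input_ids = tokens["input_ids"].astype(np.int64)
seq_len = input_ids.shape[1]   # must stay within the 1..25 token profile above

feeds = {
    "img": np.zeros((1, 3, 800, 1200), dtype=np.float32),      # preprocessed image goes here
    "input_ids": input_ids,
    # dtypes below are guesses; verify them against the exported graph
    "attention_mask": tokens["attention_mask"].astype(bool),
    "token_type_ids": tokens["token_type_ids"].astype(np.int64),
    # simplified placeholders for the model's special-token mask generation
    "position_ids": np.arange(seq_len, dtype=np.int64)[None, :],
    "text_token_mask": np.ones((1, seq_len, seq_len), dtype=bool),
}

sess = ort.InferenceSession("grounded.onnx", providers=["CPUExecutionProvider"])
logits, boxes = sess.run(["logits", "boxes"], feeds)
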
xiyangyang99 commented 6 months ago

Grounding DINO has a text branch; if the tokenizer cannot be converted to ONNX, how do you run TensorRT inference afterwards? Or is only the transformer part run with TensorRT, with the text part split out and handled by separate logic whose output is fed to the transformer?


shuchang0714 commented 6 months ago

Grounding DINO has a text branch; if the tokenizer cannot be converted to ONNX, how do you run TensorRT inference afterwards? Or is only the transformer part run with TensorRT, with the text part split out and handled by separate logic whose output is fed to the transformer?

Yes, split the text part out and write separate logic whose output is fed to the transformer.

xiyangyang99 commented 6 months ago

I'm converting to TRT with your trtexec command right now on a 3090; it's still running. I'll let you know the result once it finishes. Thanks.


xiyangyang99 commented 6 months ago

The ONNX I used was also exported with the script you shared, and the onnx-to-tensorrt command was the one you gave, but the converted TRT engine errors out at inference time. The error log is as follows:

python trt_inference_on_a_image.py [01/11/2024-15:58:50] [TRT] [W] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.4.1 [01/11/2024-15:58:50] [TRT] [W] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.4.1 [01/11/2024-15:58:51] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading trt_inference_on_a_image.py:258: DeprecationWarning: Use set_input_shape instead.   context.set_binding_shape(i, opt) trt_inference_on_a_image.py:199: DeprecationWarning: Use get_tensor_shape instead.   size = abs(trt.volume(context.get_binding_shape(i))) * bs trt_inference_on_a_image.py:200: DeprecationWarning: Use get_tensor_dtype instead.   dtype = trt.nptype(engine.get_binding_dtype(binding)) trt_inference_on_a_image.py:209: DeprecationWarning: Use get_tensor_mode instead.   if engine.binding_is_input(binding): trt_inference_on_a_image.py:220: DeprecationWarning: Use execute_async_v2 instead.   context.execute_async(batch_size=1, bindings=bindings, stream_handle=stream.handle) [01/11/2024-15:58:51] [TRT] [W] The enqueue() method has been deprecated when used with engines built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. Please use enqueueV2() instead. [01/11/2024-15:58:51] [TRT] [W] Also, the batchSize argument passed into this function has no effect on changing the input shapes. Please use setBindingDimensions() function to change input shapes instead. trt_inference_on_a_image.py:78: RuntimeWarning: overflow encountered in exp   return 1/(1 + np.exp(-x)) Traceback (most recent call last):   File "trt_inference_on_a_image.py", line 274, in <module>     boxes_filt, pred_phrases = outputs_postprocess(tokenizer, output_data, box_threshold, text_threshold, with_logits=True, token_spans=None)   File "trt_inference_on_a_image.py", line 143, in outputs_postprocess     pred_phrase = get_phrases_from_posmap(logit > text_threshold, tokenized, tokenlizer)   File "/home/liufurui/TensorRT-8.6.1.6/targets/x86_64-linux-gnu/bin/GroundingDINO/groundingdino/util/utils.py", line 607, in get_phrases_from_posmap     token_ids = [tokenized["input_ids"][i] for i in non_zero_idx]   File "/home/liufurui/TensorRT-8.6.1.6/targets/x86_64-linux-gnu/bin/GroundingDINO/groundingdino/util/utils.py", line 607, in <listcomp>     token_ids = [tokenized["input_ids"][i] for i in non_zero_idx] IndexError: list index out of range‍


shuchang0714 commented 6 months ago


It's probably that your input_ids length isn't aligned. Share your ONNX export code so I can take a look. You didn't change the inference code, right?
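
To make that check concrete, a small sketch (the caption is hypothetical; the 25-token bound comes from the maxShapes in the trtexec command above):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
caption = "the running dog ."  # hypothetical prompt; must match what the engine was fed
tokenized = tokenizer(caption, return_tensors="np")
seq_len = tokenized["input_ids"].shape[1]

# The engine profile above allows 1..25 tokens; anything longer cannot be bound.
assert seq_len <= 25, f"caption produces {seq_len} tokens, above the engine's maxShapes"

# Every text input (input_ids, attention_mask, position_ids, token_type_ids,
# text_token_mask) must be built from this same `tokenized` object, and the same
# object must be passed to get_phrases_from_posmap() during post-processing;
# otherwise the posmap indices can run past the token list (the IndexError above).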

xiyangyang99 commented 6 months ago


It's probably that your input_ids length isn't aligned. Share your ONNX export code so I can take a look.

My input_ids at ONNX export time were exactly the text highlighted in red in the screenshot. The inputs are dynamic shapes, between min and max. In the inference code I only changed the image and the text prompt to my own. (screenshot attached)

xiyangyang99 commented 6 months ago

When exporting to ONNX, this input_ids was not changed; at inference time the prompt was also "the runing dog .".


shuchang0714 commented 6 months ago


Package up your code and send it over; I'll run it for you when I get a chance.

xiyangyang99 commented 6 months ago

Here is my export script, export_openvino.py. The model-loading line in the script was changed to model.load_state_dict(clean_state_dict(checkpoint)). The PyTorch model is in the attachment; it is the .pth produced by fine-tuning on my own dataset with mmdetection, the caption is "pressure gauge ." (the text for my dataset), and the config script is GroundingDINO_SwinT_OGC.py.
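For context, a minimal sketch of what that export roughly looks like, pieced together from the export_openvino.py linked earlier in this thread (https://github.com/wenyi5608/GroundingDINO/blob/main/demo/export_openvino.py) and the dynamic_axes / opset settings shared above. The paths and caption are placeholders, the text tensors are built in a simplified way here, and it assumes the model's forward has been adapted (as in that fork) to accept these six tensors directly — refer to the attached script for the real preprocessing.

```python
# Sketch only, adapted from wenyi5608's export_openvino.py (linked earlier in this thread).
# Paths and caption are placeholders; the referenced script builds the text masks with
# generate_masks_with_special_tokens_and_transfer_map instead of the simplified version here.
import torch
from transformers import BertTokenizerFast

from groundingdino.models import build_model
from groundingdino.util.slconfig import SLConfig
from groundingdino.util.utils import clean_state_dict

config_path = "GroundingDINO_SwinT_OGC.py"   # config used for fine-tuning
checkpoint_path = "weights.pth"              # mmdetection fine-tuned checkpoint
caption = "pressure gauge ."                 # text used during fine-tuning

args = SLConfig.fromfile(config_path)
args.device = "cpu"
model = build_model(args)
checkpoint = torch.load(checkpoint_path, map_location="cpu")
# The fine-tuned checkpoint is loaded directly, without the usual checkpoint["model"] key.
model.load_state_dict(clean_state_dict(checkpoint), strict=False)
model.eval()

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
tokenized = tokenizer(caption, return_tensors="pt")
input_ids = tokenized["input_ids"]
token_type_ids = tokenized["token_type_ids"]
attention_mask = tokenized["attention_mask"].bool()
seq_len = input_ids.shape[1]
position_ids = torch.arange(seq_len).unsqueeze(0)
text_token_mask = torch.ones((1, seq_len, seq_len), dtype=torch.bool)

img = torch.randn(1, 3, 800, 1200)  # dummy image input for tracing

torch.onnx.export(
    model,
    (img, input_ids, attention_mask, position_ids, token_type_ids, text_token_mask),
    "grounded.onnx",
    input_names=["img", "input_ids", "attention_mask", "position_ids",
                 "token_type_ids", "text_token_mask"],
    output_names=["logits", "boxes"],
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "seq_len"},
        "attention_mask": {0: "batch_size", 1: "seq_len"},
        "position_ids": {0: "batch_size", 1: "seq_len"},
        "token_type_ids": {0: "batch_size", 1: "seq_len"},
        "text_token_mask": {0: "batch_size", 1: "seq_len", 2: "seq_len"},
        "img": {0: "batch_size", 2: "height", 3: "width"},
        "logits": {0: "batch_size"},
        "boxes": {0: "batch_size"},
    },
    opset_version=16,
)
```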


Oversized attachment sent via QQ Mail:

weights.pth (1.98 GB, expires 2024-02-10 17:47). Download page: https://mail.qq.com/cgi-bin/ftnExs_download?k=213961378d4f4db31b37e6251f62001d194d50030a0651064c5b04005f4f570402584c005b06041f050b54050c5b5703040b5752396932450450065f4d111c421551610a&t=exs_ftn_download&code=a9a79b22

xiyangyang99 commented 6 months ago


The GroundingDINO model takes text and image as inputs at the same time and computes cross-attention between them. If the tokenizer is split out and only the transformer part is accelerated, pushing the tokenizer into the post-processing feels like it could affect the results.
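To illustrate the split being discussed: with the export sketched above, only the string-to-token-ids step moves out of the engine, while the BERT encoder and the image/text cross-attention are still part of the exported graph, so the tokenizer sitting outside should not by itself change the predictions. A rough host-side sketch (the `run_trt_engine` call is a placeholder for whatever execution wrapper you use, e.g. the one in the inference_trt.zip shared earlier):

```python
# Sketch: the tokenizer runs on the host; its integer outputs are fed to the TensorRT
# engine together with the preprocessed image. run_trt_engine is a placeholder name.
import numpy as np
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

def build_text_inputs(caption: str) -> dict:
    """Turn the caption string into the tensors the exported graph expects."""
    tokenized = tokenizer(caption, return_tensors="np")
    input_ids = tokenized["input_ids"].astype(np.int64)
    seq_len = input_ids.shape[1]
    return {
        "input_ids": input_ids,
        "token_type_ids": tokenized["token_type_ids"].astype(np.int64),
        "attention_mask": tokenized["attention_mask"].astype(bool),
        "position_ids": np.arange(seq_len, dtype=np.int64)[None, :],
        "text_token_mask": np.ones((1, seq_len, seq_len), dtype=bool),
    }

# feeds = build_text_inputs("pressure gauge .")
# feeds["img"] = preprocessed_image                # 1x3xHxW float32, normalised
# logits, boxes = run_trt_engine(engine, feeds)    # placeholder execution wrapper
```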

formance commented 5 months ago

Bro, have you managed to convert the mmdet fine-tuned Grounding DINO model to ONNX? I have the official Grounding DINO model running through both ONNX export and TensorRT inference, but I still haven't gotten the mmdet-trained one to work.

xiyangyang99 commented 5 months ago

How did you do the ONNX-to-TensorRT conversion? Did you use a tool? On my side, fine-tuning on my own dataset works fine.


formance commented 5 months ago


I used the TensorRT Python library, but with the official model that hasn't been fine-tuned. How do you convert the mmdet-trained GroundingDINO model to ONNX?
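For anyone else following along, the Python-API route is roughly equivalent to the trtexec command shared earlier in this thread. A minimal sketch (the optimization-profile shapes are copied from that command and are assumptions, not something verified against this particular setup):

```python
# Sketch of building the engine with the TensorRT Python API instead of trtexec.
# The profile shapes mirror the trtexec command shared earlier in this thread.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("grounded.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()
profile = builder.create_optimization_profile()
profile.set_shape("img", (1, 3, 800, 1200), (1, 3, 800, 1200), (1, 3, 800, 1200))
for name in ("input_ids", "attention_mask", "position_ids", "token_type_ids"):
    profile.set_shape(name, (1, 1), (1, 6), (1, 25))
profile.set_shape("text_token_mask", (1, 1, 1), (1, 6, 6), (1, 25, 25))
config.add_optimization_profile(profile)
# config.set_flag(trt.BuilderFlag.FP16)  # FP16 accuracy was reported above as not yet aligned

engine_bytes = builder.build_serialized_network(network, config)
with open("grounded.trt", "wb") as f:
    f.write(engine_bytes)
```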