Closed: gstariarch closed this issue 9 months ago.
Hi, could you try upgrading TensorRT to 8.6?
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you encounter a similar problem or have any new updates.
Checklist
Describe the bug
(mmdeploy_flgpu) I:\AILab>python mmdeploy/tools/deploy.py ^
More? MMDEPLOYGPU\MMDeploy\configs\mmocr\text-recognition\text-recognition_tensorrt_dynamic-1x32x32-1x32x640.py ^
More? mmocr\configs\textrecog\sar\sar_resnet31_sequential-decoder_5e_st-sub_mj-sub_sa_real.py ^
More? 文字识别模型部署\sar_resnet31_sequential-decoder_5e_st-sub_mj-sub_sa_real_20220915_185451-1fd6b1fc.pth ^
More? mmocr\demo\demo_text_det.jpg ^
More? --work-dir mmdeploy_model/ocr/sar-trt ^
More? --device cuda ^
More? --dump-info
09/28 10:10:21 - mmengine - WARNING - Failed to search registry with scope "mmocr" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmocr" is a correct scope, or whether the registry is initialized.
09/28 10:10:21 - mmengine - WARNING - Failed to search registry with scope "mmocr" in the "mmocr_tasks" registry tree. As a workaround, the current "mmocr_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmocr" is a correct scope, or whether the registry is initialized.
09/28 10:10:23 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
09/28 10:10:24 - mmengine - WARNING - Failed to search registry with scope "mmocr" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmocr" is a correct scope, or whether the registry is initialized.
09/28 10:10:24 - mmengine - WARNING - Failed to search registry with scope "mmocr" in the "mmocr_tasks" registry tree. As a workaround, the current "mmocr_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmocr" is a correct scope, or whether the registry is initialized.
Loads checkpoint by local backend from path: 文字识别模型部署\sar_resnet31_sequential-decoder_5e_st-sub_mj-sub_sa_real_20220915_185451-1fd6b1fc.pth
The model and loaded state dict do not match exactly
unexpected key in source state_dict: data_preprocessor.mean, data_preprocessor.std
09/28 10:10:25 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future.
09/28 10:10:25 - mmengine - INFO - Export PyTorch model to ONNX: mmdeploy_model/ocr/sar-trt\end2end.onnx.
09/28 10:10:26 - mmengine - WARNING - Can not find torch.nn.functional.scaled_dot_product_attention, function rewrite will not be applied
09/28 10:10:26 - mmengine - WARNING - Can not find mmdet.models.utils.transformer.PatchMerging.forward, function rewrite will not be applied
i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\codebase\mmocr\models\text_recognition\sar_encoder.py:37: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert len(data_samples) == feat.size(0)
i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\codebase\mmocr\models\text_recognition\sar_encoder.py:45: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  h_feat = int(feat.size(2))
i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\codebase\mmocr\models\text_recognition\sar_encoder.py:57: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  valid_step = torch.tensor(T * valid_ratio).ceil().long() - 1
i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\codebase\mmocr\models\text_recognition\sar_encoder.py:57: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  valid_step = torch.tensor(T * valid_ratio).ceil().long() - 1
i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\codebase\mmocr\models\text_recognition\sar_decoder.py:119: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert c == 1
i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\codebase\mmocr\models\text_recognition\sar_decoder.py:126: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  valid_width = torch.tensor(w * valid_ratio).ceil().long()
i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\codebase\mmocr\models\text_recognition\sar_decoder.py:126: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  valid_width = torch.tensor(w * valid_ratio).ceil().long()
i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\pytorch\functions\tensor_setitem.py:38: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  stop = stop if stop >= 0 else self_shape[i] + stop
D:\miniconda3\envs\mmdeploy_flgpu\lib\site-packages\torch\onnx\symbolic_opset9.py:4315: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model.
  warnings.warn(
D:\miniconda3\envs\mmdeploy_flgpu\lib\site-packages\torch\onnx\_internal\jit_utils.py:258: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\passes\onnx\shape_type_inference.cpp:1888.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
D:\miniconda3\envs\mmdeploy_flgpu\lib\site-packages\torch\onnx\utils.py:687: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\passes\onnx\shape_type_inference.cpp:1888.)
  _C._jit_pass_onnx_graph_shape_type_inference(
D:\miniconda3\envs\mmdeploy_flgpu\lib\site-packages\torch\onnx\utils.py:1178: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\passes\onnx\shape_type_inference.cpp:1888.)
  _C._jit_pass_onnx_graph_shape_type_inference(
09/28 10:10:35 - mmengine - INFO - Execute onnx optimize passes.
09/28 10:10:39 - mmengine - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
09/28 10:10:40 - mmengine - INFO - Start pipeline mmdeploy.apis.utils.utils.to_backend in subprocess
09/28 10:10:40 - mmengine - INFO - Successfully loaded tensorrt plugins from i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\lib\mmdeploy_tensorrt_ops.dll
[09/28/2023-10:10:41] [TRT] [I] [MemUsageChange] Init CUDA: CPU +495, GPU +0, now: CPU 12001, GPU 1269 (MiB)
[09/28/2023-10:10:42] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +387, GPU +104, now: CPU 12575, GPU 1373 (MiB)
[libprotobuf WARNING E:\Perforce\rboissel_devdt_windows\sw\gpgpu\MachineLearning\DIT\dev\nvmake\externals\protobuf\3.0.0\src\google\protobuf\io\coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING E:\Perforce\rboissel_devdt_windows\sw\gpgpu\MachineLearning\DIT\dev\nvmake\externals\protobuf\3.0.0\src\google\protobuf\io\coded_stream.cc:81] The total number of bytes read was 734290051
[09/28/2023-10:10:42] [TRT] [I] ----------------------------------------------------------------
[09/28/2023-10:10:42] [TRT] [I] Input filename:   mmdeploy_model/ocr/sar-trt\end2end.onnx
[09/28/2023-10:10:42] [TRT] [I] ONNX IR version:  0.0.6
[09/28/2023-10:10:42] [TRT] [I] Opset version:    11
[09/28/2023-10:10:42] [TRT] [I] Producer name:    pytorch
[09/28/2023-10:10:42] [TRT] [I] Producer version: 1.13.0
[09/28/2023-10:10:42] [TRT] [I] Domain:
[09/28/2023-10:10:42] [TRT] [I] Model version:    0
[09/28/2023-10:10:42] [TRT] [I] Doc string:
[09/28/2023-10:10:42] [TRT] [I] ----------------------------------------------------------------
[libprotobuf WARNING E:\Perforce\rboissel_devdt_windows\sw\gpgpu\MachineLearning\DIT\dev\nvmake\externals\protobuf\3.0.0\src\google\protobuf\io\coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING E:\Perforce\rboissel_devdt_windows\sw\gpgpu\MachineLearning\DIT\dev\nvmake\externals\protobuf\3.0.0\src\google\protobuf\io\coded_stream.cc:81] The total number of bytes read was 734290051
[09/28/2023-10:10:42] [TRT] [W] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/28/2023-10:10:42] [TRT] [W] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[09/28/2023-10:10:42] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:43] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:43] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:43] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:43] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:43] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:43] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:43] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:43] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:43] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:44] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:44] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:44] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:44] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:44] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:45] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:45] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:45] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:45] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:46] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:46] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:46] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:47] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:47] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:47] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:48] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:48] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:48] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:49] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/28/2023-10:10:50] [TRT] [E] 4: [network.cpp::nvinfer1::Network::validate::3008] Error Code 4: Internal Error (input: for dimension number 1 in profile 0 does not match network definition (got min=1, opt=1, max=1), expected min=opt=max=3).)
[09/28/2023-10:10:50] [TRT] [E] 2: [builder.cpp::nvinfer1::builder::Builder::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed.
)
Process Process-3:
Traceback (most recent call last):
  File "D:\miniconda3\envs\mmdeploy_flgpu\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "D:\miniconda3\envs\mmdeploy_flgpu\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\apis\core\pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\apis\utils\utils.py", line 98, in to_backend
    return backend_mgr.to_backend(
  File "i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\backend\tensorrt\backend_manager.py", line 127, in to_backend
    onnx2tensorrt(
  File "i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\backend\tensorrt\onnx2tensorrt.py", line 79, in onnx2tensorrt
    from_onnx(
  File "i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\backend\tensorrt\utils.py", line 248, in from_onnx
    assert engine is not None, 'Failed to create TensorRT engine'
AssertionError: Failed to create TensorRT engine
09/28 10:10:50 - mmengine - ERROR - i:\ailab\mmdeploygpu\mmdeploy\mmdeploy\apis\core\pipeline_manager.py - pop_mp_output - 80 - mmdeploy.apis.utils.utils.to_backend with Call id: 1 failed. exit.
Reproduction
(mmdeploy_flgpu) I:\AILab>python mmdeploy/tools/deploy.py ^
More? MMDEPLOYGPU\MMDeploy\configs\mmocr\text-recognition\text-recognition_tensorrt_dynamic-1x32x32-1x32x640.py ^
More? mmocr\configs\textrecog\sar\sar_resnet31_sequential-decoder_5e_st-sub_mj-sub_sa_real.py ^
More? 文字识别模型部署\sar_resnet31_sequential-decoder_5e_st-sub_mj-sub_sa_real_20220915_185451-1fd6b1fc.pth ^
More? mmocr\demo\demo_text_det.jpg ^
More? --work-dir mmdeploy_model/ocr/sar-trt ^
More? --device cuda ^
More? --dump-info
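The conversion stops at TensorRT's network validation: dimension 1 of the input is fixed at 3 in the exported SAR network, while optimization profile 0 sets it to 1 for min, opt and max. The deploy config name text-recognition_tensorrt_dynamic-1x32x32-1x32x640 presumably encodes C x H x W ranges with C = 1. The plain-Python sketch below mimics that validation rule so a profile can be checked against the network shape before TensorRT is invoked. It is not mmdeploy or TensorRT code: the PROFILE dict only loosely imitates a deploy config's min/opt/max shapes, the opt_shape value and the dynamic axes (-1) in NETWORK_INPUT are assumptions, and only the channel values (1 vs. 3) come from the log above.

```python
from typing import Dict, List, Sequence

# Hypothetical profile modeled on the config name "1x32x32-1x32x640"
# (N x C x H x W). The opt_shape value is an assumption; only the min
# and max ranges appear in the file name.
PROFILE: Dict[str, List[int]] = {
    "min_shape": [1, 1, 32, 32],
    "opt_shape": [1, 1, 32, 64],
    "max_shape": [1, 1, 32, 640],
}

# Input shape of the exported network as TensorRT would see it; -1 marks
# a dynamic axis (assumed). Only the channel value 3 is taken from the log.
NETWORK_INPUT = (-1, 3, -1, -1)


def check_profile(network_shape: Sequence[int],
                  profile: Dict[str, Sequence[int]]) -> List[str]:
    """Return one message per dimension where the profile is invalid.

    For an axis that is fixed in the network definition, min, opt and max
    must all equal the fixed size. For a dynamic axis (-1), this sketch
    only requires min <= opt <= max.
    """
    errors: List[str] = []
    mins, opts, maxs = (profile[k] for k in ("min_shape", "opt_shape", "max_shape"))
    for dim, expected in enumerate(network_shape):
        if expected == -1:
            if not mins[dim] <= opts[dim] <= maxs[dim]:
                errors.append(f"dimension {dim}: min <= opt <= max violated")
            continue
        got = (mins[dim], opts[dim], maxs[dim])
        if got != (expected, expected, expected):
            errors.append(
                f"dimension {dim}: got min={got[0]}, opt={got[1]}, "
                f"max={got[2]}, expected min=opt=max={expected}")
    return errors


if __name__ == "__main__":
    for message in check_profile(NETWORK_INPUT, PROFILE):
        print(message)
```

With these values only dimension 1 is reported, mirroring the "got min=1, opt=1, max=1 ... expected min=opt=max=3" error; a profile whose dimension 1 is 3 in all three shapes passes the check.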
Environment
Error traceback
No response