Thanks for the repro. I've fixed this bug in this PR: https://github.com/pytorch/TensorRT/pull/3019
Thank you for your reply! I used the latest version and modified the code according to your PR, but I got another error:
WARNING:torch_tensorrt.dynamo.conversion.converter_utils:Detected unparsable type in node formatting: <class 'torch.SymInt'>
WARNING:torch_tensorrt.dynamo.conversion.converter_utils:Detected unparsable type in node formatting: <class 'torch.SymInt'>
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.039768
INFO:torch_tensorrt [TensorRT Conversion Context]:Global timing cache in use. Profiling results in this builder pass will be stored.
INFO:torch_tensorrt [TensorRT Conversion Context]:Detected 1 inputs and 6 output network tensors.
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Host Persistent Memory: 5552
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Device Persistent Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Scratch Memory: 48365568
INFO:torch_tensorrt [TensorRT Conversion Context]:[BlockAssignment] Started assigning block shifts. This will take 4 steps to complete.
INFO:torch_tensorrt [TensorRT Conversion Context]:[BlockAssignment] Algorithm ShiftNTopDown took 0.031924ms to assign 2 blocks to 4 nodes requiring 61210624 bytes.
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Activation Memory: 61210624
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Weights Memory: 10853632
INFO:torch_tensorrt [TensorRT Conversion Context]:Engine generation completed in 0.123574 seconds.
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 3 MiB, GPU 100 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 9363 MiB
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:00.135388
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 11179132 bytes of Memory
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 1496 bytes of code generator cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 157388 bytes of compilation cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 16 timing cache entries
WARNING: [Torch-TensorRT] - CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
Traceback (most recent call last):
File "/mnt/bn/hukongtao-infer-speed/mlx/users/kongtao.hu/codebase/EasyGuard_0617/speed_vit_test.py", line 17, in <module>
trt_gm = torch_tensorrt.compile(model, "dynamo", inputs)
File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/_compile.py", line 249, in compile
trt_graph_module = dynamo_compile(
File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/_compiler.py", line 243, in compile
trt_gm = compile_module(gm, inputs, settings)
File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/_compiler.py", line 383, in compile_module
submodule_inputs = partitioning.construct_submodule_inputs(submodule)
File "/usr/local/lib/python3.9/dist-packages/torch_tensorrt/dynamo/partitioning/common.py", line 124, in construct_submodule_inputs
raise AssertionError(
AssertionError: Input scaled_dot_product_attention does not contain metadata. Please ensure you have exported the graph correctly
Looking forward to your reply.
@Hukongtao This error occurs because our lowering pass was not copying the attention op's metadata over to its replacement variant. I've pushed a fix to the same PR: https://github.com/pytorch/TensorRT/pull/3019. Can you give it a try?
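For context, the pattern behind this kind of fix is sketched below. This is an illustrative torch.fx pass, not the actual Torch-TensorRT code from the PR: `decomposed_sdpa` and `replace_sdpa` are hypothetical names, and the decomposition ignores masking, causal attention, and dropout. The essential line is `new_node.meta = node.meta.copy()`, which carries the original op's metadata onto its replacement so downstream passes can still read shapes and dtypes.

```python
# Illustrative sketch only -- not the actual Torch-TensorRT lowering pass.
import math

import torch
from torch.fx import GraphModule


def decomposed_sdpa(query, key, value, attn_mask=None, dropout_p=0.0,
                    is_causal=False, scale=None):
    # Naive stand-in for a lowered attention op (ignores mask/causal/dropout).
    s = scale if scale is not None else 1.0 / math.sqrt(query.size(-1))
    attn = torch.softmax(query @ key.transpose(-2, -1) * s, dim=-1)
    return attn @ value


def replace_sdpa(gm: GraphModule) -> GraphModule:
    for node in list(gm.graph.nodes):
        if (
            node.op == "call_function"
            and node.target is torch.nn.functional.scaled_dot_product_attention
        ):
            with gm.graph.inserting_after(node):
                new_node = gm.graph.call_function(
                    decomposed_sdpa, args=node.args, kwargs=node.kwargs
                )
            # The crucial step: copy the original node's metadata (shapes,
            # dtypes, fake tensors) onto the replacement node.
            new_node.meta = node.meta.copy()
            node.replace_all_uses_with(new_node)
            gm.graph.erase_node(node)
    gm.graph.lint()
    gm.recompile()
    return gm
```

Without that copy, the partitioner encounters a node with an empty `.meta` and raises exactly the "Input ... does not contain metadata" AssertionError from the traceback above.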
LGTM
Bug Description
Compiling a model that uses torch.nn.functional.scaled_dot_product_attention with dynamic shapes through the dynamo frontend fails; see the thread above for the errors encountered.
To Reproduce
Minimal reproducible code:
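The original snippet was not preserved in this copy of the issue. Below is a minimal sketch of the reported setup, assuming a toy module that calls scaled_dot_product_attention and a dynamic batch dimension; the (197, 768) token/channel sizes are illustrative ViT-Base-like guesses, not the reporter's actual model from speed_vit_test.py.

```python
# Hypothetical minimal repro -- a stand-in for the reporter's model.
import torch
import torch_tensorrt


class ToyAttention(torch.nn.Module):
    def forward(self, x):
        # x: (batch, seq, dim); self-attention with x as query, key, value.
        return torch.nn.functional.scaled_dot_product_attention(x, x, x)


model = ToyAttention().eval().cuda()

# Dynamic batch dimension, specified per the dynamic-shapes user guide.
inputs = [
    torch_tensorrt.Input(
        min_shape=(1, 197, 768),
        opt_shape=(8, 197, 768),
        max_shape=(16, 197, 768),
        dtype=torch.float32,
    )
]

trt_gm = torch_tensorrt.compile(model, "dynamo", inputs)  # fails as described
```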
Expected behavior
The model should compile with dynamic shapes, but instead compilation fails with an error:
Environment
Additional context
See the official documentation on dynamic shapes:
https://pytorch.org/TensorRT/user_guide/dynamic_shapes.html
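That guide also documents a torch.export-based route to dynamic shapes. A minimal sketch, assuming the same toy attention module as above and an illustrative batch range of 1 to 16:

```python
import torch
import torch_tensorrt


class ToyAttention(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.scaled_dot_product_attention(x, x, x)


model = ToyAttention().eval().cuda()
x = torch.randn(8, 197, 768, device="cuda")

# Mark dim 0 of input "x" as dynamic over [1, 16].
batch = torch.export.Dim("batch", min=1, max=16)
exp_program = torch.export.export(model, (x,), dynamic_shapes={"x": {0: batch}})

trt_gm = torch_tensorrt.dynamo.compile(exp_program, inputs=[x])
```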