pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

❓ [Question] Bert lost a lot of accuracy when using fp16 #2335

Closed: HenryYuen128 closed this issue 8 months ago

HenryYuen128 commented 1 year ago

❓ Question

A BERT text classification model run in FP16 produces results that differ greatly from the FP32 results.

What you have already tried

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

Torch-TensorRT Version: 1.3

Additional context

Model converted from TorchScript to TensorRT

enabled_precisions = {torch.half}  # run with 16-bit precision
trt_model = torch_tensorrt.compile(
    model, inputs=inputs, enabled_precisions=enabled_precisions,
    truncate_long_and_double=True, require_full_compilation=False,
)
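The `truncate_long_and_double=True` flag in the call above asks the compiler to narrow 64-bit values to 32-bit, which is what the "Truncating ... from at::kLong to at::kInt" warnings in the logs report. For token IDs this is usually harmless, but it is easy to check. A minimal sketch in plain Python, assuming the inputs are available as lists of integers (the helper name `fits_in_int32` and the sample IDs are hypothetical and not part of Torch-TensorRT):

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def fits_in_int32(values):
    """Return True if every integer in `values` survives an INT64 -> INT32 cast."""
    return all(INT32_MIN <= v <= INT32_MAX for v in values)


# Hypothetical token IDs for illustration; BERT-style vocabulary indices
# are typically far below the INT32 limit.
sample_input_ids = [101, 7592, 2088, 102]
print(fits_in_int32(sample_input_ids))

# A value that would not survive the narrowing cast.
print(fits_in_int32([2 ** 40]))
```

If this check passes for `input_ids`, `token_type_ids`, and `attention_mask`, the integer truncation is unlikely to explain the accuracy gap, and the FP16 weight warnings become the more plausible suspect.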

The logs

WARNING: [Torch-TensorRT] - For input input_ids.1, found user specified input dtype as Long, however when inspecting the graph, the input type expected was inferred to be Float
The compiler is going to use the user setting Long
This conflict may cause an error at runtime due to partial compilation being enabled and therefore
compatibility with PyTorch's data type convention is required.
If you do indeed see errors at runtime either:
- Remove the dtype spec for input_ids.1
- Disable partial compilation by setting require_full_compilation to True
WARNING: [Torch-TensorRT] - For input token_type_ids.1, found user specified input dtype as Long, however when inspecting the graph, the input type expected was inferred to be Float
The compiler is going to use the user setting Long
This conflict may cause an error at runtime due to partial compilation being enabled and therefore
compatibility with PyTorch's data type convention is required.
If you do indeed see errors at runtime either:
- Remove the dtype spec for token_type_ids.1
- Disable partial compilation by setting require_full_compilation to True
WARNING: [Torch-TensorRT] - For input attention_mask.1, found user specified input dtype as Long, however when inspecting the graph, the input type expected was inferred to be Double
The compiler is going to use the user setting Long
This conflict may cause an error at runtime due to partial compilation being enabled and therefore
compatibility with PyTorch's data type convention is required.
If you do indeed see errors at runtime either:
- Remove the dtype spec for attention_mask.1
- Disable partial compilation by setting require_full_compilation to True
WARNING: [Torch-TensorRT] - Data types for input tensors have been modified by inserting aten::to operations which cast INT64 inputs to INT32. To disable this, please recompile using INT32 inputs
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating intermediate graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT TorchScript Conversion Context] - CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Int64 to Int32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size without setting allow_shape_tensors
WARNING: [Torch-TensorRT] - Sum converter disregards dtype
WARNING: [Torch-TensorRT] - Sum converter disregards dtype
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are: 
WARNING: [Torch-TensorRT TorchScript Conversion Context] -  (# 0 (SHAPE input_0))
WARNING: [Torch-TensorRT TorchScript Conversion Context] -  (# 0 (SHAPE input_2))
WARNING: [Torch-TensorRT TorchScript Conversion Context] -  (# 0 (SHAPE input_1))
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are: 
WARNING: [Torch-TensorRT TorchScript Conversion Context] -  (# 0 (SHAPE input_0))
WARNING: [Torch-TensorRT TorchScript Conversion Context] -  (# 0 (SHAPE input_2))
WARNING: [Torch-TensorRT TorchScript Conversion Context] -  (# 0 (SHAPE input_1))
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are: 
WARNING: [Torch-TensorRT TorchScript Conversion Context] -  (# 0 (SHAPE input_0))
WARNING: [Torch-TensorRT TorchScript Conversion Context] -  (# 0 (SHAPE input_2))
WARNING: [Torch-TensorRT TorchScript Conversion Context] -  (# 0 (SHAPE input_1))
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT encountered issues when converting weights between types and that could affect accuracy.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Check verbose logs for the list of affected weights.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - - 131 weights are affected by this issue: Detected subnormal FP16 values.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - - 78 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - - 1 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
WARNING: [Torch-TensorRT] - CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
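
The weight-conversion warnings near the end of the log name three cases: subnormal FP16 values, values below the smallest positive FP16 subnormal, and finite FP32 values that overflow in FP16. The sketch below shows where those boundaries lie, using only the Python standard library. The helper names `classify_for_fp16` and `to_fp16` and the sample weights are hypothetical; this is not how TensorRT performs the conversion, only an illustration of the FP16 value range.

```python
import struct

FP16_MAX = 65504.0               # largest finite FP16 value
FP16_MIN_NORMAL = 2.0 ** -14     # smallest positive normal FP16 value
FP16_MIN_SUBNORMAL = 2.0 ** -24  # smallest positive subnormal FP16 value


def classify_for_fp16(value):
    """Classify a finite weight by where its magnitude falls in the FP16 range.

    Boundaries are approximate: rounding right at the edges is ignored.
    """
    magnitude = abs(value)
    if magnitude == 0.0:
        return "zero"
    if magnitude > FP16_MAX:
        return "overflow"
    if magnitude < FP16_MIN_SUBNORMAL:
        return "below_min_subnormal"
    if magnitude < FP16_MIN_NORMAL:
        return "subnormal"
    return "normal"


def to_fp16(value):
    """Round-trip a value through IEEE 754 half precision ('e' format)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Made-up weights, one per category.
for w in [0.02, 3e-5, 1e-9, 1e5]:
    print(w, classify_for_fp16(w))

# Even a "normal" value loses precision in FP16.
print(to_fp16(0.1))
```

Running a check like this over the model's FP32 weights would show which tensors fall into the three reported categories. Note that `struct.pack` with the `e` format raises `OverflowError` for values too large for FP16, so `to_fp16` should only be called on values that `classify_for_fp16` does not mark as overflow.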

gs-olive commented 1 year ago

Hello - could you try compiling with Torch-TensorRT 1.4 or main to see whether the values change? The TensorRT version and our converters are updated in versions newer than 1.3. Additionally, try calling model.half() prior to compilation to see if the accuracy results change.

@narendasan - could the accuracy issue be related to the subnormal weights/overflow?

HenryYuen128 commented 1 year ago

Hi @narendasan @gs-olive, I compiled with Torch-TensorRT 1.4 and the FP16 results still differ from the FP32 results. Calling model.half() prior to compilation raises RuntimeError: expected scalar type Half but found Float.

The code snippet

import torch
import torch_tensorrt

# `path`, `device`, and `inputs` are not shown in this snippet
model = torch.jit.load(path)
model.half()  # cast the weights to FP16 before compilation
model.to(device)

# The model needs to be in evaluation mode
model.eval()
enabled_precisions = {torch.half}  # run with 16-bit precision
# enabled_precisions = {torch.float32}  # run with 32-bit precision

trt_model = torch_tensorrt.compile(
    model, inputs=inputs, enabled_precisions=enabled_precisions,
    truncate_long_and_double=True, require_full_compilation=False, debug=True,
)
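
To make "different result" measurable when comparing the FP32 and FP16 runs, one option is to compare the logits for the same input and check whether the predicted class changes. A minimal sketch in plain Python, assuming the logits have already been copied into lists (for example with `tensor.tolist()`); the helper name `compare_logits` and the numbers are hypothetical and are not taken from this issue:

```python
def compare_logits(fp32_logits, fp16_logits):
    """Return (max absolute difference, whether the predicted class matches)."""
    if len(fp32_logits) != len(fp16_logits):
        raise ValueError("logit vectors must have the same length")
    max_abs_diff = max(abs(a - b) for a, b in zip(fp32_logits, fp16_logits))
    # Index of the largest logit is the predicted class.
    pred32 = max(range(len(fp32_logits)), key=fp32_logits.__getitem__)
    pred16 = max(range(len(fp16_logits)), key=fp16_logits.__getitem__)
    return max_abs_diff, pred32 == pred16


# Made-up logits for illustration only.
diff, same_class = compare_logits([2.0, -1.0, 0.5], [1.5, -0.75, 0.5])
print(diff, same_class)
```

Reporting both numbers helps separate small numerical drift, where the prediction is unchanged, from a conversion problem that flips the predicted class.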

github-actions[bot] commented 9 months ago

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.

nguyenvo09 commented 4 months ago

I got the same issue.