@peri044 Can you take a look at this?
Might be related to #898? @henrycharlesworth Can you try building and running Torch-TensorRT with the TensorRT NGC containers?
I was able to bypass my issue using nvcr.io/nvidia/tensorrt:22.02-py3, PyTorch 1.10, and Torch-TensorRT commit 11bcb98d on master.
I'm likewise getting a segmentation fault in torch_tensorrt.compile when trying to convert a model to int8. The issue does not occur with float16 or float32. I haven't tried building from source with debugging symbols yet, but gdb tracked it to libtorchtrt.so and torch_tensorrt::core::MapInputsAndDetermineDTypes().
I'm on Torch 1.11.0+cu115, Torch-TensorRT 1.1.0.
I traced the segfault in my case to line 314 here: https://github.com/pytorch/TensorRT/blob/40f8b44d95e1bf0912757377eb6acba666963e9d/core/compiler.cpp#L311-L316
As far as I can tell, first_use_type_map does not contain the key in, so calling ->second on the result of .find(in) (which is end()) is undefined behavior. The code appears to be trying to check for this case, but by that point it is too late.
I have a very hasty patch that gets me past this point (I'll open a PR if anyone wants it, though I'm not sure it actually solves much), but it then just leads me to https://github.com/pytorch/TensorRT/issues/922.
diff --git a/core/compiler.cpp b/core/compiler.cpp
index b684b808..0d82bf11 100644
--- a/core/compiler.cpp
+++ b/core/compiler.cpp
@@ -311,8 +311,9 @@ void MapInputsAndDetermineDTypes(
   for (auto& in : g->inputs()) {
     if (static_params.find(in) == static_params.end()) {
       ir::Input& spec = cfg.convert_info.inputs.find(in)->second;
-      auto est_type_opt = first_use_type_map.find(in)->second;
-      if (est_type_opt && !spec.dtype_is_user_defined) {
+      auto count = first_use_type_map.count(in);
+      if (count && !spec.dtype_is_user_defined) {
+        auto est_type_opt = first_use_type_map.find(in)->second;
         // If we can calculate the type from the graph and the type was not defined by the user then use the calculated
         // type
         LOG_INFO(
@@ -320,17 +321,18 @@ void MapInputsAndDetermineDTypes(
             << in->debugName() << " has type " << est_type_opt.value()
             << ". If this is incorrect explicitly set dtype for input and file a bug");
         spec.dtype = util::ScalarTypeToTRTDataType(est_type_opt.value());
-      } else if (!est_type_opt && !spec.dtype_is_user_defined) {
+      } else if (!count && !spec.dtype_is_user_defined) {
         // If we cannot calculate the type and the user did not define the type, then default to FP32
         LOG_WARNING(
             "Cannot infer input type from calcuations in graph for input "
             << in->debugName() << ". Assuming it is Float32. If not, specify input type explicity");
         spec.dtype = nvinfer1::DataType::kFLOAT;
       } else if (spec.dtype_is_user_defined && cfg.partition_info.enabled) {
-        if (!est_type_opt) {
+        if (!count) {
           LOG_INFO("Cannot infer input tensor dtype in graph. Using user provided input dtype settings");
           first_use_type_map[in] = {util::TRTDataTypeToScalarType(cfg.convert_info.inputs.find(in)->second.dtype)};
         } else {
+          auto est_type_opt = first_use_type_map.find(in)->second;
           if (util::TRTDataTypeToScalarType(cfg.convert_info.inputs.find(in)->second.dtype) != est_type_opt.value()) {
             std::stringstream ss;
             ss << "For input " << in->debugName() << ", found user specified input dtype as ";
Hi,
We faced the int8 bug too, in the official docker image, version 22.05. The initial issue is solved with @Hodapp87's patch, but in our case it leads to another issue, not #922 as reported by @Hodapp87.
This is the exception traceback:
Traceback (most recent call last):
  File "./main.py", line 19, in <module>
    trt_ts_module = torch_tensorrt.compile(
  File "/usr/local/lib/python3.8/dist-packages/torch_tensorrt/_compile.py", line 109, in compile
    return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch_tensorrt/ts/_compiler.py", line 113, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
RuntimeError: [Error thrown at core/conversion/var/Var.cpp:132] Expected isITensor() to be true but got false
Requested ITensor from Var, however Var type is c10::IValue
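For reference, the failing call goes through the TorchScript frontend, so the equivalent direct call looks roughly like this (a minimal sketch only; the module path, input shape, and dtype are illustrative assumptions, not our exact code):

import torch
import torch_tensorrt

# Illustrative: a TorchScript module that already carries quantization nodes
ts_mod = torch.jit.load("resnet50_quantized_ts.pt")

trt_mod = torch_tensorrt.ts.compile(
    ts_mod,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
    enabled_precisions={torch.int8},
)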
Does anyone know how to solve this?
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
@peri044 Any news on this issue? We still cannot use our models with int8 precision because of this bug.
@peri044 Can we please confirm the PTQ notebook is working properly, then go after this bug? P1.
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
We think this is fixed. Dheeraj to check.
Thanks, I will check as soon as possible.
@ivan94fi have you been able to check? We would like to close this out.
Hi, I can confirm that our model is now correctly converted when using int8 precision with version 1.3.0 of Torch-TensorRT. Thank you!
Bug Description
I'm using torch_tensorrt to try to quantize a pretrained ResNet50 model (roughly following the steps here), but I am getting a segmentation fault. I've tried running the code on two different machines using the latest Docker image here, but I get the same segmentation fault on both. Also, when I compile the model to TensorRT with fp16 instead of quantizing, it works fine.
To Reproduce
Reduced example code (main.py, with the calibration helpers in utils.py):
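A minimal sketch of the flow the scripts implement (illustrative only: it assumes torchvision's pretrained ResNet50, NVIDIA's pytorch-quantization toolkit, and a calibrate_model helper in utils.py whose exact signature is assumed):

import torch
import torch_tensorrt
import torchvision

from pytorch_quantization import quant_modules
from utils import calibrate_model  # utils.py calibration helper; signature assumed

# Swap torch.nn layers for their quantized counterparts before the model is built
quant_modules.initialize()

# Pretrained ResNet50 as described in the report
model = torchvision.models.resnet50(pretrained=True).eval().cuda()

# Run a few batches through the model so the quantizers collect statistics;
# the dataloader is a dummy stand-in for a real calibration set
calib_dataset = torch.utils.data.TensorDataset(
    torch.randn(64, 3, 224, 224), torch.zeros(64, dtype=torch.long)
)
calib_dataloader = torch.utils.data.DataLoader(calib_dataset, batch_size=8)
calibrate_model(model, calib_dataloader)

# Trace the calibrated model and compile it with int8 enabled;
# the torch_tensorrt.compile call is where the segfault occurs
example_input = torch.randn(1, 3, 224, 224, device="cuda")
with torch.no_grad():
    ts_model = torch.jit.trace(model, example_input)

trt_model = torch_tensorrt.compile(
    ts_model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
    enabled_precisions={torch.int8},
)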
When I run main.py and it gets to the point of compiling the model, I get:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
If I remove quant_modules.initialize() and calibrate_model(...) and change the enabled precision to torch.float16 instead, the model compiles without any error.
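For contrast, the fp16 variant that compiles successfully is roughly (same illustrative assumptions as the sketch above):

import torch
import torch_tensorrt
import torchvision

# Same traced ResNet50, but without quant_modules.initialize() or calibrate_model(...),
# and with only fp16 enabled; this compiles without error
model = torchvision.models.resnet50(pretrained=True).eval().cuda()
ts_model = torch.jit.trace(model, torch.randn(1, 3, 224, 224, device="cuda"))
trt_model_fp16 = torch_tensorrt.compile(
    ts_model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
    enabled_precisions={torch.float16},
)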
Expected behavior
Would expect the int8 quantized model to compile without issue.
Environment
Using the NVIDIA Docker image (22.02-py3). Tested on an RTX 3090 GPU and a GTX 1650.
Additional context