pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License
🐛 [Bug] if statements without else branches are triggering "RuntimeError: ArrayRef: invalid index Index = 0; Length = 0" #1296

Closed BrettRyland closed 1 year ago

BrettRyland commented 2 years ago

Bug Description

if statements with no else branch (or with an else branch that doesn't modify the variable) trigger the RuntimeError: ArrayRef: invalid index Index = 0; Length = 0 exception when the torch_executed_ops=["aten::size"] option is specified. Without that option, compilation instead emits a warning about undefined behaviour.

This appears to be similar to, but not the same as, the List.append issue with ONNX that gives the same runtime exception.

This affects models using the BlurPool layer from MosaicML, for example.

To Reproduce

Repro script: trt_bug5.py

brett@br-workhorse:~/repos/Autosensor/NN$ python3 trt_bug5.py 
torch: 1.12.1+cu113, torch_tensorrt: 1.2.0a0+19ae4cbd
MyMod(
  (local_stuff): Sequential(
    (0): Conv2d(32, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  )
)
Scripting model...
Compiling trt...
WARNING: [Torch-TensorRT] - Dilation not used in Max pooling converter
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.5.1
RecursiveScriptModule(original_name=MyMod_trt) graph(%input_0 : __torch__.___torch_mangle_0.MyMod_trt,
      %input_1 : Tensor):
  %__torch______torch_mangle_0_MyMod_trt_engine_0x55680537de50 : __torch__.torch.classes.tensorrt.Engine = prim::GetAttr[name="__torch______torch_mangle_0_MyMod_trt_engine_0x55680537de50"](%input_0)
  %3 : Tensor[] = prim::ListConstruct(%input_1)
  %4 : Tensor[] = tensorrt::execute_engine(%3, %__torch______torch_mangle_0_MyMod_trt_engine_0x55680537de50)
  %5 : Tensor = prim::ListUnpack(%4)
  %6 : int = prim::Constant[value=0]() # /home/brett/repos/Autosensor/NN/trt_bug5.py:22:23
  %7 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/trt_bug5.py:22:28
  %8 : int[] = aten::size(%5) # <string>:13:9
  %9 : int = aten::__getitem__(%8, %6) # /home/brett/repos/Autosensor/NN/trt_bug5.py:22:15
  %10 : bool = aten::gt(%9, %7) # /home/brett/repos/Autosensor/NN/trt_bug5.py:22:15
  %x.13 : Tensor = prim::If(%10)
    block0():
      %12 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/trt_bug5.py:22:28
      %13 : Tensor = aten::add(%5, %12, %12) # /home/brett/repos/Autosensor/NN/trt_bug5.py:22:6
      -> (%13)
    block1():
      -> ()
  return (%x.13)

Traceback (most recent call last):
  File "/home/brett/repos/Autosensor/NN/trt_bug5.py", line 50, in <module>
    out1 = trt(t)
  File "/home/brett/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: ArrayRef: invalid index Index = 0; Length = 0

Uncommenting the - 1 in the else branch on line 22 of the repro script avoids the exception, since block1() in the graph then returns a value.
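The mismatch is visible in the graph dump itself: the prim::If node declares one output (%x.13), block0() returns one value, but block1() returns none. In TorchScript IR each block of a prim::If is expected to return as many values as the node has outputs, so an empty block1() is plausibly what the interpreter trips over when it indexes element 0 of an empty list. Below is a minimal sketch, using only the Python standard library, that scans a textual graph dump for this kind of mismatch. The helper name find_if_arity_mismatches and the abridged GRAPH_DUMP string are illustrative assumptions, not part of Torch-TensorRT or PyTorch, and the scan is deliberately flat (it does not track nested prim::If nodes).

```python
import re

# Abridged from the graph dump in this report (source-location comments dropped).
GRAPH_DUMP = """\
  %x.13 : Tensor = prim::If(%10)
    block0():
      %13 : Tensor = aten::add(%5, %12, %12)
      -> (%13)
    block1():
      -> ()
  return (%x.13)
"""


def find_if_arity_mismatches(dump):
    """Hypothetical helper: list prim::If blocks whose return count differs
    from the number of outputs declared by the prim::If node.

    Returns tuples of (if_line, block_name, expected_count, actual_count).
    Limitation: nested prim::If nodes are not tracked separately.
    """
    mismatches = []
    expected = None  # number of outputs of the most recent prim::If
    if_line = None
    block = None
    for line in dump.splitlines():
        text = line.strip()
        if "= prim::If(" in text:
            # Values named on the left of the first '=' are the node outputs.
            lhs = text.split("=", 1)[0]
            expected = len(re.findall(r"%[\w.]+", lhs))
            if_line = text
            block = None
        elif re.match(r"block\d+\(", text):
            block = text.split("(", 1)[0]
        elif text.startswith("->") and expected is not None and block is not None:
            # Values named after '->' are what this block returns.
            actual = len(re.findall(r"%[\w.]+", text))
            if actual != expected:
                mismatches.append((if_line, block, expected, actual))
            block = None
    return mismatches


if __name__ == "__main__":
    for if_line, block, expected, actual in find_if_arity_mismatches(GRAPH_DUMP):
        print(f"{block} returns {actual} value(s), expected {expected}: {if_line}")
```

Running the same check on the output of print(trt.graph) after a change to the model is a quick way to see whether both blocks now return a value, without having to execute the compiled module.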

Environment

Torch-TensorRT is compiled from the latest release/1.2 candidate (commit 19ae4cbd) and installed with pip3 install . in the main py folder.

brett@br-workhorse:~/repos/Autosensor/NN$ python3 -c 'from torch.utils import collect_env; collect_env.main()'
Collecting environment information...
PyTorch version: 1.12.1+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.1 LTS (x86_64)
GCC version: (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-43-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.7.99
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 515.65.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.5.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.6
[pip3] numpy-quaternion==2022.4.1
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.12.1+cu113
[pip3] torch-optimizer==0.1.0
[pip3] torch-tensorrt==1.2.0a0+19ae4cbd
[pip3] torchmetrics==0.7.3
[pip3] torchsummary==1.5.1
[pip3] torchvision==0.13.1+cu113
[conda] Could not collect

galv commented 2 years ago

I just wanted to note that I am encountering the same issue. Thank you for the detective work and the workaround, @BrettRyland!

BrettRyland commented 2 years ago

Additionally, replacing the

x = x + 1 if x.shape[0] > 1 else x # - 1  # Uncommenting this results in the compilation succeeding.

line in MyMod.forward in the first repro script with

if x.shape[0] > 1:  # The first by itself causes RuntimeError: ArrayRef: invalid index Index = 0; Length = 0
    return x + 1
if x.shape[1] > 1:  # The second causes a Segmentation fault during compilation
    return x + 2

causes a segmentation fault during compilation. Repro script: trt_bug5_v2.py

$ python3 tmp/trt_bug5_v2.py
torch: 1.12.1+cu116, torch_tensorrt: 1.2.0a0+ab9ba308
MyMod(
  (local_stuff): Sequential(
    (0): Conv2d(32, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  )
)
Scripting model...
Compiling trt...
WARNING: [Torch-TensorRT] - Dilation not used in Max pooling converter
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
Segmentation fault

Here's a stack trace of the segfault:

$ gdb --ex=r --args python3 tmp/trt_bug5_v2.py
GNU gdb (Ubuntu 12.0.90-0ubuntu1) 12.0.90
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...
(No debugging symbols found in python3)
Starting program: /home/brett/repos/Autosensor/.direnv/python-venv-3.10.4/bin/python3 tmp/trt_bug5_v2.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fff835ff640 (LWP 206503)]
[New Thread 0x7fff82dfe640 (LWP 206504)]
[New Thread 0x7fff7e5fd640 (LWP 206505)]
[New Thread 0x7fff7bdfc640 (LWP 206506)]
[New Thread 0x7fff795fb640 (LWP 206507)]
[New Thread 0x7fff76dfa640 (LWP 206508)]
[New Thread 0x7fff765f9640 (LWP 206509)]
[New Thread 0x7ffef51e1640 (LWP 206515)]
torch: 1.12.1+cu116, torch_tensorrt: 1.2.0a0+ab9ba308
[New Thread 0x7ffef21ce640 (LWP 206517)]
[New Thread 0x7ffef19cd640 (LWP 206518)]
MyMod(
  (local_stuff): Sequential(
    (0): Conv2d(32, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  )
)
Scripting model...
Compiling trt...
WARNING: [Torch-TensorRT] - Dilation not used in Max pooling converter
[New Thread 0x7ffef0fcc640 (LWP 206520)]
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007fffb1e5e3e9 in torch::jit::Value::replaceFirstUseWith(torch::jit::Value*) () from /home/brett/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
(gdb) bt
#0  0x00007fffb1e5e3e9 in torch::jit::Value::replaceFirstUseWith(torch::jit::Value*) () from /home/brett/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#1  0x00007fffb1e5e4ab in torch::jit::Value::replaceAllUsesWith(torch::jit::Value*) () from /home/brett/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#2  0x00007ffefb2902ba in torch_tensorrt::core::AddIfBlockToGraph(std::shared_ptr<torch::jit::Graph>&, torch::jit::Node*, std::vector<std::pair<std::shared_ptr<torch::jit::Graph>, std::unordered_map<torch::jit::Value*, torch::jit::Value*, std::hash<torch::jit::Value*>, std::equal_to<torch::jit::Value*>, std::allocator<std::pair<torch::jit::Value* const, torch::jit::Value*> > > >, std::allocator<std::pair<std::shared_ptr<torch::jit::Graph>, std::unordered_map<torch::jit::Value*, torch::jit::Value*, std::hash<torch::jit::Value*>, std::equal_to<torch::jit::Value*>, std::allocator<std::pair<torch::jit::Value* const, torch::jit::Value*> > > > > > const&, std::unordered_map<torch::jit::Value*, torch::jit::Value*, std::hash<torch::jit::Value*>, std::equal_to<torch::jit::Value*>, std::allocator<std::pair<torch::jit::Value* const, torch::jit::Value*> > >&) ()
   from /home/brett/.local/lib/python3.10/site-packages/torch_tensorrt/lib/libtorchtrt.so
#3  0x00007ffefb2910c9 in torch_tensorrt::core::ConstructFallbackGraph(torch::jit::Module&, torch::jit::Block*, std::unordered_map<torch::jit::Value const*, c10::IValue, std::hash<torch::jit::Value const*>, std::equal_to<torch::jit::Value const*>, std::allocator<std::pair<torch::jit::Value const* const, c10::IValue> > >, torch_tensorrt::core::CompileSpec, std::map<torch::jit::Value*, c10::IValue, std::less<torch::jit::Value*>, std::allocator<std::pair<torch::jit::Value* const, c10::IValue> > >, std::unordered_map<torch::jit::Node*, int, std::hash<torch::jit::Node*>, std::equal_to<torch::jit::Node*>, std::allocator<std::pair<torch::jit::Node* const, int> > >&) () from /home/brett/.local/lib/python3.10/site-packages/torch_tensorrt/lib/libtorchtrt.so
#4  0x00007ffefb290e50 in torch_tensorrt::core::ConstructFallbackGraph(torch::jit::Module&, torch::jit::Block*, std::unordered_map<torch::jit::Value const*, c10::IValue, std::hash<torch::jit::Value const*>, std::equal_to<torch::jit::Value const*>, std::allocator<std::pair<torch::jit::Value const* const, c10::IValue> > >, torch_tensorrt::core::CompileSpec, std::map<torch::jit::Value*, c10::IValue, std::less<torch::jit::Value*>, std::allocator<std::pair<torch::jit::Value* const, c10::IValue> > >, std::unordered_map<torch::jit::Node*, int, std::hash<torch::jit::Node*>, std::equal_to<torch::jit::Node*>, std::allocator<std::pair<torch::jit::Node* const, int> > >&) () from /home/brett/.local/lib/python3.10/site-packages/torch_tensorrt/lib/libtorchtrt.so
#5  0x00007ffefb2945e1 in torch_tensorrt::core::CompileGraph(torch::jit::Module const&, torch_tensorrt::core::CompileSpec) () from /home/brett/.local/lib/python3.10/site-packages/torch_tensorrt/lib/libtorchtrt.so
#6  0x00007fff5fd96b89 in torch_tensorrt::pyapi::CompileGraph (mod=..., info=...) at /storage/github/TensorRT/py/torch_tensorrt/csrc/torch_tensorrt_py.cpp:113
#7  0x00007fff5fdbff33 in pybind11::detail::argument_loader<torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&>::call_impl<torch::jit::Module, torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), 0ul, 1ul, pybind11::detail::void_type>(torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), std::integer_sequence<unsigned long, 0ul, 1ul>, pybind11::detail::void_type&&) && (f=<optimized out>, f=<optimized out>, this=0x7fffffffc980)
    at /home/brett/.local/lib/python3.10/site-packages/torch/include/pybind11/cast.h:2042
#8  pybind11::detail::argument_loader<torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&>::call<torch::jit::Module, pybind11::detail::void_type, torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&)>(torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&)) && (f=<optimized out>, this=0x7fffffffc980) at /home/brett/.local/lib/python3.10/site-packages/torch/include/pybind11/cast.h:2014
#9  pybind11::cpp_function::initialize<torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), torch::jit::Module, torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&, pybind11::name, pybind11::scope, pybind11::sibling, char [128]>(torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), torch::jit::Module (*)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [128])::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const (__closure=0x0, call=...) at /home/brett/.local/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h:192
#10 pybind11::cpp_function::initialize<torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), torch::jit::Module, torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&, pybind11::name, pybind11::scope, pybind11::sibling, char [128]>(torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), torch::jit::Module (*)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [128])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) () at /home/brett/.local/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h:170
#11 0x00007fff5fdb6afd in pybind11::cpp_function::dispatcher (self=<optimized out>, args_in=0x7fff70970ec0, kwargs_in=0x0) at /home/brett/.local/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h:767
#12 0x00005555556af76e in ?? ()
#13 0x00005555556a630b in _PyObject_MakeTpCall ()
#14 0x000055555569ec67 in _PyEval_EvalFrameDefault ()
#15 0x00005555556affbc in _PyFunction_Vectorcall ()
#16 0x00005555556be362 in PyObject_Call ()
#17 0x000055555569a8c4 in _PyEval_EvalFrameDefault ()
#18 0x00005555556affbc in _PyFunction_Vectorcall ()
#19 0x00005555556997d9 in _PyEval_EvalFrameDefault ()
#20 0x0000555555694cc6 in ?? ()
#21 0x0000555555789eb6 in PyEval_EvalCode ()
#22 0x00005555557b7318 in ?? ()
#23 0x00005555557b003b in ?? ()
#24 0x00005555557b7065 in ?? ()
#25 0x00005555557b6548 in _PyRun_SimpleFileObject ()
#26 0x00005555557b6243 in _PyRun_AnyFileObject ()
#27 0x00005555557a6b6e in Py_RunMain ()
#28 0x000055555577ce6d in Py_BytesMain ()
#29 0x00007ffff7c88d90 in __libc_start_call_main (main=main@entry=0x55555577ce30, argc=argc@entry=2, argv=argv@entry=0x7fffffffd628) at ../sysdeps/nptl/libc_start_call_main.h:58
#30 0x00007ffff7c88e40 in __libc_start_main_impl (main=0x55555577ce30, argc=2, argv=0x7fffffffd628, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd618) at ../csu/libc-start.c:392
#31 0x000055555577cd65 in _start ()

BrettRyland commented 2 years ago

Here's the same stack trace, but with torch-tensorrt compiled in debug mode with pip3 install -e ./py so that it shows line numbers and extra debug output:

brett@br-workhorse:~/repos/Autosensor/NN$ gdb --ex=r --args python3 tmp/trt_bug5_v2.py
Reading symbols from python3...
(No debugging symbols found in python3)
Starting program: /home/brett/repos/Autosensor/.direnv/python-venv-3.10.4/bin/python3 tmp/trt_bug5_v2.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fff835ff640 (LWP 211634)]
[New Thread 0x7fff80dfe640 (LWP 211635)]
[New Thread 0x7fff7e5fd640 (LWP 211636)]
[New Thread 0x7fff7bdfc640 (LWP 211637)]
[New Thread 0x7fff7b5fb640 (LWP 211638)]
[New Thread 0x7fff76dfa640 (LWP 211639)]
[New Thread 0x7fff745f9640 (LWP 211640)]
[New Thread 0x7ffef4c5d640 (LWP 211643)]
DEBUG: [Torch-TensorRT - Debug Build] - Runtime:
 Available CUDA Devices: 
    Device(ID: 0, Name: NVIDIA GeForce GTX 1080 Ti, SM Capability: 6.1, Type: GPU)

DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::relu(Tensor input) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::relu_(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::sigmoid(Tensor input) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::sigmoid_(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::tanh(Tensor input) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::tanh_(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::hardtanh(Tensor self, Scalar min_val, Scalar max_val) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::hardtanh_(Tensor self, Scalar min_val, Scalar max_val) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::prelu(Tensor self, Tensor weight) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::leaky_relu(Tensor self, Scalar negative_slope) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::leaky_relu_(Tensor self, Scalar negative_slope) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::elu(Tensor self, Scalar alpha, Scalar scale, Scalar input_scale) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::bitwise_not(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::to(Tensor self, int dtype, bool non_blocking, bool copy, int? memory_format) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::to(Tensor self, Device device, int dtype, bool non_blocking, bool copy, int? memory_format) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::to(Tensor self, Tensor other, bool non_blocking, bool copy, int? memory_format) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::to(Tensor self, Device? device, int? dtype, bool non_blocking, bool copy) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::cat(Tensor[] tensors, int dim) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for trt::const(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::constant_pad_nd(Tensor self, int[] pad, Scalar value) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::cumsum(Tensor self, int dim, *, int? dtype) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::add(Tensor self, Tensor other, Scalar alpha) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::add_(Tensor self, Tensor other, *, Scalar alpha) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::add(Tensor self, Scalar other, Scalar alpha) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::clamp(Tensor self, Scalar? min, Scalar? max) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::clamp_min(Tensor self, Scalar min) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::clamp_max(Tensor self, Scalar max) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::sub(Tensor self, Tensor other, Scalar alpha) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::sub(Tensor self, Scalar other, Scalar alpha) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::sub_(Tensor self, Tensor other, *, Scalar alpha) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::rsub(Tensor self, Scalar other, Scalar alpha) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::rsub(Tensor self, Tensor other, Scalar alpha) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::div(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::div(Tensor self, Tensor other, *, str? rounding_mode) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::div(Tensor self, Scalar other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::div_(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::div_(Tensor self, Scalar other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::square(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::mul(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::mul(Tensor self, Scalar other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::mul_(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::ne(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::ne(Tensor self, Scalar other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::pow(Tensor self, Tensor exponent) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::pow(Tensor self, Scalar exponent) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::floor_divide(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::floor_divide(Tensor self, Scalar other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::max(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::min(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::gt(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::gt(Tensor self, Scalar other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::lt(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::lt(Tensor self, Scalar other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::eq(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::eq(Tensor self, Scalar other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::ge(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::ge(Tensor self, Scalar other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::le(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::le(Tensor self, Scalar other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::expand(Tensor self, int[] size, *, bool implicit) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::expand_as(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::repeat(Tensor self, int[] repeats) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::repeat_interleave(Tensor self, int repeats, int? dim, *, int? output_size) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::upsample_nearest1d(Tensor self, int[] output_size, float? scales) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::upsample_nearest1d(Tensor input, int[]? output_size, float[]? scale_factors) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::upsample_nearest2d(Tensor self, int[] output_size, float? scales_h, float? scales_w) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::upsample_nearest2d(Tensor input, int[]? output_size, float[]? scale_factors) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::upsample_nearest3d(Tensor self, int[] output_size, float? scales_d, float? scales_h, float? scales_w) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::upsample_nearest3d(Tensor input, int[]? output_size, float[]? scale_factors) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::upsample_linear1d(Tensor self, int[] output_size, bool align_corners, float? scales) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::upsample_linear1d(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::upsample_bilinear2d(Tensor self, int[] output_size, bool align_corners, float? scales_h, float? scales_w) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::upsample_bilinear2d(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::upsample_trilinear3d(Tensor self, int[] output_size, bool align_corners, float? scales_d, float? scales_h, float? scales_w) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::upsample_trilinear3d(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::linear(Tensor input, Tensor weight, Tensor? bias) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih, Tensor? b_hh) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih, Tensor? b_hh) -> (Tensor, Tensor)
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::matmul(Tensor self, Tensor other) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::bmm(Tensor self, Tensor mat2) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::max(Tensor self, int dim, bool keepdim) -> (Tensor, Tensor)
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::norm(Tensor self, Scalar? p, int[] dim, bool keepdim) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::max_pool1d(Tensor self, int[] kernel_size, int[] stride, int[] padding, int[] dilation, bool ceil_mode) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::avg_pool1d(Tensor self, int[] kernel_size, int[] stride, int[] padding, bool ceil_mode, bool count_include_pad) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::max_pool2d(Tensor self, int[] kernel_size, int[] stride, int[] padding, int[] dilation, bool ceil_mode) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::avg_pool2d(Tensor self, int[] kernel_size, int[] stride, int[] padding, bool ceil_mode, bool count_include_pad, int? divisor_override) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::max_pool3d(Tensor self, int[] kernel_size, int[] stride, int[] padding, int[] dilation, bool ceil_mode) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::avg_pool3d(Tensor self, int[] kernel_size, int[] stride, int[] padding, bool ceil_mode, bool count_include_pad, int? divisor_override) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::adaptive_avg_pool1d(Tensor self, int[] output_size) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::adaptive_max_pool1d(Tensor self, int[] output_size) -> (Tensor, Tensor)
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::adaptive_avg_pool2d(Tensor self, int[] output_size) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::adaptive_max_pool2d(Tensor self, int[] output_size) -> (Tensor, Tensor)
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::adaptive_avg_pool3d(Tensor self, int[] output_size) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::adaptive_max_pool3d(Tensor self, int[] output_size) -> (Tensor, Tensor)
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::mean(Tensor self, *, int? dtype) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::mean(Tensor self, int[] dim, bool keepdim, *, int? dtype) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::sum(Tensor self, *, int? dtype) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::sum(Tensor self, int[] dim, bool keepdim, *, int? dtype) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::prod(Tensor self, *, int? dtype) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::prod(Tensor self, int dim, bool keepdim, *, int? dtype) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::max(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::min(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::reflection_pad2d(Tensor self, int[] padding) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::reflection_pad1d(Tensor self, int[] padding) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::replication_pad1d(Tensor self, int[] padding) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::replication_pad2d(Tensor self, int[] padding) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::replication_pad3d(Tensor self, int[] padding) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::select(Tensor self, int dim, int index) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::narrow(Tensor self, int dim, int start, int length) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::narrow(Tensor self, int dim, Tensor start, int length) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::embedding(Tensor weight, Tensor indices, int padding_idx, bool scale_grad_by_freq, bool sparse) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::roll(Tensor self, int[] shifts, int[] dims) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::index(Tensor self, Tensor?[] indices) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::slice(Tensor self, int dim, int? start, int? end, int step) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::split(Tensor self, int[] split_sizes, int dim) -> Tensor[]
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::split(Tensor self, int[] split_size, int dim) -> Tensor[]
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::split(Tensor self, int split_size, int dim) -> Tensor[]
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::split_with_sizes(Tensor self, int[] split_sizes, int dim) -> Tensor[]
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::unbind(Tensor self, int dim) -> Tensor[]
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::masked_fill(Tensor self, Tensor mask, Scalar value) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::scatter(Tensor self, int dim, Tensor index, Scalar value) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::scatter(Tensor self, int dim, Tensor index, Tensor src) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::flatten(Tensor self, int start_dim, int end_dim) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::reshape(Tensor self, int[] shape) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::view(Tensor self, int[] size) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::permute(Tensor self, int[] dims) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::transpose(Tensor self, int dim0, int dim1) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::t(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::pixel_shuffle(Tensor self, int upscale_factor) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::softmax(Tensor self, int dim, int? dtype) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::squeeze(Tensor self, int dim) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::stack(Tensor[] tensors, int dim) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::topk(Tensor self, int k, int dim, bool largest, bool sorted) -> (Tensor, Tensor)
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::abs(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::cos(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::acos(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::cosh(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::sin(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::asin(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::sinh(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::tan(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::atan(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::floor(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::reciprocal(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::log(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::ceil(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::sqrt(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::exp(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::neg(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::erf(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::asinh(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::acosh(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::atanh(Tensor self) -> Tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering converter for aten::unsqueeze(Tensor self, int dim) -> Tensor
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - DisentangledAttention_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CustomEmbLayerNormPluginDynamic, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CustomEmbLayerNormPluginDynamic, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CustomEmbLayerNormPluginDynamic, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CustomFCPluginDynamic, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CustomGeluPluginDynamic, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - GroupNormalizationPlugin, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - RnRes2Br1Br2c_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - RnRes2Br1Br2c_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - RnRes2Br2bBr2c_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - RnRes2Br2bBr2c_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - RnRes2FullFusion_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - SingleStepLSTMPlugin, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CustomSkipLayerNormPluginDynamic, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CustomSkipLayerNormPluginDynamic, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CustomSkipLayerNormPluginDynamic, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CustomSkipLayerNormPluginDynamic, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CustomQKVToContextPluginDynamic, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CustomQKVToContextPluginDynamic, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CustomQKVToContextPluginDynamic, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - DLRM_BOTTOM_MLP_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - SmallTileGEMM_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - RNNTEncoderPlugin, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - Interpolate, Namespace: torch_tensorrt
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - NormalizePlugin, Namespace: torch_tensorrt
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - GridAnchor_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - GridAnchorRect_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - NMS_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - Reorg_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - Region_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - Clip_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - LReLU_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - PriorBox_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - Normalize_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - ScatterND, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - RPROI_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - BatchedNMS_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - BatchedNMSDynamic_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - BatchTilePlugin_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - FlattenConcat_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CropAndResize, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CropAndResizeDynamic, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - DetectionLayer_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - EfficientNMS_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - EfficientNMS_ONNX_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - EfficientNMS_Explicit_TF_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - EfficientNMS_Implicit_TF_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - ProposalDynamic, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - Proposal, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - ProposalLayer_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - PyramidROIAlign_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - ResizeNearest_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - Split, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - SpecialSlice_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - InstanceNormalization_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - InstanceNormalization_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - CoordConvAC, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - DecodeBbox3DPlugin, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - GenerateDetection_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - MultilevelCropAndResize_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - MultilevelProposeROI_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - NMSDynamic_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - PillarScatterPlugin, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - VoxelGeneratorPlugin, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Registered plugin creator - MultiscaleDeformableAttnPlugin_TRT, Namespace: 
DEBUG: [Torch-TensorRT Plugins Context] - Total number of plugins registered: 65
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::eq
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::ne
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::lt
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::gt
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::le
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::ge
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::pow
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::__and__
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::__or__
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::__xor__
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::__round_to_zero_floordiv
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::zeros
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::ones
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::full
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::slice
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::len
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::size
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::__getitem__
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::append
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::extend
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::neg
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::add
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::add_
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::mul
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::sub
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::Bool
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::Float
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::Int
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::__not__
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::__is__
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::__isnot__
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::numel
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::dim
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::div
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::floordiv
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::floor
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::sqrt
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::warn
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::is_floating_point
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::tensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::arange
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::clone
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::copy_
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::format
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::__range_length
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for aten::__derive_index
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for prim::Constant
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for prim::NumToTensor
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for prim::ListUnpack
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for prim::ListConstruct
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for prim::dtype
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for prim::min
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for prim::max
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for prim::shape
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for prim::TupleConstruct
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for prim::TupleIndex
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for prim::TupleUnpack
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for prim::unchecked_cast
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for prim::Uninitialized
DEBUG: [Torch-TensorRT - Debug Build] - Registering evaluator for prim::RaiseException
torch: 1.12.1+cu116, torch_tensorrt: 1.2.0a0+ab9ba308
[New Thread 0x7ffef1bce640 (LWP 211648)]
[New Thread 0x7ffef13cd640 (LWP 211649)]
MyMod(
  (local_stuff): Sequential(
    (0): Conv2d(32, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  )
)
Scripting model...
Compiling trt...
INFO: [Torch-TensorRT - Debug Build] - ir was set to default, using TorchScript as ir
DEBUG: [Torch-TensorRT - Debug Build] - TensorRT Compile Spec: {
    "Inputs": [
Input(shape=(1,32,16,16,), dtype=Unknown data type, format=Contiguous/Linear/NCHW)    ]
    "Enabled Precision": [Float, ]
    "TF32 Disabled": 0
    "Sparsity": 0
    "Refit": 0
    "Debug": 0
    "Device":  {
        "device_type": GPU
        "allow_gpu_fallback": False
        "gpu_id": 0
        "dla_core": -1
    }

    "Engine Capability": Default
    "Num Avg Timing Iters": 1
    "Workspace Size": 0
    "DLA SRAM Size": 1048576
    "DLA Local DRAM Size": 1073741824
    "DLA Global DRAM Size": 536870912
    "Truncate long and double": 1
    "Torch Fallback":  {
        "enabled": True
        "min_block_size": 3
        "forced_fallback_operators": [
            aten::size,
        ]
        "forced_fallback_modules": [
        ]
    }
}
DEBUG: [Torch-TensorRT - Debug Build] - init_compile_spec with input vector
DEBUG: [Torch-TensorRT - Debug Build] - Settings requested for Lowering:
    torch_executed_modules: [
    ]
DEBUG: [Torch-TensorRT - Debug Build] - RemoveNOPs - Note: Removing operators that have no meaning in TRT
INFO: [Torch-TensorRT - Debug Build] - Lowered Graph: graph(%x.1 : Tensor):
  %self.local_stuff.0.bias_fused_bn.1 : Tensor = prim::Constant[value=0.01 *  1.7118  5.2281 -4.2209 [ CUDAFloatType{3} ]]()
  %self.local_stuff.0.weight_fused_bn.1 : Tensor = prim::Constant[value=<Tensor>]()
  %self.local_stuff.1.training : bool = prim::Constant[value=0]()
  %5 : int = prim::Constant[value=0]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:13
  %6 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %7 : int = prim::Constant[value=2]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:14
  %8 : int[] = prim::Constant[value=[1, 1]]()
  %112 : bool = prim::Constant[value=0]()
  %113 : int[] = prim::Constant[value=[0, 0]]()
  %114 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %8, %8, %8, %112, %113, %6, %112, %112, %112, %112)
  %9 : int[] = prim::Constant[value=[3, 3]]()
  %10 : int[] = prim::Constant[value=[2, 2]]()
  %111 : Tensor = aten::relu(%114) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17
  %x.5 : Tensor = aten::max_pool2d(%111, %9, %10, %8, %8, %self.local_stuff.1.training) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:782:11
  %101 : int[] = aten::size(%x.5) # <string>:13:9
  %103 : int = aten::__getitem__(%101, %5) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:5
  %105 : bool = aten::gt(%103, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:5
  %17 : Tensor = prim::If(%105) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:2
    block0():
      %18 : Tensor = aten::add(%x.5, %6, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:23:10
      -> (%18)
    block1():
      %93 : int = aten::__getitem__(%101, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5
      %95 : bool = aten::gt(%93, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5
      %21 : Tensor = prim::If(%95) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:2
        block0():
          %22 : Tensor = aten::add(%x.5, %7, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10
          -> (%22)
        block1():
          -> (%x.5)
      -> (%21)
  return (%17)

DEBUG: [Torch-TensorRT - Debug Build] - Found 1 inputs to graph
DEBUG: [Torch-TensorRT - Debug Build] - Handle input of debug name: x.1
DEBUG: [Torch-TensorRT - Debug Build] - Paring 0: x.1 : Input(shape: [1, 32, 16, 16], dtype: Float32, format: NCHW\Contiguous\Linear)
DEBUG: [Torch-TensorRT - Debug Build] - Found 1 inputs to graph
DEBUG: [Torch-TensorRT - Debug Build] - Handle input of debug name: x.1
DEBUG: [Torch-TensorRT - Debug Build] - In MapInputsAndDetermineDTypes, the g->inputs() size is 1, CollectionInputSpecMap size is1
INFO: [Torch-TensorRT - Debug Build] - Since input type is not explicitly defined, infering using first tensor calculation
  Inferred input x.1 has type Float
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %17 : Tensor = prim::If(%105) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:2  block0():    %18 : Tensor = aten::add(%x.5, %6, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:23:10    -> (%18)  block1():    %93 : int = aten::__getitem__(%101, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5    %95 : bool = aten::gt(%93, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5    %21 : Tensor = prim::If(%95) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:2      block0():        %22 : Tensor = aten::add(%x.5, %7, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10        -> (%22)      block1():        -> (%x.5)    -> (%21) (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %21 : Tensor = prim::If(%95) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:2  block0():    %22 : Tensor = aten::add(%x.5, %7, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10    -> (%22)  block1():    -> (%x.5) (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %self.local_stuff.0.bias_fused_bn.1 : Tensor = prim::Constant[value=0.01 *  1.7118  5.2281 -4.2209 [ CUDAFloatType{3} ]]() (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %self.local_stuff.0.weight_fused_bn.1 : Tensor = prim::Constant[value=<Tensor>]() (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %self.local_stuff.1.training : bool = prim::Constant[value=0]() (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %5 : int = prim::Constant[value=0]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:13 (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %6 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18 (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %7 : int = prim::Constant[value=2]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:14 (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %8 : int[] = prim::Constant[value=[1, 1]]() (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %112 : bool = prim::Constant[value=0]() (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %113 : int[] = prim::Constant[value=[0, 0]]() (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %9 : int[] = prim::Constant[value=[3, 3]]() (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %10 : int[] = prim::Constant[value=[2, 2]]() (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %21 : Tensor = prim::If(%95) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:2  block0():    %22 : Tensor = aten::add(%x.5, %7, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10    -> (%22)  block1():    -> (%x.5) (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %17 : Tensor = prim::If(%105) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:2  block0():    %18 : Tensor = aten::add(%x.5, %6, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:23:10    -> (%18)  block1():    %93 : int = aten::__getitem__(%101, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5    %95 : bool = aten::gt(%93, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5    %21 : Tensor = prim::If(%95) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:2      block0():        %22 : Tensor = aten::add(%x.5, %7, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10        -> (%22)      block1():        -> (%x.5)    -> (%21) (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Settings requested for Torch Fallback:
    "enabled": True
    "min_block_size": 3
    "torch_executed_operators": [
        aten::size,
     ]
DEBUG: [Torch-TensorRT - Debug Build] - Parititioning source module into PyTorch and TensorRT sub blocks
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %17 : Tensor = prim::If(%105) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:2  block0():    %18 : Tensor = aten::add(%x.5, %6, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:23:10    -> (%18)  block1():    %93 : int = aten::__getitem__(%101, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5    %95 : bool = aten::gt(%93, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5    %21 : Tensor = prim::If(%95) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:2      block0():        %22 : Tensor = aten::add(%x.5, %7, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10        -> (%22)      block1():        -> (%x.5)    -> (%21) (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - Finalizing in progress TensorRT block
DEBUG: [Torch-TensorRT - Debug Build] - Segment Block @0:
    Target: TensorRT

    Graph: graph(%x.1 : Tensor):
  %self.local_stuff.1.training : bool = prim::Constant[value=0]()
  %11 : int[] = prim::Constant[value=[2, 2]]()
  %10 : int[] = prim::Constant[value=[3, 3]]()
  %7 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %6 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %4 : int[] = prim::Constant[value=[1, 1]]()
  %self.local_stuff.0.bias_fused_bn.1 : Tensor = prim::Constant[value=0.01 *  1.7118  5.2281 -4.2209 [ CUDAFloatType{3} ]]()
  %self.local_stuff.0.weight_fused_bn.1 : Tensor = prim::Constant[value=<Tensor>]()
  %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %5, %6, %7, %5, %5, %5, %5)
  %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17
  %x.5 : Tensor = aten::max_pool2d(%8, %10, %11, %4, %4, %self.local_stuff.1.training) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:782:11
  return ()

DEBUG: [Torch-TensorRT - Debug Build] - In progress TRT block does not meet minimum block size requirements, therefore folding into in progress PyTorch block
DEBUG: [Torch-TensorRT - Debug Build] - In progress TRT block does not meet minimum block size requirements, therefore folding into in progress PyTorch block
DEBUG: [Torch-TensorRT - Debug Build] - In progress TRT block does not meet minimum block size requirements, therefore folding into in progress PyTorch block
DEBUG: [Torch-TensorRT - Debug Build] - Hit a conditional statement, finializing in progress PYT block and creating a new one for the conditional
DEBUG: [Torch-TensorRT - Debug Build] - Finalizing in progress Torch block
DEBUG: [Torch-TensorRT - Debug Build] - Segment Block @1:
    Target: Torch

    Graph: graph(%x.5 : Tensor):
  %5 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %3 : int = prim::Constant[value=0]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:13
  %0 : int[] = aten::size(%x.5) # <string>:13:9
  %2 : int = aten::__getitem__(%0, %3) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:5
  %4 : bool = aten::gt(%2, %5) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:5
  return ()

DEBUG: [Torch-TensorRT - Debug Build] - Finalizing in progress Torch block
DEBUG: [Torch-TensorRT - Debug Build] - Segment Block @2:
    Target: Torch

    Graph: graph(%1 : bool,
      %x.5 : Tensor,
      %6 : int[]):
  %10 : int = prim::Constant[value=2]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:14
  %4 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %0 : Tensor = prim::If(%1) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:2
    block0():
      %2 : Tensor = aten::add(%x.5, %4, %4) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:23:10
      -> (%2)
    block1():
      %5 : int = aten::__getitem__(%6, %4) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5
      %7 : bool = aten::gt(%5, %4) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5
      %8 : Tensor = prim::If(%7) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:2
        block0():
          %9 : Tensor = aten::add(%x.5, %10, %4) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10
          -> (%9)
        block1():
          -> (%x.5)
      -> (%8)
  return ()

DEBUG: [Torch-TensorRT - Debug Build] - Registering input/output torch::jit::Value for segmented graphs
INFO: [Torch-TensorRT - Debug Build] - Partitioned Graph: [Segment Block @0:
    Target: TensorRT

    Graph: graph(%x.1 : Tensor):
  %self.local_stuff.0.weight_fused_bn.1 : Tensor = prim::Constant[value=<Tensor>]()
  %self.local_stuff.0.bias_fused_bn.1 : Tensor = prim::Constant[value=0.01 *  1.7118  5.2281 -4.2209 [ CUDAFloatType{3} ]]()
  %4 : int[] = prim::Constant[value=[1, 1]]()
  %6 : int[] = prim::Constant[value=[0, 0]]()
  %7 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %10 : int[] = prim::Constant[value=[3, 3]]()
  %11 : int[] = prim::Constant[value=[2, 2]]()
  %self.local_stuff.1.training : bool = prim::Constant[value=0]()
  %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training)
  %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17
  %x.5 : Tensor = aten::max_pool2d(%8, %10, %11, %4, %4, %self.local_stuff.1.training) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:782:11
  return (%x.5)

Segment Block @1:
    Target: Torch

    Graph: graph(%x.5 : Tensor):
  %3 : int = prim::Constant[value=0]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:13
  %5 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %0 : int[] = aten::size(%x.5) # <string>:13:9
  %2 : int = aten::__getitem__(%0, %3) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:5
  %4 : bool = aten::gt(%2, %5) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:5
  return (%0, %4)

Segment Block @2:
    Target: Torch

    Graph: graph(%1 : bool,
      %x.5 : Tensor,
      %6 : int[]):
  %4 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %10 : int = prim::Constant[value=2]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:14
  %0 : Tensor = prim::If(%1) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:2
    block0():
      %2 : Tensor = aten::add(%x.5, %4, %4) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:23:10
      -> (%2)
    block1():
      %5 : int = aten::__getitem__(%6, %4) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5
      %7 : bool = aten::gt(%5, %4) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5
      %8 : Tensor = prim::If(%7) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:2
        block0():
          %9 : Tensor = aten::add(%x.5, %10, %4) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10
          -> (%9)
        block1():
          -> (%x.5)
      -> (%8)
  return (%0)

]
INFO: [Torch-TensorRT - Debug Build] - Segment Block @0:
    Target: TensorRT

    Graph: graph(%x.1 : Tensor):
  %self.local_stuff.0.weight_fused_bn.1 : Tensor = prim::Constant[value=<Tensor>]()
  %self.local_stuff.0.bias_fused_bn.1 : Tensor = prim::Constant[value=0.01 *  1.7118  5.2281 -4.2209 [ CUDAFloatType{3} ]]()
  %4 : int[] = prim::Constant[value=[1, 1]]()
  %6 : int[] = prim::Constant[value=[0, 0]]()
  %7 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %10 : int[] = prim::Constant[value=[3, 3]]()
  %11 : int[] = prim::Constant[value=[2, 2]]()
  %self.local_stuff.1.training : bool = prim::Constant[value=0]()
  %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training)
  %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17
  %x.5 : Tensor = aten::max_pool2d(%8, %10, %11, %4, %4, %self.local_stuff.1.training) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:782:11
  return (%x.5)

(GraphInSegmentedBlock)

DEBUG: [Torch-TensorRT - Debug Build] - Found 1 inputs to graph
DEBUG: [Torch-TensorRT - Debug Build] - Handle input of debug name: x.1
DEBUG: [Torch-TensorRT - Debug Build] - Pairing 0: x.1: Input(shape: [1, 32, 16, 16], dtype: Float32, format: NCHW\Contiguous\Linear)
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] Init CUDA: CPU +187, GPU +0, now: CPU 1387, GPU 4303 (MiB)
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] Init builder kernel library: CPU +7, GPU +2, now: CPU 1413, GPU 4337 (MiB)
INFO: [Torch-TensorRT - Debug Build] - Settings requested for TensorRT engine:
    Enabled Precisions: Float32 
    TF32 Floating Point Computation Enabled: 1
    Truncate Long and Double: 1
    Make Refittable Engine: 0
    Debuggable Engine: 0
    GPU ID: 0
    Allow GPU Fallback (if running on DLA): 0
    Avg Timing Iterations: 1
    Max Workspace Size: 0
    DLA SRAM Size: 1048576
    DLA Local DRAM Size: 1073741824
    DLA Global DRAM Size: 536870912
    Device Type: GPU
    GPU ID: 0
    Engine Capability: standard
    Calibrator Created: 0
INFO: [Torch-TensorRT TorchScript Conversion Context] - Converting Block
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - graph(%x.1 : Tensor):
  %self.local_stuff.0.weight_fused_bn.1 : Tensor = prim::Constant[value=<Tensor>]()
  %self.local_stuff.0.bias_fused_bn.1 : Tensor = prim::Constant[value=0.01 *  1.7118  5.2281 -4.2209 [ CUDAFloatType{3} ]]()
  %4 : int[] = prim::Constant[value=[1, 1]]()
  %6 : int[] = prim::Constant[value=[0, 0]]()
  %7 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %10 : int[] = prim::Constant[value=[3, 3]]()
  %11 : int[] = prim::Constant[value=[2, 2]]()
  %self.local_stuff.1.training : bool = prim::Constant[value=0]()
  %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training)
  %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17
  %x.5 : Tensor = aten::max_pool2d(%8, %10, %11, %4, %4, %self.local_stuff.1.training) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:782:11
  return (%x.5)

DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Input Dimension Specs: {
    x.1 : Input(shape: [1, 32, 16, 16], dtype: Float32, format: NCHW\Contiguous\Linear),}
INFO: [Torch-TensorRT TorchScript Conversion Context] - Adding Input x.1 (named: input_0): Input(shape: [1, 32, 16, 16], dtype: Float32, format: NCHW\Contiguous\Linear) in engine (conversion.AddInputs)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Evaluating %self.local_stuff.0.weight_fused_bn.1 : Tensor = prim::Constant[value=<Tensor>]()
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Found the value to be a tensor (shape [3, 32, 3, 3])
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Evaluating %self.local_stuff.0.bias_fused_bn.1 : Tensor = prim::Constant[value=0.01 *  1.7118  5.2281 -4.2209 [ CUDAFloatType{3} ]]()
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Found the value to be a tensor (shape [3])
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Evaluating %4 : int[] = prim::Constant[value=[1, 1]]()
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Found the value to be: [1, 1]
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Evaluating %6 : int[] = prim::Constant[value=[0, 0]]()
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Found the value to be: [0, 0]
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Evaluating %7 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Found the value to be: 1
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Evaluating %10 : int[] = prim::Constant[value=[3, 3]]()
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Found the value to be: [3, 3]
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Evaluating %11 : int[] = prim::Constant[value=[2, 2]]()
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Found the value to be: [2, 2]
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Evaluating %self.local_stuff.1.training : bool = prim::Constant[value=0]()
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Found the value to be: False
INFO: [Torch-TensorRT TorchScript Conversion Context] - Adding Layer %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) (ctx.AddLayer)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is an already converted tensor
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT - Debug Build] - Weights: [3]
    Data Type: Float32
    Number of input maps: 3
    Number of output maps: 3
    Element shape: [1]
DEBUG: [Torch-TensorRT - Debug Build] - Weights: [3, 32, 3, 3]
    Data Type: Float32
    Number of input maps: 32
    Number of output maps: 3
    Element shape: [3,3]
DEBUG: [Torch-TensorRT - Debug Build] - Input dims: [1, 32, 16, 16]
DEBUG: [Torch-TensorRT - Debug Build] - Weights: Weights: [3, 32, 3, 3]
    Data Type: Float32
    Number of input maps: 32
    Number of output maps: 3
    Element shape: [3,3]
DEBUG: [Torch-TensorRT - Debug Build] - stride: [1, 1]
DEBUG: [Torch-TensorRT - Debug Build] - padding: [1, 1]
DEBUG: [Torch-TensorRT - Debug Build] - dilation: [1, 1]
DEBUG: [Torch-TensorRT - Debug Build] - out_padding: [0, 0]
DEBUG: [Torch-TensorRT - Debug Build] - groups: 1
DEBUG: [Torch-TensorRT - Debug Build] - Output tensor shape: [1, 3, 16, 16]
INFO: [Torch-TensorRT TorchScript Conversion Context] - Adding Layer %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 (ctx.AddLayer)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is an already converted tensor
DEBUG: [Torch-TensorRT - Debug Build] - ITensor name: (Unnamed Layer* 0) [Convolution]_output
DEBUG: [Torch-TensorRT - Debug Build] - ITensor shape: [1, 3, 16, 16]
DEBUG: [Torch-TensorRT - Debug Build] - ITensor type: Float32
DEBUG: [Torch-TensorRT - Debug Build] - Output tensor shape: [1, 3, 16, 16]
INFO: [Torch-TensorRT TorchScript Conversion Context] - Adding Layer %x.5 : Tensor = aten::max_pool2d(%8, %10, %11, %4, %4, %self.local_stuff.1.training) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:782:11 (ctx.AddLayer)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is an already converted tensor
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT - Debug Build] - ITensor name: (Unnamed Layer* 1) [Activation]_output
DEBUG: [Torch-TensorRT - Debug Build] - ITensor shape: [1, 3, 16, 16]
DEBUG: [Torch-TensorRT - Debug Build] - ITensor type: Float32
DEBUG: [Torch-TensorRT - Debug Build] - kernel_size: [3, 3]
DEBUG: [Torch-TensorRT - Debug Build] - padding: [1, 1]
DEBUG: [Torch-TensorRT - Debug Build] - stride: [2, 2]
DEBUG: [Torch-TensorRT - Debug Build] - dilation: [1, 1]
WARNING: [Torch-TensorRT - Debug Build] - Dilation not used in Max pooling converter
DEBUG: [Torch-TensorRT - Debug Build] - Output tensor shape: [1, 3, 8, 8]
INFO: [Torch-TensorRT TorchScript Conversion Context] - Marking Output x.5 named output_0 in engine (ctx.MarkOutput)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Applying generic optimizations to the graph for inference.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Original: 3 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After dead-layer removal: 3 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After Myelin optimization: 3 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Applying ScaleNodes fusions.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After scale fusion: 3 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Running: ConvReluFusion on %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - ConvReluFusion: Fusing %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) with %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After dupe layer removal: 2 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After final dead-layer removal: 2 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After tensor merging: 2 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After vertical fusions: 2 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After dupe layer removal: 2 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After final dead-layer removal: 2 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After tensor merging: 2 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After slice removal: 2 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After concat removal: 2 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Trying to split Reshape and strided tensor
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Graph construction and optimization completed in 0.00158612 seconds.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Using cublas as a tactic source
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1413, GPU 4345 (MiB)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Using cuDNN as a tactic source
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] Init cuDNN: CPU +114, GPU +44, now: CPU 1527, GPU 4389 (MiB)
INFO: [Torch-TensorRT TorchScript Conversion Context] - Local timing cache in use. Profiling results in this builder pass will not be stored.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Constructing optimization profile number 0 [1/1].
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Reserving memory for host IO tensors. Host: 0 bytes
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - =============== Computing reformatting costs
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - *************** Autotuning Reformat: Float(8192,256,16,1) -> Float(8192,1,512,32) ***************
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - --------------- Timing Runner: Optimizer Reformat(input_0 -> <out>) (Reformat)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000000003e8 Time: 0.00331079
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000000003ea Time: 0.0154065
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000000000 Time: 0.0109203
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Fastest Tactic: 0x00000000000003e8 Time: 0.00331079
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - =============== Computing reformatting costs
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - *************** Autotuning Reformat: Float(768,1,48,3) -> Float(768,256,16,1) ***************
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - --------------- Timing Runner: Optimizer Reformat((Unnamed Layer* 1) [Activation]_output -> <out>) (Reformat)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000000003e8 Time: 0.0038114
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000000003ea Time: 0.0148772
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000000000 Time: 0.0562712
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Fastest Tactic: 0x00000000000003e8 Time: 0.0038114
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - =============== Computing reformatting costs
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - =============== Computing costs for 
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - *************** Autotuning format combination: Float(8192,256,16,1) -> Float(768,256,16,1) ***************
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - --------------- Timing Runner: %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 (CudaDepthwiseConvolution)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - CudaDepthwiseConvolution has no valid tactics for this config, skipping
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - --------------- Timing Runner: %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 (CudnnConvolution)
[New Thread 0x7ffef09cc640 (LWP 211658)]
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000000000 Time: 0.0225287
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000000001 Time: 0.0211693
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000000002 Time: 0.0665031
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000000004 Time: 0.0290178
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000000005 Time: 0.0296684
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000000006 Time: 0.0137924
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Fastest Tactic: 0x0000000000000006 Time: 0.0137924
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - --------------- Timing Runner: %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 (CaskConvolution)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x32_relu_medium_nn_v1 Tactic: 0x0ebe499388e08286
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0ebe499388e08286 Time: 0.0337237
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x32_relu_large_nn_v0 Tactic: 0x185af5379580418f
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x185af5379580418f Time: 0.0279973
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x128_relu_large_nn_v0 Tactic: 0x321f7a577fb21da0
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x321f7a577fb21da0 Time: 0.0315152
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148n_nt_v1 Tactic: 0x351dd956e9e8e4c4
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x351dd956e9e8e4c4 Time: 0.0073375
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x64_relu_large_nn_v1 Tactic: 0x3c301f1cd57bd89b
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x3c301f1cd57bd89b Time: 0.0269104
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x128_relu_medium_nn_v1 Tactic: 0x3e787008e11a6129
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x3e787008e11a6129 Time: 0.0318061
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x64_relu_small_nn_v1 Tactic: 0x474c9edd1ecfbbba
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x474c9edd1ecfbbba Time: 0.021822
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x128_relu_small_nn_v0 Tactic: 0x4963fb96b4067e81
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x4963fb96b4067e81 Time: 0.0298098
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148m_nt_v1 Tactic: 0x522ccebdb4e349f0
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x522ccebdb4e349f0 Time: 0.00861402
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x64_relu_medium_nn_v1 Tactic: 0x5c38385751ccb068
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x5c38385751ccb068 Time: 0.0257575
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x32_relu_small_nn_v0 Tactic: 0x632674f65e3422ae
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x632674f65e3422ae Time: 0.023907
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 Tactic: 0x6cfa22e9365b79b6
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x6cfa22e9365b79b6 Time: 0.0078973
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x128_relu_large_nn_v1 Tactic: 0x813136e97c1542cf
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x813136e97c1542cf Time: 0.0327059
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148n_nt_v0 Tactic: 0x863395e8ea4fbbab
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x863395e8ea4fbbab Time: 0.00723064
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x128_relu_medium_nn_v0 Tactic: 0x8d563cb6e2bd3e46
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x8d563cb6e2bd3e46 Time: 0.0314502
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x64_relu_large_nn_v0 Tactic: 0x8f1e53a2d6dc87f4
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x8f1e53a2d6dc87f4 Time: 0.0332063
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x32_relu_large_nn_v1 Tactic: 0xab74b98996271ee0
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0xab74b98996271ee0 Time: 0.0364331
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x32_relu_medium_nn_v0 Tactic: 0xbd90052d8b47dde9
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0xbd90052d8b47dde9 Time: 0.0273813
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x32_relu_small_nn_v1 Tactic: 0xd00838485d937dc1
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0xd00838485d937dc1 Time: 0.0263721
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v0 Tactic: 0xdfd46e5735fc26d9
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0xdfd46e5735fc26d9 Time: 0.00752474
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_winograd_128x128_ldg1_ldg4_mobile_relu_tile148t_nt_v0 Tactic: 0xed5bbdc3eeb5aa67
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0xed5bbdc3eeb5aa67 Time: 0.0075693
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x64_relu_medium_nn_v0 Tactic: 0xef1674e9526bef07
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0xef1674e9526bef07 Time: 0.0330162
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x64_relu_small_nn_v0 Tactic: 0xf462d2631d68e4d5
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0xf462d2631d68e4d5 Time: 0.0294942
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_128x128_relu_small_nn_v1 Tactic: 0xfa4db728b7a121ee
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0xfa4db728b7a121ee Time: 0.0296391
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Fastest Tactic: 0x863395e8ea4fbbab Time: 0.00723064
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - >>>>>>>>>>>>>>> Chose Runner Type: CaskConvolution Tactic: 0x863395e8ea4fbbab
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - *************** Autotuning format combination: Float(8192,1,512,32) -> Float(768,1,48,3) ***************
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - --------------- Timing Runner: %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 (CaskConvolution)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - CaskConvolution has no valid tactics for this config, skipping
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - =============== Computing costs for 
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - *************** Autotuning format combination: Float(768,256,16,1) -> Float(192,64,8,1) ***************
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - --------------- Timing Runner: %x.5 : Tensor = aten::max_pool2d(%8, %10, %11, %4, %4, %self.local_stuff.1.training) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:782:11 (TiledPooling)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000000101 Time: 0.00417987
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000010101 Time: 0.00402776
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000020101 Time: 0.00395809
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000030101 Time: 0.00348989
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000040101 Time: 0.00368881
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000050101 Time: 0.00776339
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000060101 Time: 0.00902804
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000070101 Time: 0.00841013
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000080101 Time: 0.00384613
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000090101 Time: 0.00251517
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000000a0101 Time: 0.00229077
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000000b0101 Time: 0.00311965
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000000c0101 Time: 0.0026413
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000000d0101 Time: 0.0025824
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000000e0101 Time: 0.00288929
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000000f0101 Time: 0.0023652
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000100101 Time: 0.00323518
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000110101 Time: 0.00303224
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000120101 Time: 0.00295778
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000130101 Time: 0.00514002
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000140101 Time: 0.0087732
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000150101 Time: 0.00870838
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000160101 Time: 0.00865696
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000170101 Time: 0.0139516
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000180101 Time: 0.00819278
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000190101 Time: 0.00326804
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000001a0101 Time: 0.00354796
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000001b0101 Time: 0.00258199
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000001c0101 Time: 0.00229859
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000001d0101 Time: 0.00223545
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000001e0101 Time: 0.0025486
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000001f0101 Time: 0.0025444
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000200101 Time: 0.00253034
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000210101 Time: 0.00289873
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000220101 Time: 0.00262325
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000230101 Time: 0.0023499
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000240101 Time: 0.00355643
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000250101 Time: 0.00390449
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000260101 Time: 0.00762909
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000270101 Time: 0.00300896
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000280101 Time: 0.00345083
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x0000000000290101 Time: 0.00251332
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0x00000000006a0101 Time: 0.0032777
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Fastest Tactic: 0x00000000001d0101 Time: 0.00223545
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - --------------- Timing Runner: %x.5 : Tensor = aten::max_pool2d(%8, %10, %11, %4, %4, %self.local_stuff.1.training) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:782:11 (CudnnPooling)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0xffffffffffffffff Time: 0.00533469
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Fastest Tactic: 0xffffffffffffffff Time: 0.00533469
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - --------------- Timing Runner: %x.5 : Tensor = aten::max_pool2d(%8, %10, %11, %4, %4, %self.local_stuff.1.training) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:782:11 (CaskPooling)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %x.5 : Tensor = aten::max_pool2d(%8, %10, %11, %4, %4, %self.local_stuff.1.training) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:782:11 Set Tactic Name: sm50_xmma_pooling_fw_4d_FP32FP32NCHW_Max Tactic: 0xb59f9cfb90407c92
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0xb59f9cfb90407c92 Time: 0.00370319
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Fastest Tactic: 0xb59f9cfb90407c92 Time: 0.00370319
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - >>>>>>>>>>>>>>> Chose Runner Type: TiledPooling Tactic: 0x00000000001d0101
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Formats and tactics selection completed in 0.821927 seconds.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After reformat layers: 2 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Pre-optimized block assignment.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Block size 3072
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Block size 11718361088
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Total Activation Memory: 11718364160
INFO: [Torch-TensorRT TorchScript Conversion Context] - Detected 1 inputs and 1 output network tensors.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Set Tactic Name: maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148n_nt_v0 Tactic: 0x863395e8ea4fbbab
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Layer: %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17 Host Persistent: 512 Device Persistent: 0 Scratch Memory: 0
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Layer: %x.5 : Tensor = aten::max_pool2d(%8, %10, %11, %4, %4, %self.local_stuff.1.training) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:782:11 Host Persistent: 0 Device Persistent: 0 Scratch Memory: 0
INFO: [Torch-TensorRT TorchScript Conversion Context] - Total Host Persistent Memory: 512
INFO: [Torch-TensorRT TorchScript Conversion Context] - Total Device Persistent Memory: 0
INFO: [Torch-TensorRT TorchScript Conversion Context] - Total Scratch Memory: 0
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 0 MiB
INFO: [Torch-TensorRT TorchScript Conversion Context] - [BlockAssignment] Algorithm ShiftNTopDown took 0.00201ms to assign 1 blocks to 1 nodes requiring 3072 bytes.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Optimized block assignment.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Block size 3072
INFO: [Torch-TensorRT TorchScript Conversion Context] - Total Activation Memory: 3072
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Disabling unused tactic source: CUDNN
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Disabling unused tactic source: CUBLAS, CUBLAS_LT
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Engine generation completed in 0.988991 seconds.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Deleting timing cache: 4 entries, served 0 hits since creation.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Engine Layer Information:
Layer(CaskConvolution): %0 : Tensor = aten::_convolution(%x.1, %self.local_stuff.0.weight_fused_bn.1, %self.local_stuff.0.bias_fused_bn.1, %4, %4, %4, %self.local_stuff.1.training, %6, %7, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training, %self.local_stuff.1.training) + %8 : Tensor = aten::relu(%0) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:1455:17, Tactic: 0x863395e8ea4fbbab, input_0[Float(1,32,16,16)] -> (Unnamed Layer* 1) [Activation]_output[Float(1,3,16,16)]
Layer(TiledPooling): %x.5 : Tensor = aten::max_pool2d(%8, %10, %11, %4, %4, %self.local_stuff.1.training) # /home/brett/.local/lib/python3.10/site-packages/torch/nn/functional.py:782:11, Tactic: 0x00000000001d0101, (Unnamed Layer* 1) [Activation]_output[Float(1,3,16,16)] -> output_0[Float(1,3,8,8)]
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
DEBUG: [Torch-TensorRT - Debug Build] - Target Device: Device(ID: 0, Name: NVIDIA GeForce GTX 1080 Ti, SM Capability: 6.1, Type: GPU)
DEBUG: [Torch-TensorRT - Debug Build] - Setting Device(ID: 0, Name: NVIDIA GeForce GTX 1080 Ti, SM Capability: 6.1, Type: GPU) as active device
INFO: [Torch-TensorRT - Debug Build] - [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1774, GPU 4489 (MiB)
INFO: [Torch-TensorRT - Debug Build] - Loaded engine size: 0 MiB
DEBUG: [Torch-TensorRT - Debug Build] - Deserialization required 584 microseconds.
INFO: [Torch-TensorRT - Debug Build] - [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
DEBUG: [Torch-TensorRT - Debug Build] - Total per-runner device persistent memory is 0
DEBUG: [Torch-TensorRT - Debug Build] - Total per-runner host persistent memory is 512
DEBUG: [Torch-TensorRT - Debug Build] - Allocated activation device memory of size 3072
INFO: [Torch-TensorRT - Debug Build] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
DEBUG: [Torch-TensorRT - Debug Build] - Binding name: input_0
DEBUG: [Torch-TensorRT - Debug Build] - Binding name: output_0
DEBUG: [Torch-TensorRT - Debug Build] - Torch-TensorRT TensorRT Engine:
  Name: __torch______torch_mangle_0_MyMod_trt_engine_0x55555a887870
  Inputs: [
    id: 0
      shape: [1, 32, 16, 16]
      dtype: Float
  ]
  Outputs: [
    id: 0
      shape: [1, 32, 16, 16]
      dtype: Float
  ]
  Device: Device(ID: 0, Name: NVIDIA GeForce GTX 1080 Ti, SM Capability: 6.1, Type: GPU)

DEBUG: [Torch-TensorRT - Debug Build] - graph(%self_1 : __torch__.___torch_mangle_0.MyMod_trt,
      %input_0 : Tensor):
  %__torch______torch_mangle_0_MyMod_trt_engine_0x55555a887870 : __torch__.torch.classes.tensorrt.Engine = prim::GetAttr[name="__torch______torch_mangle_0_MyMod_trt_engine_0x55555a887870"](%self_1)
  %3 : Tensor[] = prim::ListConstruct(%input_0)
  %4 : Tensor[] = tensorrt::execute_engine(%3, %__torch______torch_mangle_0_MyMod_trt_engine_0x55555a887870)
  %5 : Tensor = prim::ListUnpack(%4)
  return (%5)
(AddEngineToGraph)

INFO: [Torch-TensorRT - Debug Build] - Segment Block @1:
    Target: Torch

    Graph: graph(%x.5 : Tensor):
  %3 : int = prim::Constant[value=0]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:13
  %5 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %0 : int[] = aten::size(%x.5) # <string>:13:9
  %2 : int = aten::__getitem__(%0, %3) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:5
  %4 : bool = aten::gt(%2, %5) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:5
  return (%0, %4)

(GraphInSegmentedBlock)

INFO: [Torch-TensorRT - Debug Build] - Segment Block @2:
    Target: Torch

    Graph: graph(%1 : bool,
      %x.5 : Tensor,
      %6 : int[]):
  %4 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %10 : int = prim::Constant[value=2]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:14
  %0 : Tensor = prim::If(%1) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:2
    block0():
      %2 : Tensor = aten::add(%x.5, %4, %4) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:23:10
      -> (%2)
    block1():
      %5 : int = aten::__getitem__(%6, %4) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5
      %7 : bool = aten::gt(%5, %4) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5
      %8 : Tensor = prim::If(%7) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:2
        block0():
          %9 : Tensor = aten::add(%x.5, %10, %4) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10
          -> (%9)
        block1():
          -> (%x.5)
      -> (%8)
  return (%0)

(GraphInSegmentedBlock)

DEBUG: [Torch-TensorRT - Debug Build] - Settings requested for Torch Fallback:
    "enabled": True
    "min_block_size": 3
    "torch_executed_operators": [
        aten::size,
     ]
DEBUG: [Torch-TensorRT - Debug Build] - Parititioning source module into PyTorch and TensorRT sub blocks
DEBUG: [Torch-TensorRT - Debug Build] - In progress TRT block does not meet minimum block size requirements, therefore folding into in progress PyTorch block
DEBUG: [Torch-TensorRT - Debug Build] - Finalizing in progress Torch block
DEBUG: [Torch-TensorRT - Debug Build] - Segment Block @0:
    Target: Torch

    Graph: graph(%x.5 : Tensor):
  %2 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %0 : Tensor = aten::add(%x.5, %2, %2) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:23:10
  return ()

DEBUG: [Torch-TensorRT - Debug Build] - Registering input/output torch::jit::Value for segmented graphs
INFO: [Torch-TensorRT - Debug Build] - Partitioned Graph: [Segment Block @0:
    Target: Torch

    Graph: graph(%x.5 : Tensor):
  %2 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %0 : Tensor = aten::add(%x.5, %2, %2) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:23:10
  return (%0)

]
INFO: [Torch-TensorRT - Debug Build] - Segment Block @0:
    Target: Torch

    Graph: graph(%x.5 : Tensor):
  %2 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %0 : Tensor = aten::add(%x.5, %2, %2) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:23:10
  return (%0)

(GraphInSegmentedBlock)

DEBUG: [Torch-TensorRT - Debug Build] - Settings requested for Torch Fallback:
    "enabled": True
    "min_block_size": 3
    "torch_executed_operators": [
        aten::size,
     ]
DEBUG: [Torch-TensorRT - Debug Build] - Parititioning source module into PyTorch and TensorRT sub blocks
DEBUG: [Torch-TensorRT - Debug Build] - Unable to get schema for Node %21 : Tensor = prim::If(%95) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:2  block0():    %22 : Tensor = aten::add(%x.5, %7, %6) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10    -> (%22)  block1():    -> (%x.5) (NodeConverterRegistry.Convertable)
DEBUG: [Torch-TensorRT - Debug Build] - In progress TRT block does not meet minimum block size requirements, therefore folding into in progress PyTorch block
DEBUG: [Torch-TensorRT - Debug Build] - In progress TRT block does not meet minimum block size requirements, therefore folding into in progress PyTorch block
DEBUG: [Torch-TensorRT - Debug Build] - In progress TRT block does not meet minimum block size requirements, therefore folding into in progress PyTorch block
DEBUG: [Torch-TensorRT - Debug Build] - Hit a conditional statement, finializing in progress PYT block and creating a new one for the conditional
DEBUG: [Torch-TensorRT - Debug Build] - Finalizing in progress Torch block
DEBUG: [Torch-TensorRT - Debug Build] - Segment Block @0:
    Target: Torch

    Graph: graph(%1 : int[]):
  %2 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %0 : int = aten::__getitem__(%1, %2) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5
  %3 : bool = aten::gt(%0, %2) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5
  return ()

DEBUG: [Torch-TensorRT - Debug Build] - Finalizing in progress Torch block
DEBUG: [Torch-TensorRT - Debug Build] - Segment Block @1:
    Target: Torch

    Graph: graph(%1 : bool,
      %x.5 : Tensor):
  %5 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %4 : int = prim::Constant[value=2]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:14
  %0 : Tensor = prim::If(%1) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:2
    block0():
      %2 : Tensor = aten::add(%x.5, %4, %5) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10
      -> (%2)
    block1():
      -> (%x.5)
  return ()

DEBUG: [Torch-TensorRT - Debug Build] - Registering input/output torch::jit::Value for segmented graphs
INFO: [Torch-TensorRT - Debug Build] - Partitioned Graph: [Segment Block @0:
    Target: Torch

    Graph: graph(%1 : int[]):
  %2 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %0 : int = aten::__getitem__(%1, %2) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5
  %3 : bool = aten::gt(%0, %2) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5
  return (%3)

Segment Block @1:
    Target: Torch

    Graph: graph(%1 : bool,
      %x.5 : Tensor):
  %4 : int = prim::Constant[value=2]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:14
  %5 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %0 : Tensor = prim::If(%1) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:2
    block0():
      %2 : Tensor = aten::add(%x.5, %4, %5) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10
      -> (%2)
    block1():
      -> (%x.5)
  return (%0)

]
INFO: [Torch-TensorRT - Debug Build] - Segment Block @0:
    Target: Torch

    Graph: graph(%1 : int[]):
  %2 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %0 : int = aten::__getitem__(%1, %2) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5
  %3 : bool = aten::gt(%0, %2) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:5
  return (%3)

(GraphInSegmentedBlock)

INFO: [Torch-TensorRT - Debug Build] - Segment Block @1:
    Target: Torch

    Graph: graph(%1 : bool,
      %x.5 : Tensor):
  %4 : int = prim::Constant[value=2]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:14
  %5 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %0 : Tensor = prim::If(%1) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:24:2
    block0():
      %2 : Tensor = aten::add(%x.5, %4, %5) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10
      -> (%2)
    block1():
      -> (%x.5)
  return (%0)

(GraphInSegmentedBlock)

DEBUG: [Torch-TensorRT - Debug Build] - Settings requested for Torch Fallback:
    "enabled": True
    "min_block_size": 3
    "torch_executed_operators": [
        aten::size,
     ]
DEBUG: [Torch-TensorRT - Debug Build] - Parititioning source module into PyTorch and TensorRT sub blocks
DEBUG: [Torch-TensorRT - Debug Build] - In progress TRT block does not meet minimum block size requirements, therefore folding into in progress PyTorch block
DEBUG: [Torch-TensorRT - Debug Build] - Finalizing in progress Torch block
DEBUG: [Torch-TensorRT - Debug Build] - Segment Block @0:
    Target: Torch

    Graph: graph(%x.5 : Tensor):
  %3 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %2 : int = prim::Constant[value=2]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:14
  %0 : Tensor = aten::add(%x.5, %2, %3) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10
  return ()

DEBUG: [Torch-TensorRT - Debug Build] - Registering input/output torch::jit::Value for segmented graphs
INFO: [Torch-TensorRT - Debug Build] - Partitioned Graph: [Segment Block @0:
    Target: Torch

    Graph: graph(%x.5 : Tensor):
  %2 : int = prim::Constant[value=2]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:14
  %3 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %0 : Tensor = aten::add(%x.5, %2, %3) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10
  return (%0)

]
INFO: [Torch-TensorRT - Debug Build] - Segment Block @0:
    Target: Torch

    Graph: graph(%x.5 : Tensor):
  %2 : int = prim::Constant[value=2]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:14
  %3 : int = prim::Constant[value=1]() # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:22:18
  %0 : Tensor = aten::add(%x.5, %2, %3) # /home/brett/repos/Autosensor/NN/tmp/trt_bug5_v2.py:25:10
  return (%0)

(GraphInSegmentedBlock)

DEBUG: [Torch-TensorRT - Debug Build] - Settings requested for Torch Fallback:
    "enabled": True
    "min_block_size": 3
    "torch_executed_operators": [
        aten::size,
     ]
DEBUG: [Torch-TensorRT - Debug Build] - Parititioning source module into PyTorch and TensorRT sub blocks
DEBUG: [Torch-TensorRT - Debug Build] - Registering input/output torch::jit::Value for segmented graphs
INFO: [Torch-TensorRT - Debug Build] - Partitioned Graph: []

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007fffb1e5e3e9 in torch::jit::Value::replaceFirstUseWith(torch::jit::Value*) () from /home/brett/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
(gdb) bt
#0  0x00007fffb1e5e3e9 in torch::jit::Value::replaceFirstUseWith(torch::jit::Value*) () from /home/brett/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#1  0x00007fffb1e5e4ab in torch::jit::Value::replaceAllUsesWith(torch::jit::Value*) () from /home/brett/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#2  0x00007ffefaf705f9 in torch_tensorrt::core::AddIfBlockToGraph (new_g=std::shared_ptr<torch::jit::Graph> (use count 1, weak count 0) = {...}, if_node=0x55558cac5620, graph_and_mappings=std::vector of length 2, capacity 2 = {...}, old_to_new_g=std::unordered_map with 2 elements = {...})
    at core/compiler.cpp:227
#3  0x00007ffefaf7108d in torch_tensorrt::core::ConstructFallbackGraph (new_mod=..., block=0x55558caa79b0, example_tensor_map=std::unordered_map with 7 elements = {...}, cfg=..., static_params=std::map with 0 elements, fallback_nodes=std::unordered_map with 9 elements = {...})
    at core/compiler.cpp:295
#4  0x00007ffefaf70ff4 in torch_tensorrt::core::ConstructFallbackGraph (new_mod=..., block=0x55558c9f3480, example_tensor_map=std::unordered_map with 5 elements = {...}, cfg=..., static_params=std::map with 0 elements, fallback_nodes=std::unordered_map with 9 elements = {...})
    at core/compiler.cpp:293
#5  0x00007ffefaf73300 in torch_tensorrt::core::CompileGraph (mod=..., cfg=...) at core/compiler.cpp:462
#6  0x00007fff5fd96b89 in torch_tensorrt::pyapi::CompileGraph (mod=..., info=...) at /storage/github/TensorRT/py/torch_tensorrt/csrc/torch_tensorrt_py.cpp:113
#7  0x00007fff5fdbff33 in pybind11::detail::argument_loader<torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&>::call_impl<torch::jit::Module, torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), 0ul, 1ul, pybind11::detail::void_type>(torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), std::integer_sequence<unsigned long, 0ul, 1ul>, pybind11::detail::void_type&&) && (f=<optimized out>, f=<optimized out>, this=0x7fffffffc980)
    at /home/brett/.local/lib/python3.10/site-packages/torch/include/pybind11/cast.h:2042
#8  pybind11::detail::argument_loader<torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&>::call<torch::jit::Module, pybind11::detail::void_type, torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&)>(torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&)) && (f=<optimized out>, this=0x7fffffffc980) at /home/brett/.local/lib/python3.10/site-packages/torch/include/pybind11/cast.h:2014
#9  pybind11::cpp_function::initialize<torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), torch::jit::Module, torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&, pybind11::name, pybind11::scope, pybind11::sibling, char [128]>(torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), torch::jit::Module (*)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [128])::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const (__closure=0x0, call=...) at /home/brett/.local/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h:192
#10 pybind11::cpp_function::initialize<torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), torch::jit::Module, torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&, pybind11::name, pybind11::scope, pybind11::sibling, char [128]>(torch::jit::Module (*&)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), torch::jit::Module (*)(torch::jit::Module const&, torch_tensorrt::pyapi::CompileSpec&), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [128])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) () at /home/brett/.local/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h:170
#11 0x00007fff5fdb6afd in pybind11::cpp_function::dispatcher (self=<optimized out>, args_in=0x7ffef1c2bf80, kwargs_in=0x0) at /home/brett/.local/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h:767
#12 0x00005555556af76e in ?? ()
#13 0x00005555556a630b in _PyObject_MakeTpCall ()
#14 0x000055555569ec67 in _PyEval_EvalFrameDefault ()
#15 0x00005555556affbc in _PyFunction_Vectorcall ()
#16 0x00005555556be362 in PyObject_Call ()
#17 0x000055555569a8c4 in _PyEval_EvalFrameDefault ()
#18 0x00005555556affbc in _PyFunction_Vectorcall ()
#19 0x00005555556997d9 in _PyEval_EvalFrameDefault ()
#20 0x0000555555694cc6 in ?? ()
#21 0x0000555555789eb6 in PyEval_EvalCode ()
#22 0x00005555557b7318 in ?? ()
#23 0x00005555557b003b in ?? ()
#24 0x00005555557b7065 in ?? ()
#25 0x00005555557b6548 in _PyRun_SimpleFileObject ()
#26 0x00005555557b6243 in _PyRun_AnyFileObject ()
#27 0x00005555557a6b6e in Py_RunMain ()
#28 0x000055555577ce6d in Py_BytesMain ()
#29 0x00007ffff7c88d90 in __libc_start_call_main (main=main@entry=0x55555577ce30, argc=argc@entry=2, argv=argv@entry=0x7fffffffd628) at ../sysdeps/nptl/libc_start_call_main.h:58
#30 0x00007ffff7c88e40 in __libc_start_main_impl (main=0x55555577ce30, argc=2, argv=0x7fffffffd628, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd618) at ../csu/libc-start.c:392
#31 0x000055555577cd65 in _start ()

github-actions[bot] commented 1 year ago

This issue has not seen activity for 90 days. Remove the stale label or comment, or this issue will be closed in 10 days.
