Open orioninthesky98 opened 2 months ago
Hi @orioninthesky98 thanks for the details.
I'm able to get the same results from the `torch_tensorrt` and `pytorch` models using the repro you gave, with one small change.
Here's what I did:
1) Uncomment this line (otherwise there's a type error): https://gist.github.com/orioninthesky98/d0a987197950bc0b945d28b240d5bc53#file-model-py-L342 I didn't touch any other code.
2) Run the inference code:
```python
import torch
import torch_tensorrt

# FinalEncoder is defined in model.py from the gist linked above.
encoder = FinalEncoder().to("cuda")
encoder.eval()

minibatch_size = 1024
net_input_shape = (1, 1, 1, 40)
x_rand = torch.rand((minibatch_size,) + tuple(net_input_shape))
x_rand = x_rand.to("cuda")

trt_model = torch_tensorrt.compile(
    encoder,
    inputs=[x_rand],
    enabled_precisions={torch.float32},
    optimization_level=5,
    use_fast_partitioner=True,
    dynamic=False,
    disable_tf32=True,
)

# Print mu (first element of the output tuple) from both models for comparison.
print("==================== trt_model mu ====================")
print(trt_model(x_rand)[0])
print("==================== torch_model mu ====================")
print(encoder(x_rand)[0])
```
Then I can get the same results.
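Rather than comparing the printed tensors by eye, the two outputs can also be checked numerically. This is only a minimal sketch: the helper name `outputs_match` and the tolerances are illustrative choices, not values from this thread, and CPU stand-in tensors are used in place of the real model outputs.

```python
import torch

def outputs_match(trt_out: torch.Tensor, torch_out: torch.Tensor,
                  rtol: float = 1e-3, atol: float = 1e-3) -> bool:
    """Return True if the two outputs agree within the given tolerances."""
    # Bring both tensors to CPU float32 so device/dtype differences don't matter.
    a = trt_out.detach().float().cpu()
    b = torch_out.detach().float().cpu()
    return a.shape == b.shape and torch.allclose(a, b, rtol=rtol, atol=atol)

# Stand-in data; in the repro these would be trt_model(x_rand)[0] and encoder(x_rand)[0].
reference = torch.rand(1024, 4)
close_copy = reference + 1e-5       # tiny numerical noise
bad_copy = reference.clone()
bad_copy[:, 1:3] += 1.0             # corrupt the 2nd and 3rd columns

print(outputs_match(close_copy, reference))  # True
print(outputs_match(bad_copy, reference))    # False
```

With the repro above, the call would be `outputs_match(trt_model(x_rand)[0], encoder(x_rand)[0])`.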
For your reference, here's my env:
tensorrt 10.0.1
torch 2.5.0.dev20240703+cu121
torch_tensorrt 2.5.0.dev0+feb4d84ff (main branch as of today)
torchvision 0.20.0.dev20240703+cu121
I recommend testing again with the latest Torch-TRT main branch. Please let me know if you still see the same issue.
Bug Description
My model outputs a tuple of `mu` and `logvar`. For the `mu`, there are 4 columns (features), consisting of 3 features of type A and 1 feature of type B. You can see the FinalEncoder.forward() code in the gist below for the details.
As seen below, for the 3 features of type A, only the first feature matches the pytorch model. The 2nd and 3rd features are total garbage. For the type B feature, it matches the pytorch model.
This used to work perfectly fine on the previous version of torch-TensorRT (2.2.0) before I updated to 2.3.0. In fact, if you look at the model code, I had to write the `trt_compat_mode` specially for 2.3.0. When I was using 2.2.0, the original pytorch forward() actually compiled fine and gave the expected speedups (4 to 5 times).
torch mu
tensorRT mu, 2nd & 3rd columns are wrong
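A per-column difference makes this kind of mismatch easy to pin down. The sketch below assumes `mu` has shape (batch, 4) as described above; the helper `per_column_max_abs_diff`, the 1e-3 threshold, and the stand-in tensors are hypothetical and not taken from the gist.

```python
import torch

def per_column_max_abs_diff(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Largest absolute difference in each column of two (batch, features) tensors."""
    diff = a.detach().float().cpu() - b.detach().float().cpu()
    return diff.abs().amax(dim=0)

# Stand-in data imitating the report: columns 1 and 2 of the "TensorRT" output are garbage.
torch_mu = torch.rand(8, 4)
trt_mu = torch_mu.clone()
trt_mu[:, 1:3] = torch.rand(8, 2) + 2.0

diffs = per_column_max_abs_diff(trt_mu, torch_mu)
bad_columns = (diffs > 1e-3).nonzero().flatten().tolist()
print(bad_columns)  # [1, 2]
```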
To Reproduce
Steps to reproduce the behavior:
This is the model code: https://gist.github.com/orioninthesky98/d0a987197950bc0b945d28b240d5bc53#file-model-py-L327-L352 The problematic part is highlighted in the gist. You can see the for-loop here, and somehow only the 1st feature (`inv_mu`/`inv_logvar`) is correct but the remaining 2 are garbage.
I've tried unrolling the loop myself (so hardcoding the indices provided into `torch.index_select()`), just in case there was something wrong when tracing the for-loop. It still didn't fix the issue.
I tried to do stuff with torch._constrain_as_size(bs or num_inv_feats) but didn't find success, as torch complained that those are not of type SymInt.
I have also tried changing all the .view() to .reshape() but that didn't change anything. I tried adding .clone() and .contiguous(), and that didn't help either.
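As a baseline, in eager PyTorch the `torch.index_select()` form should pick out the same data as plain slicing along dim 1, which would point at the compilation step rather than the indexing itself. The sketch below only checks eager behavior and does not reproduce the TensorRT problem; the tensor shape and the helper name `slice_matches_index_select` are made up for illustration.

```python
import torch

def slice_matches_index_select(x: torch.Tensor) -> bool:
    """Check that x[:, i, ...] equals index_select on dim 1 for every index i."""
    for i in range(x.shape[1]):
        sliced = x[:, i, ...]
        # index_select keeps dim 1 with size 1, so squeeze it to match the slice.
        selected = torch.index_select(x, 1, torch.tensor([i])).squeeze(1)
        if not torch.equal(sliced, selected):
            return False
    return True

# Made-up shape (batch, num_features, feature_dim); not taken from the gist.
masked_input = torch.rand(16, 3, 40)
print(slice_matches_index_select(masked_input))  # True

# .view() and .reshape() also agree on a contiguous tensor.
flat_view = masked_input.contiguous().view(16, -1)
flat_reshape = masked_input.reshape(16, -1)
print(torch.equal(flat_view, flat_reshape))  # True
```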
Also, something weird is that I was forced to use `torch.index_select()`. Previously in torch_tensorrt 2.2.0, I could do plain slice-indexing and it compiled just fine, something like `curr_input = masked_input[:, i, ...]`.
I tried to revert to torch_tensorrt 2.2.0, but very strangely, it rejects the use of `torch.index_select()` lol! With 2.2.0, I have to set `trt_compat_mode=False`, and then it compiles fine, AND it gives the correct outputs.
For the compilation I am using this code:
Expected behavior
The compiled model's outputs should match the torch model's outputs, at least approximately.
Environment
Installed via (`conda`, `pip`, `libtorch`, source): pip
Additional context