Closed — @Mark-M2L closed this issue 2 years ago.
@Mark-M2L could you please check whether this works? https://github.com/pytorch/TensorRT/issues/983
Thanks a lot for your reply. I have looked into it and have a question about the implementation. I can think of two ways to implement the solution you suggest. The first is to create a scripted model using torch.jit.script(model)
and then feed this scripted model to the code in remove_exceptions.cpp to generate a new graph. But in that case, would I then need to turn the resulting graph back into a scripted model that can be fed to torch_tensorrt for compilation?
The second way I'm thinking of is to port the code in remove_exceptions.cpp to Python, so that we can remove the exceptions directly in Python. After removing the exceptions, we can feed the cleaned scripted model to the torch_tensorrt compiler.
Would you suggest either of these implementations, or would you suggest something different?
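To sketch what the second (Python-side) approach would conceptually do, here is a toy pass over a simplified graph representation. The node kinds (prim::If, prim::RaiseException) mirror TorchScript's names, but this is NOT the real torch._C graph API — it is purely illustrative of the transformation:

```python
# Toy sketch: walk a (simplified) graph and drop conditional branches
# whose only effect is raising an exception. Purely illustrative --
# the real TorchScript graph API is different.

def eliminate_exceptions(nodes):
    """Return a copy of `nodes` with exception-only If branches removed.

    `nodes` is a list of dicts with a 'kind' key; a 'prim::If' node also
    carries 'then' and 'else' branches (lists of nodes). If one branch
    consists solely of a prim::RaiseException, the If node is replaced
    by the other branch.
    """
    result = []
    for node in nodes:
        if node["kind"] == "prim::If":
            then_branch = eliminate_exceptions(node["then"])
            else_branch = eliminate_exceptions(node["else"])
            if _only_raises(node["then"]):
                result.extend(else_branch)   # keep the non-raising branch
            elif _only_raises(node["else"]):
                result.extend(then_branch)
            else:
                result.append({**node, "then": then_branch, "else": else_branch})
        else:
            result.append(node)  # any other node is kept as-is
    return result

def _only_raises(branch):
    return len(branch) == 1 and branch[0]["kind"] == "prim::RaiseException"

# Example: a shape-guard If whose else-branch only raises collapses
# into its then-branch.
graph = [
    {"kind": "aten::add"},
    {"kind": "prim::If",
     "then": [{"kind": "aten::relu"}],
     "else": [{"kind": "prim::RaiseException"}]},
]
print(eliminate_exceptions(graph))
# [{'kind': 'aten::add'}, {'kind': 'aten::relu'}]
```

The real C++ pass operates on torch::jit::Graph nodes, but the shape of the transformation (find an If with a raise-only branch, splice in the other branch) is the same.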
Hey @Mark-M2L, sorry I didn't make it clear.
I think what you can do is simply add torch::jit::exception_elimination
right after this line: https://github.com/pytorch/TensorRT/blob/5d1acbacb3928c7d5b1f125cf8fe98c9bbaffbeb/core/lowering/lowering.cpp#L35.
Then recompile Torch-TensorRT, and it should work.
Thanks for your clarification. I recompiled Torch-TensorRT using your suggestion (I previously built it using Python, as explained in https://github.com/pytorch/TensorRT/issues/1026#issuecomment-1119561746). The compilation went fine, but even after adding your step and recompiling, I still got the same error (Expected Tensor but got Uninitialized). Which versions of CUDA, cuDNN, TensorRT, and torch_tensorrt did you use to get your code to compile? Perhaps I have used the wrong versions.
Hey @Mark-M2L, do you have a small repro so I can also run and test locally? By the way, we have seen this error previously with this kind of operation:
if a:
    do_something()
else:
    raise Exception(...)
This happens because when we have an If node like this, it goes through fallback, and there is no value corresponding to the raiseException branch, as explained in #983.
Do you have detailed logs about the part that fails? The graph etc.
I think when you are doing c = a + b, there is a shape check, and if the shapes don't match it throws an exception.
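To make that concrete, here is a plain-Python stand-in for such a guarded add (lists instead of tensors; the names are illustrative, not taken from the repro repo):

```python
def guarded_add(a, b):
    """Element-wise add with a shape guard, mimicking what scripted
    PyTorch code often does before c = a + b.

    TorchScript lowers the raise-branch to prim::RaiseException inside a
    prim::If node; during partial compilation that branch produces no
    value, which can surface as 'Expected Tensor but got Uninitialized'.
    """
    if len(a) != len(b):
        raise RuntimeError(f"shape mismatch: {len(a)} vs {len(b)}")
    return [x + y for x, y in zip(a, b)]

print(guarded_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```

In eager Python this guard is harmless; it is only the lowered If/RaiseException structure in the scripted graph that trips up the compiler.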
Hi @bowang007 Thank you very much for your help. I created a small reproduction repo at Reproduced repo. It is basically a copy of DDRNet.pytorch, modified so that it (eventually) compiles with torch_tensorrt.
For this repo, I use Python 3.8.13. The packages that are installed are:
Regarding the logs where it fails, the RuntimeError seems to be thrown abruptly. I have tried logging with both Debug and Graph levels (logging with Error does not print anything besides the RuntimeError). Some things that I see are:
This one could indicate that a shape is wrong; however, it seems to happen in layer1, at line 325 of ddrnet_23_slim.py. The part where we apply c = a + b occurs later, right before the final layer; that code is at line 202, in class DAPPM. So I doubt this is what leads to the final error being thrown.
Other suspicious debug lines are the following:
However, they do not seem to lead to the RuntimeError and do not interrupt the program.
Does this give you enough information? Of course, I can provide you with more information if requested. Thanks a lot for your help.
Hi @bowang007, did you perhaps have time to test the sample code? It would be really helpful if you could help us out.
Hey @Mark-M2L, I'm going to test it this week; I was stuck on something else last week. Will update you soon.
Hey @Mark-M2L I ran your model locally and hit this bug: https://github.com/pytorch/TensorRT/issues/1336. It seems to be because I'm using the latest Torch-TensorRT version while you are using 1.10, and there have been some changes since 1.10.
I'm now trying to support your model with this PR: https://github.com/pytorch/TensorRT/pull/1263. Hopefully this can be completed this week; I will reply to you once your model is supported.
Hi @bowang007, thank you very much for taking the time to support the model. I really appreciate it. Looking forward to your update :)
Hey @Mark-M2L, sorry I forgot to reply earlier. Could you please try these two PRs, #1263 and #1345? I tested locally, and your model is supported with both PRs applied. The result is also good. Please update and close this issue if you succeed. Thanks.
Hi @bowang007 Thanks a lot! So you suggest that one of the two PRs worked for you? Then I will start testing one of them as soon as I have time. If it succeeds I will notify here and close the issue.
@Mark-M2L Both PRs mentioned above have been merged into the master branch, so your model should be supported on master now. I'm closing this issue.
❓ Question
Currently, I am compiling a custom segmentation model using torch_tensorrt.compile(), with a scripted model obtained from torch.jit. The code to compile is as follows:
The code fails to compile at the following step:
, throwing the following error:
It seems that some variable is uninitialized. However, the strange thing is that replacing the previous code with either of the following code pieces makes it compile:
and
So, somehow taking the sum of these two tensors causes compilation to fail. Do you have any suggestions I can try so that this step compiles as well?
What you have already tried
I also tried adding the following two parameters to the compilation step:
These resulted in different errors, though, so I decided not to use them for now.
Environment
How the library was installed (conda, pip, libtorch, source): pip, from within a virtual environment (pyenv)

Looking forward to your answer, thanks in advance.