This looks to be an infinite loop in requires_grad analysis; looking into it.
@eellison Thanks a lot for looking into the issue. Would you mind giving me a bit more info about it? Where is the requires_grad analysis happening? Is there any way I could stop it in ScriptModule, since I am using it only for inference and I don't think I can use torch.no_grad() inside script? Also, you mentioned "when the outputs and inputs disagreed" in the PR. What do you mean by this?

@hhsecond I'm landing the change now, so hopefully you won't be affected shortly.
The issue was in our requires_grad analysis: we assumed that a loop output and loop input would converge to both requiring grad or not requiring it.
In the test example:
def test_requires_grad_loop(self):
    @torch.jit.script
    def test(x, y, z):
        # type: (Tensor, Tensor, int) -> Tensor
        for _ in range(z):
            x = y
        return x
The loop input is the value of x when we enter the loop and the output is the value of x when we exit. If x requires grad but y doesn't, then the loop input will require grad but the loop output won't.
This was triggered in your example (iirc) because we set d_input to require grad on the input to the loop but not when it exited (since torch.max returns an integral tensor which can't require grad).
If you set d_input to not require grad before the loop, I think the error would no longer happen.
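For concreteness, here is a minimal sketch (with made-up tensors, not the actual chatbot model) of the disagreement described above, plus the suggested workaround of making d_input not require grad before the loop; detach() and requires_grad_(False) are just two ways to do that, and the d_input below is a hypothetical stand-in.

import torch

@torch.jit.script
def loop(x, y, z):
    # type: (Tensor, Tensor, int) -> Tensor
    for _ in range(z):
        x = y  # after one iteration the loop-carried x takes y's requires_grad
    return x

x = torch.randn(3, requires_grad=True)  # loop input requires grad
y = torch.randn(3)                      # loop body value does not
out = loop(x, y, 2)
print(out.requires_grad)                # False: loop input and output disagree

# Suggested workaround, sketched with a hypothetical d_input:
d_input = torch.ones(3, requires_grad=True)
d_input = d_input.detach()              # or d_input.requires_grad_(False)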
Great, thanks a lot for the explanation. I did try with torch.no_grad(), but apparently we can't do that inside the JIT yet. I haven't tried .requires_grad=True. Will give it a shot.
Hi @eellison, I tried making requires_grad=False, but apparently aten::ones doesn't understand that keyword argument and raised "keyword argument requires_grad unknown". I guess your fix is the only way to go then!
@hhsecond you can also try .detach(); I think that would work.
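For reference, a tiny sketch of what detaching would look like; encoder_out here is a hypothetical stand-in for the real encoder/decoder outputs, which aren't shown in this thread.

import torch

encoder_out = torch.randn(1, 10, requires_grad=True)        # hypothetical stand-in
decoder_in = encoder_out.detach()                           # same data, but never requires grad
print(encoder_out.requires_grad, decoder_in.requires_grad)  # True False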
@eellison So I think we got misled a bit. d_input is requires_grad=False by default at creation itself. The return values from the encoder and decoder have requires_grad=True. I tried detaching them and it did not work.
🐛 Bug
I have a program that's JITed using both script and trace. While the non-JITed version executes without any problem, the JITed version gets stuck in a while loop.
To Reproduce
The snippet below is the complete script (it's a variant of the PyTorch Chatbot example given in the docs). The issue I have is with the wrapper function, I guess: it's not returning control and gets stuck in the while loop inside the wrapper function itself.

Expected behavior
The non-JITed version returns the token in less than a second, and I would expect the same from the JITed version.
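Since the full script is omitted here, the following is only a rough, hypothetical sketch of the kind of wrapper loop being discussed (greedy decoding with torch.max, as in the chatbot tutorial); names like greedy_loop, scores, and d_input are illustrative, not the actual code.

import torch

@torch.jit.script
def greedy_loop(scores, d_input, max_length):
    # type: (Tensor, Tensor, int) -> Tensor
    for _ in range(max_length):
        # torch.max returns an integral index tensor, which can never require grad,
        # so the loop-carried d_input stops requiring grad inside the loop
        _, d_input = torch.max(scores, 0)
    return d_input

scores = torch.randn(5, requires_grad=True)
d_input = torch.ones(1, requires_grad=True)  # requires grad on loop entry
print(greedy_loop(scores, d_input, 3))       # the pattern that hung on affected builds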
Environment
PyTorch version: 1.0.1.post2
Is debug build: No
CUDA used to build PyTorch: None

OS: Ubuntu 18.10
GCC version: (Ubuntu 8.2.0-7ubuntu1) 8.2.0
CMake version: version 3.12.1

Python version: 3.7
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip] numpy==1.16.2
[pip] torch==1.0.1.post2
[pip] torchvision==0.2.2
[conda] blas 1.0 mkl
[conda] mkl 2019.1 144
[conda] mkl_fft 1.0.10 py37ha843d7b_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] pytorch-cpu 1.0.1 py3.7_cpu_2 pytorch
[conda] torchvision-cpu 0.2.2 py_3 pytorch