lucasjinreal / DCNv2_latest

DCNv2 with support for recent PyTorch versions such as torch 1.5+ (now 1.8+)
BSD 3-Clause "New" or "Revised" License

Zero offset test failed. #43

Open lienfeng1011 opened 3 years ago

lienfeng1011 commented 3 years ago

Hi. The whole installation process went fine, and all checks in testcpu.py pass. But when I run testcuda.py, the zero offset check always fails. Can this be safely ignored? Thank you.
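For context, a minimal sketch of what a zero-offset check of this kind typically does (not the repo's exact testcuda.py code; the dcn_v2_conv import path and its positional stride/padding/dilation/deformable_groups argument order are assumptions): with all offsets zero and all modulation masks set to one, DCNv2 should reduce to a plain convolution.

```python
# Hypothetical sketch, NOT the repo's exact test; import path assumed.
import torch
import torch.nn.functional as F
from dcn_v2 import dcn_v2_conv  # assumed import from this repo

N, C_in, C_out, H, W, K = 2, 4, 6, 8, 8, 3
input = torch.randn(N, C_in, H, W).cuda()
weight = torch.randn(C_out, C_in, K, K).cuda()
bias = torch.zeros(C_out).cuda()
offset = torch.zeros(N, 2 * K * K, H, W).cuda()  # zero offsets: regular sampling grid
mask = torch.ones(N, K * K, H, W).cuda()         # unit masks: no modulation

# trailing positional args assumed: stride, padding, dilation, deformable_groups
out_dcn = dcn_v2_conv(input, offset, mask, weight, bias, 1, 1, 1, 1)
out_ref = F.conv2d(input, weight, bias, stride=1, padding=1)
print((out_dcn - out_ref).abs().max().item())  # should be near zero (fp rounding)
```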

yzbx commented 2 years ago

I get the same error when running python testcuda.py.

Note: this is a known issue and may not be a serious problem.

environment

# conda list |grep pytorch
pytorch                   1.7.1           py3.8_cuda11.0.221_cudnn8.0.5_0    pytorch
pytorch-lightning         1.4.9              pyhd8ed1ab_0    conda-forge
torchvision               0.8.2                py38_cu110    pytorch
# conda list |grep cu     
cudatoolkit               11.0.221             h6bb024c_0  
icu                       58.2                 he6710b0_3  
ncurses                   6.2                  he6710b0_1  
pytorch                   1.7.1           py3.8_cuda11.0.221_cudnn8.0.5_0    pytorch
torchvision               0.8.2                py38_cu110    pytorch

error output

torch.Size([2, 64, 128, 128])
torch.Size([20, 32, 7, 7])
torch.Size([20, 32, 7, 7])
torch.Size([20, 32, 7, 7])
0.971507, 1.943014
0.971507, 1.943014
Zero offset passed
/opt/conda/lib/python3.8/site-packages/torch/autograd/gradcheck.py:301: UserWarning: The {}th input requires gradient and is not a double precision floating point or complex. This check will likely fail if all the inputs are not of double precision floating point or complex.
  warnings.warn(
check_gradient_dpooling: True
Traceback (most recent call last):
  File "testcuda.py", line 265, in <module>
    check_gradient_dconv()
  File "testcuda.py", line 95, in check_gradient_dconv
    gradcheck(dcn_v2_conv, (input, offset, mask, weight, bias,
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 401, in gradcheck    return not_reentrant_error()
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 398, in not_reentrant_error
    return fail_test(error_msg)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 289, in fail_test
    raise RuntimeError(msg)
RuntimeError: Backward is not reentrant, i.e., running backward with same
input and grad_output multiple times gives different values, although
analytical gradient matches numerical gradient. The tolerance for
nondeterminism was 0.0.
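The message means the CUDA backward is nondeterministic: running backward twice with identical input and grad_output gives slightly different gradients (commonly attributed to floating-point atomicAdd ordering in the backward kernels). A minimal sketch of reproducing this by hand (hypothetical code; the dcn_v2_conv import and argument order are assumptions):

```python
# Sketch: run backward twice with the same grad_output and compare gradients.
import torch
from dcn_v2 import dcn_v2_conv  # assumed import from this repo

def grad_once(input, offset, mask, weight, bias, grad_out):
    inp = input.detach().requires_grad_(True)
    out = dcn_v2_conv(inp, offset, mask, weight, bias, 1, 1, 1, 1)
    out.backward(grad_out)
    return inp.grad.clone()

N, C, H, W, K = 1, 2, 4, 4, 3
input = torch.randn(N, C, H, W, device="cuda")
offset = 0.1 * torch.randn(N, 2 * K * K, H, W, device="cuda")
mask = torch.rand(N, K * K, H, W, device="cuda")
weight = torch.randn(C, C, K, K, device="cuda")
bias = torch.zeros(C, device="cuda")
grad_out = torch.randn(N, C, H, W, device="cuda")

g1 = grad_once(input, offset, mask, weight, bias, grad_out)
g2 = grad_once(input, offset, mask, weight, bias, grad_out)
# Difference is tiny but nonzero, which is exactly what gradcheck
# rejects when its nondeterminism tolerance is 0.0.
print((g1 - g2).abs().max().item())
```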

See https://github.com/CharlesShang/DCNv2/#known-issues and https://github.com/CharlesShang/DCNv2/issues/8.

Update: all gradient checks pass with double precision. The remaining issue is that gradcheck still raises RuntimeError: Backward is not reentrant. However, the discrepancy is very small (<1e-7 for float, <1e-15 for double), so it may not be a serious problem.
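For reference, a minimal sketch of that double-precision gradcheck (hypothetical tensor sizes; the nondet_tol value is my own choice for tolerating the tiny nondeterminism, since the failure above used a tolerance of 0.0):

```python
# Sketch: gradcheck with double-precision inputs, per the update above.
import torch
from torch.autograd import gradcheck
from dcn_v2 import dcn_v2_conv  # assumed import from this repo

N, C_in, C_out, H, W, K = 1, 2, 2, 4, 4, 3
kw = dict(dtype=torch.double, device="cuda")
input = torch.randn(N, C_in, H, W, **kw).requires_grad_(True)
offset = (0.1 * torch.randn(N, 2 * K * K, H, W, **kw)).requires_grad_(True)
mask = torch.rand(N, K * K, H, W, **kw).requires_grad_(True)
weight = torch.randn(C_out, C_in, K, K, **kw).requires_grad_(True)
bias = torch.randn(C_out, **kw).requires_grad_(True)

# trailing positional args assumed: stride, padding, dilation, deformable_groups;
# nondet_tol relaxes the reentrancy check that failed with tolerance 0.0 above
print(gradcheck(dcn_v2_conv, (input, offset, mask, weight, bias, 1, 1, 1, 1),
                nondet_tol=1e-12))
```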

Zhang1n commented 6 months ago

Has this problem been solved?