openmm / NNPOps

High-performance operations for neural network potentials

TestNeighbors.py failures with pytorch 1.13 #84

Closed sef43 closed 1 year ago

sef43 commented 1 year ago

If I install NNPOps with pytorch 1.13

conda install -c conda-forge nnpops pytorch=1.13

Or build from source with pytorch 1.13, the tests in TestNeighbors.py fail with a PyTorch runtime error, e.g.:

FAILED TestNeighbors.py::test_neighbor_grads[distances-1-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [0]], which is output 0 of NormBackward1, is at version 1;...

Output of

pytest TestNeighbors.py
```
=========================== short test summary info ============================
FAILED TestNeighbors.py::test_neighbor_grads[distances-1-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [0]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[distances-1-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [0]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[distances-2-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[distances-2-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [1]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[distances-3-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[distances-3-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [3]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[distances-4-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [6]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[distances-4-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [6]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[distances-5-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[distances-5-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [10]], which is output 0 of NormBackward1, is at version ...
FAILED TestNeighbors.py::test_neighbor_grads[distances-10-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [45]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[distances-10-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [45]], which is output 0 of NormBackward1, is at version ...
FAILED TestNeighbors.py::test_neighbor_grads[distances-100-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [4950]], which is output 0 of NormBackward1, is at version...
FAILED TestNeighbors.py::test_neighbor_grads[distances-100-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [4950]], which is output 0 of NormBackward1, is at versio...
FAILED TestNeighbors.py::test_neighbor_grads[distances-1000-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [499500]], which is output 0 of NormBackward1, is at versi...
FAILED TestNeighbors.py::test_neighbor_grads[distances-1000-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [499500]], which is output 0 of NormBackward1, is at vers...
FAILED TestNeighbors.py::test_neighbor_grads[combined-1-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [0]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[combined-1-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [0]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[combined-2-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[combined-2-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [1]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[combined-3-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[combined-3-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [3]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[combined-4-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [6]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[combined-4-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [6]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[combined-5-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[combined-5-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [10]], which is output 0 of NormBackward1, is at version ...
FAILED TestNeighbors.py::test_neighbor_grads[combined-10-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [45]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[combined-10-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [45]], which is output 0 of NormBackward1, is at version ...
FAILED TestNeighbors.py::test_neighbor_grads[combined-100-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [4950]], which is output 0 of NormBackward1, is at version...
FAILED TestNeighbors.py::test_neighbor_grads[combined-100-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [4950]], which is output 0 of NormBackward1, is at versio...
FAILED TestNeighbors.py::test_neighbor_grads[combined-1000-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [499500]], which is output 0 of NormBackward1, is at versi...
FAILED TestNeighbors.py::test_neighbor_grads[combined-1000-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [499500]], which is output 0 of NormBackward1, is at vers...
================ 32 failed, 214 passed, 2 warnings in 8.07s =================
```

The tests pass with pytorch=1.12.

Is anyone else able to reproduce this? I am running on Linux with CUDA 11.7.

RaulPPelaez commented 1 year ago

I am compiling from source and I can reproduce this, also using Linux with CUDA 11.7.

RaulPPelaez commented 1 year ago

Torch suggests running the offending code with the following defined:

    pt.autograd.set_detect_anomaly(True)

Running the test with that setting flags the following lines as offending:

https://github.com/openmm/NNPOps/blob/16543f913b230363409986875a5f479708bf24d0/src/pytorch/neighbors/TestNeighbors.py#L122
https://github.com/openmm/NNPOps/blob/16543f913b230363409986875a5f479708bf24d0/src/pytorch/neighbors/TestNeighbors.py#L125

Is this meaningful to you? @sef43 @raimis
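For readers unfamiliar with this class of error, here is a minimal sketch that triggers the same kind of RuntimeError outside NNPOps. It is not the NNPOps code path: `exp()` stands in for the norm seen in the traceback, because it also saves its output for the backward pass, and the helper name `backward_after_inplace_edit` is made up for illustration.

```python
import torch as pt


def backward_after_inplace_edit(detect_anomaly=False):
    """Return the RuntimeError message raised by backward(), or None.

    Hypothetical helper for illustration; not part of NNPOps.
    """
    x = pt.ones(3, requires_grad=True)
    # set_detect_anomaly also works as a context manager and restores
    # the previous mode on exit.
    with pt.autograd.set_detect_anomaly(detect_anomaly):
        y = x.exp()   # exp() saves its output for the backward pass
        y.add_(1.0)   # in-place edit bumps the saved tensor's version
        try:
            y.sum().backward()
        except RuntimeError as err:
            return str(err)
    return None


if __name__ == "__main__":
    print(backward_after_inplace_edit())
    # With anomaly detection on, PyTorch additionally emits a warning
    # containing the traceback of the forward call that later failed.
    print(backward_after_inplace_edit(detect_anomaly=True))
```

With anomaly detection enabled, the extra warning points at the forward operation whose gradient could not be computed, which is presumably how the two test lines above were identified.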

sef43 commented 1 year ago

This only happens with the NNPOps CPU implementation. The changes in #91 seem to fix it.