Closed raimis closed 1 year ago
@RaulPPelaez any change to finish #105?
Does anybody want to implement/finish something more?
Ping: @peastman
@RaulPPelaez any change to finish #105?
It defeated me tbh. I do not think it is worth stopping the release for it. I will give it another spin.
The test on the GPU don't pass: https://github.com/openmm/NNPOps/actions/runs/5645655178
I think both failures could be attributed to the inaccuracy introduced by the non-deterministic order of operations in the CUDA versions. The first test yields an error of 0.0343 (tolerance is 0.025) after doing a sum operation. The second has a relative error tolerance of 1e-7, which is already a bit too much for single precision
Sounds good to me.
Do we want to update the tests to make them less strict?
Maybe this is enough? https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html
Both tests pass locally in a 4090 at the current master 5e2438d , perhaps the CI was just a fluke? EDIT: Also tried on a GPU with arch 75 (as the one in the CI) just in case and one of the BatchedNN tests failed also because some tolerance.
assert energy_error < 5e-7
if molFile == '3o99':
assert grad_error < 0.05 # Some numerical instability
else:
> assert grad_error < 5e-3
E AssertionError: assert tensor(0.0071, device='cuda:0') < 0.005
I do not think this is due to a bug, I would blame whatever kernel torch uses for sm_75 being a tad more inaccurate. Using deterministic algorithms does indeed make the error go away. I opened #108 to solve try in the CI.
Okay so now that #108 is merged are we ready for a new release? There have been some breaking but very good changes in conda-forge packaging for CUDA https://github.com/conda-forge/conda-forge.github.io/issues/1963 so it might take a bit to get that sorted. I will get the release on conda-forge started, but will wait to actually merge it in until we decide if the release is ready.
The changes should not affect CUDA<12 builds, right? AFAIK the new packages are only for CUDA >=12. Anyhow I say we are ready to release v0.6. cc @raimis, could you please?.
OK! I'm making the release
We have a release: https://github.com/openmm/NNPOps/releases/tag/v0.6
@mikemhenry you can proceed with https://github.com/conda-forge/nnpops-feedstock/pull/24. If CUDA 12 is an issue, we can build just for CUDA 11. And then sort out CUDA 12.
NNPOps 0.6 is out!
@jchodera could you make a tweet (or whatever is it called now)?
Let's start the new release.
New features:
103
Testing improvements:
100
104