mir-group / nequip

NequIP is a code for building E(3)-equivariant interatomic potentials
https://www.nature.com/articles/s41467-022-29939-5
MIT License
565 stars 124 forks source link

Minor Updates (`n_train/val` as percent, pre-commit, restart handling, warnings) #438

Closed kavanase closed 1 week ago

kavanase commented 2 weeks ago

Description & Motivation and Context

This PR implements some minor updates to the code:

Types of changes

Checklist:

kavanase commented 1 week ago

@Linux-cpp-lisp one minor thing to flag (doesn't really matter now, just for whenever the next release is I guess). For the changelog, it says it adheres to semantic versioning, which I think would make the next version 0.7.0 as functionality has been / is being added from this and other PRs. Just didn't want to change myself without checking as you may have other reasons for this!

kavanase commented 1 week ago

@Linux-cpp-lisp I'm not totally sure what's causing the 2 test failures here. They all pass locally. This seems to be the main issue for at least one of them:

self = CompletedProcess(args=['nequip-train', 'conf.yaml', '--warn-unused'], returncode=1, stdout=b'', stderr=b'/opt/hostedto...ate\n    raise RuntimeError(\nRuntimeError: Failed to build object with prefix `dataset` using builder `NpzDataset`\n')

    def check_returncode(self):
        """Raise CalledProcessError if the exit code is non-zero."""
        if self.returncode:
>           raise CalledProcessError(self.returncode, self.args, self.stdout,
                                     self.stderr)
E           subprocess.CalledProcessError: Command '['nequip-train', 'conf.yaml', '--warn-unused']' returned non-zero exit status 1.

/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/subprocess.py:4[60](https://github.com/mir-group/nequip/actions/runs/9699603663/job/26768963349?pr=438#step:7:61): CalledProcessError
---------------------------- Captured stderr setup -----------------------------
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nequip/utils/_global_options.py:95: UserWarning: Do NOT manually set PYTORCH_JIT_USE_NNC_NOT_NVFUSER=0 unless you know exactly what you're doing!
  warnings.warn(
Torch device: cpu
Using existing file aspirin_ccsd.zip
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.9.19/x[64](https://github.com/mir-group/nequip/actions/runs/9699603663/job/26768963349?pr=438#step:7:65)/lib/python3.9/site-packages/nequip/utils/auto_init.py", line 243, in instantiate
    instance = builder(**positional_args, **final_optional_args)
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nequip/data/_dataset/_npz_dataset.py", line 81, in __init__
    super().__init__(
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nequip/data/_dataset/_base_datasets.py", line 152, in __init__
    super().__init__(root=root, type_mapper=type_mapper)
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nequip/data/_dataset/_base_datasets.py", line 43, in __init__
    super().__init__(root=root, transform=type_mapper)
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nequip/utils/torch_geometric/dataset.py", line 88, in __init__
    self._download()
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nequip/utils/torch_geometric/dataset.py", line 149, in _download
    self.download()
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nequip/data/_dataset/_base_datasets.py", line 197, in download
    extract_zip(download_path, self.raw_dir)
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nequip/utils/torch_geometric/utils.py", line 55, in extract_zip
    f.extractall(folder)
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/zipfile.py", line 1[65](https://github.com/mir-group/nequip/actions/runs/9699603663/job/26768963349?pr=438#step:7:66)4, in extractall
    self._extract_member(zipinfo, path, pwd)
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/zipfile.py", line 1704, in _extract_member
    os.mkdir(targetpath)
FileExistsError: [Errno 17] File exists: '/home/runner/work/nequip/nequip/benchmark_data/__MACOSX'

But not sure where the __MACOSX folder is coming from as the tests run on ubuntu and I don't see this in the test data zip file online

Linux-cpp-lisp commented 1 week ago

@kavanase re semantic versioning, good catch on the CHANGELOG. I've been sticking to a sort of "almost" semantic versioning where the middle digit gets incremented for backwards compatibility-breaking releases, and the final for other releases. We should discuss whether this should be changed.

Linux-cpp-lisp commented 1 week ago

@kavanase re the failed test, I believe this occurs due to a race condition: the tests on GitHub actions are run in parallel on two (if I remember right) workers, and if they both try to unzip the downloaded data at the same time (instead of one first and the other using the cache) this can occur. It should self resolve, I think the first time it came up it just wasn't worth dealing with since re-running the tests usually resolves it by chance.

kavanase commented 1 week ago

@Linux-cpp-lisp ok cool! For the failed test, was thinking it was something like that.

For the semantic versioning, that makes sense, though I think it differs a bit from the standard semver format (where compatiblity-breaking changes are meant for MAJOR versions).

Given a version number MAJOR.MINOR.PATCH, increment the: MAJOR version when you make incompatible API changes MINOR version when you add functionality in a backward compatible manner PATCH version when you make backward compatible bug fixes

Tbh it's probably a fairly minor point and I don't think one has to stick to a given format, but I guess best to update the "this project adheres to Semantic Versioning" (2.0.0) statement in the CHANGELOG if using a different format, or swap to using its format?

kavanase commented 1 week ago

@Linux-cpp-lisp I think all the remaining issues with this have now been addressed