mir-group / nequip

NequIP is a code for building E(3)-equivariant interatomic potentials
https://www.nature.com/articles/s41467-022-29939-5
MIT License

🐛 [BUG] NotADirectoryError when attempting to run git (#264 again) #466

Status: Open (opened by fxcoudert 2 weeks ago)

fxcoudert commented 2 weeks ago

Describe the bug

The get_commit function raises a NotADirectoryError because it launches a git subprocess with a cwd that is not a valid directory.

Previously reported in https://github.com/mir-group/nequip/issues/264. Apparently the fix did not work (or perhaps it was reverted in later code?).
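The failure mode can be reproduced in isolation. The sketch below is an illustration, not nequip code: it assumes the .egg in the traceback is a zipped egg (a regular file rather than a directory), uses hypothetical helpers run_in_cwd and reproduce, and runs the Python interpreter in place of git so it works without git installed. On Linux, starting a child process whose cwd lies beneath a regular file fails with NotADirectoryError (errno 20), as in the traceback.

```python
import errno
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_cwd(cwd):
    """Start a trivial child process in `cwd`; return the OSError raised, or None."""
    try:
        # The Python interpreter stands in for git so this runs without git installed.
        subprocess.run([sys.executable, "-c", "pass"], cwd=cwd, check=True)
    except OSError as exc:
        # NotADirectoryError is a subclass of OSError.
        return exc
    return None


def reproduce(tmpdir):
    """Mimic a zipped egg: the '.egg' path component is a regular file."""
    fake_egg = Path(tmpdir) / "nequip-0.6.1-py3.10.egg"
    fake_egg.write_bytes(b"")  # a file, not a directory
    # Same shape as the cwd in the traceback: <egg>/nequip/..
    return run_in_cwd(fake_egg / "nequip" / "..")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        exc = reproduce(tmp)
        print(type(exc).__name__, exc.errno == errno.ENOTDIR)
```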

To Reproduce

After building and installing nequip-0.6.1 from source, running nequip-deploy fails as follows:

/ccc/work/cont003/gen7069/andredun/applications/nequip-0.6.1/lib/python3.10/site-packages/nequip-0.6.1-py3.10.egg/nequip/__init__.py:20: UserWarning: !! PyTorch version 2.0.1 found. Upstream issues in PyTorch versions 1.13.* and 2.* have been seen to cause unusual performance degredations on some CUDA systems that become worse over time; see https://github.com/mir-group/nequip/discussions/311. The best tested PyTorch version to use with CUDA devices is 1.11; while using other versions if you observe this problem, an unexpected lack of this problem, or other strange behavior, please post in the linked GitHub issue.
  warnings.warn(
INFO:root:Loading best_model.pth from training session...
Traceback (most recent call last):
  File "/ccc/work/cont003/gen7069/andredun/applications/nequip-0.6.1/bin/nequip-deploy", line 33, in <module>
    sys.exit(load_entry_point('nequip==0.6.1', 'console_scripts', 'nequip-deploy')())
  File "/ccc/work/cont003/gen7069/andredun/applications/nequip-0.6.1/lib/python3.10/site-packages/nequip-0.6.1-py3.10.egg/nequip/scripts/deploy.py", line 307, in main
  File "/ccc/work/cont003/gen7069/andredun/applications/nequip-0.6.1/lib/python3.10/site-packages/nequip-0.6.1-py3.10.egg/nequip/utils/versions.py", line 53, in check_code_version
  File "/ccc/work/cont003/gen7069/andredun/applications/nequip-0.6.1/lib/python3.10/site-packages/nequip-0.6.1-py3.10.egg/nequip/utils/versions.py", line 47, in get_current_code_versions
  File "/ccc/work/cont003/gen7069/andredun/applications/nequip-0.6.1/lib/python3.10/site-packages/nequip-0.6.1-py3.10.egg/nequip/utils/versions.py", line 47, in <dictcomp>
  File "/ccc/work/cont003/gen7069/andredun/applications/nequip-0.6.1/lib/python3.10/site-packages/nequip-0.6.1-py3.10.egg/nequip/utils/git.py", line 20, in get_commit
  File "/ccc/products/python3-3.10.6/gcc--8.3.0__openmpi--4.0.1/cuda-11.7/lib/python3.10/subprocess.py", line 501, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/ccc/products/python3-3.10.6/gcc--8.3.0__openmpi--4.0.1/cuda-11.7/lib/python3.10/subprocess.py", line 969, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/ccc/products/python3-3.10.6/gcc--8.3.0__openmpi--4.0.1/cuda-11.7/lib/python3.10/subprocess.py", line 1845, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
NotADirectoryError: [Errno 20] Not a directory: '/ccc/work/cont003/gen7069/andredun/applications/nequip-0.6.1/lib/python3.10/site-packages/nequip-0.6.1-py3.10.egg/nequip/..'
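The cwd in the error message ends in nequip-0.6.1-py3.10.egg/nequip/.., which suggests (though the report does not confirm it) that the egg was installed as a zip archive, so the .egg component is a regular file and the operating system cannot change into a path beneath it. A small diagnostic like the hypothetical first_non_directory below, run against the path from the message, can confirm which component is the culprit; zipfile.is_zipfile then tells whether that component is a zip archive.

```python
import sys
import zipfile
from pathlib import Path


def first_non_directory(path):
    """Return the first leading component of `path` that exists but is not a
    directory (what makes chdir() fail with errno 20, ENOTDIR), else None."""
    current = Path()
    for part in Path(path).parts:
        # Joining an absolute part such as "/" replaces the starting ".".
        current = current / part
        if current.exists() and not current.is_dir():
            return current
    return None


if __name__ == "__main__":
    # Usage: python this_script.py '<path from the NotADirectoryError message>'
    culprit = first_non_directory(sys.argv[1])
    if culprit is None:
        print("no non-directory component found")
    else:
        print(culprit, "(zip archive)" if zipfile.is_zipfile(culprit) else "")
```

Walking the components in order stops at the first offender, which is the one the kernel trips over; later components such as nequip and .. never resolve once a file is in the way.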


Environment:

- OS: CentOS Linux release 8.4.2105
- Python: 3.10.6
- nequip version: 0.6.1
- e3nn version: 0.5.1
- PyTorch version: 2.0.1

cw-tan commented 2 weeks ago

Hi @fxcoudert,

Thanks for the issue report. Unless you need the recorded commit information, feel free to comment out the offending lines in nequip/utils/versions.py and adapt the code to your needs. The get_commit function has already been removed in the develop branch (commit d479aadd99ba0b8e2c1d0693c71d23c9bcfd4953), so this bug should not reappear in future releases.
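For anyone who would rather keep the commit lookup than comment it out, one option is to make it fail soft. The sketch below is hypothetical: get_commit_or_none is not part of nequip, does not reproduce nequip's get_commit, and assumes that git rev-parse HEAD is an acceptable way to read the commit. It returns None whenever the package's parent directory is not a real directory (as with a zipped egg) or git cannot run.

```python
import importlib
import subprocess
from pathlib import Path
from typing import Optional


def get_commit_or_none(module_name: str) -> Optional[str]:
    """Hypothetical defensive helper: return the git HEAD hash of the checkout
    containing `module_name`, or None when it cannot be determined."""
    module = importlib.import_module(module_name)
    module_file = getattr(module, "__file__", None)
    if module_file is None:
        return None  # e.g. built-in modules have no __file__
    # Parent of the package directory, like the '<pkg>/..' cwd in the traceback.
    repo_dir = Path(module_file).parent.parent
    # For a zipped egg this is the .egg file itself, so check before spawning git.
    if not repo_dir.is_dir():
        return None
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        # OSError covers a missing git binary (FileNotFoundError) and a bad cwd
        # (NotADirectoryError); SubprocessError covers the timeout.
        return None
    if result.returncode != 0:
        return None  # not inside a git checkout
    return result.stdout.strip() or None
```

The is_dir() check avoids the error in the common case, and the except clause still covers a missing git binary or a directory that disappears in the meantime. One caveat of this design: if the installed package happens to sit inside some unrelated git checkout, the hash returned belongs to that checkout.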