Closed jungsdao closed 4 months ago
They keep on changing the way DFT calculators get initialized. I was pretty sure it was working with all the different ways Espresso was initialized. How exactly did you install ASE when it wasn't working? The version number isn't sufficient, because they keep making changes without changing the version number, at least in the gitlab version.
The way I installed ASE when it didn't work was :
pip install --upgrade git+https://gitlab.com/ase/ase.git@master
Thanks. Let me see if I can reproduce the problem. I assume you're also using the latest version of wfl?
Yes, I'm also using the latest version of wfl (v 0.2.0).
I just tried with the latest ASE master branch (and the latest wfl main branch), and the Espresso-related tests passed. If you clone the wfl repo, you should be able to do (from the cloned directory)
pytest --basetemp ${HOME}/pytest_wfl -rxXs tests/calculators/test_qe.py
after setting the environment variable PYTEST_WFL_ASE_ESPRESSO_COMMAND to the command that runs a serial pw.x (I use mpirun -np 1 pw.x, for example). If that fails, we need to figure out why, since it's passing for me. If it passes, but your real script fails, we should be able to figure out why.
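For anyone scripting this check, the same invocation can be assembled from Python. This is only a minimal sketch: `build_qe_test_invocation` is a hypothetical helper name, not part of wfl, and it reuses nothing beyond the pytest arguments and the PYTEST_WFL_ASE_ESPRESSO_COMMAND variable quoted above.

```python
import os
import sys
from pathlib import Path


def build_qe_test_invocation(pw_command, basetemp=None):
    """Return (argv, env) for running the wfl Espresso tests.

    Hypothetical helper, not part of wfl; it mirrors the pytest command above.
    """
    if basetemp is None:
        basetemp = Path.home() / "pytest_wfl"  # same location as ${HOME}/pytest_wfl
    env = dict(os.environ)  # copy, so the caller's environment is left untouched
    env["PYTEST_WFL_ASE_ESPRESSO_COMMAND"] = pw_command
    argv = [
        sys.executable, "-m", "pytest",  # run pytest with the current interpreter
        "--basetemp", str(basetemp),
        "-rxXs",
        "tests/calculators/test_qe.py",
    ]
    return argv, env


argv, env = build_qe_test_invocation("mpirun -np 1 pw.x")
# To actually run the tests, from the cloned wfl directory:
# import subprocess; subprocess.run(argv, env=env, check=False)
```

Building the environment as a copy keeps the variable scoped to the test run instead of changing it for the whole shell session.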
The way I installed ASE when it didn't work was :
pip install --upgrade git+https://gitlab.com/ase/ase.git@master
It might be best to put this in the docs, or to get the ASE devs to finally make a release (3.22 was released in 2021), because if you install according to the wfl docs you see the same error even when just running from wfl.generate.optimize import optimize.
I'm confused - that git command above did work, or didn't? It looks like the command that should give the latest, which should work.
@stenczelt As a person who ran into this, where do you think it should be documented so it's most likely to be noticed? Top level README.md? Anyplace else? I guess the install command in the docs could, in principle, drag in the older and incompatible ASE (although I was sort of assuming people had their own ASE already installed). I think there's a beta release number - we could require that as the minimum version, which will always fail until they actually have another release, but at least you'll know you have to do it manually.
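Since the version string alone may not distinguish development snapshots of ASE, one alternative to a minimum-version pin is to probe for the features that are actually needed. The sketch below is an illustration under assumptions: `has_feature` and `ase_feature_report` are hypothetical helper names, and the two probed names (`FrechetCellFilter` in `ase.filters`, `EspressoProfile` in `ase.calculators.espresso`) are the ones mentioned in this thread as belonging to the newer ASE.

```python
import importlib


def has_feature(module_name, attribute):
    """Return True if module_name can be imported and exposes attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module missing entirely (old ASE, or ASE not installed at all).
        return False
    return hasattr(module, attribute)


def ase_feature_report():
    """Probe the ASE features discussed in this thread (hypothetical helper)."""
    return {
        "FrechetCellFilter": has_feature("ase.filters", "FrechetCellFilter"),
        "EspressoProfile": has_feature("ase.calculators.espresso", "EspressoProfile"),
    }


if __name__ == "__main__":
    for name, available in ase_feature_report().items():
        print(f"{name}: {'available' if available else 'missing'}")
```

Running this on both the local machine and the cluster shows whether the two environments agree, regardless of what version number each one reports.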
Sorry for the belated reply.
I have checked again and found the point where it can be reproduced.
This happens when the Quantum ESPRESSO job is submitted to a remote cluster and the ASE version installed on the cluster is 3.23.0b1. I think pytest in the current wfl passed without error probably because it does not submit a remote job and only tests locally. When I downgrade espresso.py on the cluster to an older version (like 3.22.1), I don't get this error.
If you have the latest wfl and ASE (github master HEAD) on both local and remote machines, then it should definitely work.
I also thought it should work with the older version, actually, so I'll also check why it's not.
@jungsdao I just ran the wfl (the latest github version of wfl) pytests with the pip version of ASE (3.22.1), and it passed, and also with the latest gitlab master HEAD (3.23.0b1), and it also passed. I'm not sure why it's not working for you. Is it possible that the wfl version on the remote machine isn't the latest?
I have checked again after updating both ASE and wfl to the latest version, but I'm having the same error. When I change espresso.py on the remote cluster to the ASE 3.22.1 version it works, but with ASE 3.23.0b1 it fails.
I'm not sure what's going on, but I don't see any way for the remote behavior to be different from the local behavior if they're running the same versions of wfl and ase. I guess I'll test it explicitly here.
Can you find the directory where the submitted job ran and grab all the output and error files and upload them here? I'm hoping there's more info on where exactly it's having a problem.
I wonder if something is messed up with the PYTHONPATH for the remote job, and it's not loading the wfl version you intend it to.
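One way to check this is to have the remote job print where its packages are actually loaded from. The following is a minimal sketch using only the standard library; `describe_package` and `environment_report` are hypothetical helper names, and it assumes the distributions are installed under the names `ase` and `wfl`.

```python
import importlib.metadata
import importlib.util
import os
import sys


def describe_package(name):
    """Report where `name` would be imported from and its installed version."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"name": name, "found": False, "origin": None, "version": None}
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        # Importable but without distribution metadata, e.g. picked up via PYTHONPATH.
        version = None
    return {"name": name, "found": True, "origin": spec.origin, "version": version}


def environment_report(packages=("ase", "wfl")):
    """Collect interpreter, PYTHONPATH and package locations for comparison."""
    return {
        "executable": sys.executable,
        "PYTHONPATH": os.environ.get("PYTHONPATH", ""),
        "packages": [describe_package(p) for p in packages],
    }


if __name__ == "__main__":
    report = environment_report()
    print("python:", report["executable"])
    print("PYTHONPATH:", report["PYTHONPATH"])
    for pkg in report["packages"]:
        print(pkg)
```

Running it once on the submitting machine and once inside the queued job makes a mismatch in interpreter, path or version visible side by side.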
These are the related files in the submitted job directory. I'm not quite sure what the source of the error is. It seems to be launching the intended version of wfl correctly.
Thanks. I might need to give you a version that can produce better error information. I'll investigate some things here first.
I just added a test that runs a remote Espresso job, and it runs fine (#294). I'll look a bit more, but I think there has to be some sort of version issue with your remote jobs. It's pretty easy for the remote job to end up with different paths, PYTHONPATH, etc. Can you describe your setup in more detail? Is it really a remote job, or is it just a queued job with the main workflow running on the login node of the HPC?
Can you post the workflow script (or, ideally, a simpler script that shows the same problem) here?
If you can install wfl from the espresso_remote_job_test branch (instead of main) that version should provide us with better error information for the way your code is failing.
@stenczelt As a person who ran into this, where do you think it should be documented so it's most likely to be noticed?
A notice in the top level ReadMe is a good idea, I've actually looked at the documentation this time, so maybe a paragraph or one more code block in the Installation section would be useful: https://libatoms.github.io/workflow/#installation
@stenczelt please take a look at the changes in #294. I'm not sure there's an easy way to see the formatted docs (the README you can see by switching to that branch), but you can look at the .rst source file changes.
@jungsdao Have you had a chance to test the espresso_remote_job_test branch? It should give more error information if you're still having this problem.
I have tried the espresso_remote_job_test branch on the remote cluster and it gives the following error (from _expyre_job_error):
Exception: Failed to construct calculator, original attempt's exception was '(exc)
Traceback (most recent call last):
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/generic.py", line 49, in _run_autopara_wrappable
    calculator_default = construct_calculator_picklesafe(calculator)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/utils/parallel.py", line 51, in construct_calculator_picklesafe
    return calculator[0](*c_args, **c_kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/espresso.py", line 88, in __init__
    super().__init__(keep_files=keep_files, rundir_prefix=rundir_prefix,
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/wfl_fileio_calculator.py", line 48, in __init__
    super().__init__(**kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/ase/calculators/espresso.py", line 216, in __init__
    super().__init__(
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/ase/calculators/genericfileio.py", line 336, in __init__
    raise EnvironmentError(f'No configuration of {template.name}')
ase.calculators.calculator.EnvironmentError: No configuration of espresso
'
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/autoparallelize/pool.py", line 70, in _wrapped_autopara_wrappable
    outputs = op(*u_args, **kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/generic.py", line 86, in _run_autopara_wrappable
    raise ValueError(f"Failed to construct calculator, original attempt's exception was '{calculator_failure_message}'")
ValueError: Failed to construct calculator, original attempt's exception was '(exc)
Traceback (most recent call last):
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/generic.py", line 49, in _run_autopara_wrappable
    calculator_default = construct_calculator_picklesafe(calculator)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/utils/parallel.py", line 51, in construct_calculator_picklesafe
    return calculator[0](*c_args, **c_kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/espresso.py", line 88, in __init__
    super().__init__(keep_files=keep_files, rundir_prefix=rundir_prefix,
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/wfl_fileio_calculator.py", line 48, in __init__
    super().__init__(**kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/ase/calculators/espresso.py", line 216, in __init__
    super().__init__(
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/ase/calculators/genericfileio.py", line 336, in __init__
    raise EnvironmentError(f'No configuration of {template.name}')
ase.calculators.calculator.EnvironmentError: No configuration of espresso
'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/raven/ptmp/hjung/GAP/scratch/unkownhost-_home_hjung/run_eval_dft_chunk_0_dfzhb4Sm89qkJVMNcICoHzKH9gGfe3KPYGE3Vecnk_8=_c5vhqbuu/_expyre_script_core.py", line 9, in <module>
    results = function(*args, **kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/autoparallelize/pool.py", line 157, in do_in_pool
    for result_group in results:
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/multiprocessing/pool.py", line 870, in next
    raise value
ValueError: Failed to construct calculator, original attempt's exception was '(exc)
Traceback (most recent call last):
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/generic.py", line 49, in _run_autopara_wrappable
    calculator_default = construct_calculator_picklesafe(calculator)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/utils/parallel.py", line 51, in construct_calculator_picklesafe
    return calculator[0](*c_args, **c_kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/espresso.py", line 88, in __init__
    super().__init__(keep_files=keep_files, rundir_prefix=rundir_prefix,
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/wfl/calculators/wfl_fileio_calculator.py", line 48, in __init__
    super().__init__(**kwargs)
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/ase/calculators/espresso.py", line 216, in __init__
    super().__init__(
  File "/u/hjung/conda-envs/wfl_test/lib/python3.9/site-packages/ase/calculators/genericfileio.py", line 336, in __init__
    raise EnvironmentError(f'No configuration of {template.name}')
ase.calculators.calculator.EnvironmentError: No configuration of espresso
'
How are you passing the pw.x command to the calculator constructor?
And can you confirm that you can manually create an Espresso calculator (outside of wfl) using the arguments (positional or kwargs) you're passing to the calculator constructor you're trying to use in wfl?
[edited: the ASE Espresso calculator switched from a command keyword arg to an EspressoProfile, which the wrapper reconstructs from the calc_exec argument. It's possible that if you're passing a command but the wrapper is detecting that you have a version that supports the profile, it's not handling that combination well]
@jungsdao If you can answer the questions in my previous post, we can hopefully fix this. I suspect a conflict between the different ways of passing the executable to Espresso.
I used to pass the pw.x command via an environment variable in the Slurm submission script:
export ASE_ESPRESSO_COMMAND='srun /u/hjung/Softwares/QE/qe-7.0/bin/pw.x -in PREFIX.pwi > PREFIX.pwo'
When I tried to run the ASE Espresso calculator outside of wfl, I got the following error complaining about the profile:
Traceback (most recent call last):
  File "/raven/u/hjung/test/test.py", line 57, in <module>
    calc = Espresso(command=command, input_data=input_data, kpts=(4, 4, 1), pseudopotentials=psp)
  File "/u/hjung/conda-envs/mace_env/lib/python3.9/site-packages/ase/calculators/espresso.py", line 201, in __init__
    raise RuntimeError(compatibility_msg)
RuntimeError: Espresso calculator is being restructured. Please use e.g. Espresso(profile=EspressoProfile(argv=['mpiexec', 'pw.x'])) to customize command-line arguments.
As you explained, it must have to do with the new profile argument required by the new ASE Espresso calculator.
OK. You should be able to get it to work by passing a new argument to the wfl.calculators.Espresso wrapper, calc_exec = "srun /u/hjung/Softwares/QE/qe-7.0/bin/pw.x" (without the PREFIX stuff).
I'll also think about how to get it to work best with both the old and new syntax, if possible, but I think passing a command via the env var is more or less deprecated.
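Converting an old-style ASE_ESPRESSO_COMMAND value into the bare executable string can be sketched as follows. This is an illustration only: `strip_prefix_io` is a hypothetical helper, not a wfl or ASE function, and it assumes the legacy string marks its input/output handling with `-in`, `<` or `>`, as in the export line posted earlier in this thread.

```python
import shlex

# Tokens assumed to start the input/output part of a legacy command string.
_IO_TOKENS = {"-in", "<", ">"}


def strip_prefix_io(legacy_command):
    """Drop the '-in PREFIX.pwi > PREFIX.pwo' style tail from a legacy command.

    Hypothetical helper: keeps everything before the first I/O token.
    """
    kept = []
    for token in shlex.split(legacy_command):
        if token in _IO_TOKENS:
            break  # everything from here on is file handling, not the executable
        kept.append(token)
    return shlex.join(kept)


legacy = "srun /u/hjung/Softwares/QE/qe-7.0/bin/pw.x -in PREFIX.pwi > PREFIX.pwo"
print(strip_prefix_io(legacy))
# prints: srun /u/hjung/Softwares/QE/qe-7.0/bin/pw.x
```

The result is the kind of string suggested above for the wrapper's executable argument, with the launcher (`srun`, `mpirun -np 1`, and so on) preserved in front of pw.x.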
Just confirmed that adding "calculator_exec" : "srun /u/hjung/Softwares/QE/qe-7.0/bin/pw.x" to the QE kwargs does not cause the previous error.
OK - I'll see what I can do to make things internally consistent, and then merge the PR
I think I have a solution that will at least give clearer error messages. I'll merge as soon as I push and tests pass.
closed by #294
I think the following part of generate/optimize.py requires the latest version of ASE ('3.23.0b1'):
from ase.filters import FrechetCellFilter
But wfl seems to conflict with espresso.py in ASE '3.23.0b1', showing the following error. Because of this, I had to downgrade only espresso.py (copied from ASE 3.22.1) to make it work. I'm not totally sure this is related to the ASE version, but downgrading avoided the error.