Open robinsonmt1 opened 1 week ago
mtd.txt (Py script) py-submission.txt (Slurm script) e58363490.txt (Standard error file) aims_err.txt (Aims error file)
Okay, an update from today's group hack session and my own subsequent tests. We identified several separate issues that together caused the various crashes, so I tested them one by one. Here are my findings:
Therefore, the changes I think must be made are: change the Python version to 3.9.2; change the default compute_forces argument in aims_calculator to the boolean True (or set it explicitly in input scripts); and update genericfileio.py in ASE to pass shell=True and join argv_command into a single string.
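To illustrate why the shell=True change matters, here is a minimal sketch using only the standard library, not ASE's actual code. It assumes a POSIX system, and the `echo` command, the `run_argv`/`run_shell` helper names and the value `2` are placeholders I made up for the demo. When a command is passed as an argv list without a shell, a token like `$SLURM_NNODES` is handed over literally; when the list is joined into a string and run with shell=True, the shell expands it.

```python
import os
import subprocess

# Hypothetical stand-in for the launch command: "echo" replaces srun so the
# demo runs anywhere; the variable name is the one from this issue.
argv_command = ["echo", "$SLURM_NNODES"]

# Assumed value for the demo; on a real allocation Slurm sets this itself.
env = dict(os.environ, SLURM_NNODES="2")


def run_argv(argv, env):
    """Run the command as an argv list: no shell, so no variable expansion."""
    result = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_shell(argv, env):
    """Join argv into one string and run it through the shell, which expands $VARS."""
    result = subprocess.run(
        " ".join(argv), shell=True, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


literal = run_argv(argv_command, env)    # the token is passed through unchanged
expanded = run_shell(argv_command, env)  # the shell substitutes the value

# Alternative without a shell: expand variables in Python before launching.
# os.path.expandvars reads os.environ, so this only works if the variable is set there.
pre_expanded = [os.path.expandvars(arg) for arg in argv_command]

print(literal)
print(expanded)
```

One caveat with joining on spaces: any argument containing spaces or shell metacharacters would need quoting, so expanding the variables in Python and keeping the argv list may be the safer option if that is a concern.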
When running ASE 3.23 on CPUs, the $SLURM_NNODES variable isn't resolved properly and the calculation crashes when it tries to run the srun command. The same calculation worked on GPUs. Along with the ASE version, the Python environment also had to be changed to 3.11 to match. Although PLUMED was in use when this error first came up, the problem persists even with a simple atoms.get_total_energy() call.
Some possible suggestions for the source of the problem: