pyiron / pyiron_atomistics

pyiron_atomistics - an integrated development environment (IDE) for atomistic simulation in computational materials science.
https://pyiron-atomistics.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Wrongly stored settings and missing warning when continuing job on remote cluster #962

Open Leimeroth opened 1 year ago

Leimeroth commented 1 year ago

Summary

When continuing a LAMMPS job with the continue_with_restart_files function and manually changing the queue settings, the settings were not correctly overwritten. In my case the queue itself was updated, but the number of cores remained at the maximum value of the old queue. The code looked roughly like this:

    cjob = job.continue_with_restart_files(job_name=name)
    cjob.structure = job.get_structure()
    cjob.write_restart_file()
    cjob.input.control.remove_keys(["reset_timestep"])
    cjob.input.control["run"] = f"{steps} start 0"
    cjob.server.cores = SomeInt
    cjob.server.queue = "other_queue"
    cjob.server.run_time = RunTime

The issue seems to be related to the order in which server.cores and server.queue are set. When calling cjob.to_hdf() manually on the local machine, a warning is printed that cores is set back to 8. However, when submitting the job to a remote cluster, no warning was printed, i.e. the update to the given value just failed silently.
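
If the ordering really is the cause, swapping the assignments might work around it. A minimal sketch of that idea, based on the snippet above (SomeInt and RunTime are placeholders, and this is not a confirmed fix):

    # Possible workaround sketch: assign the queue before the core count, so the
    # cores value is checked against the new queue's cores_max instead of the
    # old queue's limit.
    cjob.server.queue = "other_queue"
    cjob.server.cores = SomeInt     # placeholder core count
    cjob.server.run_time = RunTime  # placeholder run time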

Expected Behavior

Values that are updated manually are actually applied, or the warning is also printed when submitting to a remote cluster.

Actual Behavior

The number of cores in the submit script corresponds neither to the old value nor to the value I set. Instead, the cores_max value of the old queue is used.

pmrv commented 1 year ago

Sorry for dropping this for so long. According to the code, setting queue after cores should issue a warning if it changes the core setting, and that warning should be emitted when you run the example code above.
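
One way to check locally whether that warning fires could be to capture it with Python's warnings module. This is only a sketch and assumes the warning is raised via warnings.warn, which may not match the actual implementation if logging is used instead:

    import warnings

    # Reproduce the ordering from the bug report and record any warnings.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        cjob.server.cores = SomeInt        # placeholder core count
        cjob.server.queue = "other_queue"  # setting queue after cores may reset cores
    for w in caught:
        print(w.category.__name__, str(w.message))
    print("cores after assigning the queue:", cjob.server.cores)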