pyiron / pyiron_base

Core components of the pyiron integrated development environment (IDE) for computational materials science
https://pyiron-base.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Can we get rid of `job_name`? #966

Open samwaseda opened 1 year ago

samwaseda commented 1 year ago

Look at this example:

def get_job(structure, pressure=0, minimize=True, pr=pr):
    lmp = pr.create.job.Lammps(('job_name', minimize))
    lmp.structure = structure
    if minimize:
        lmp.calc_minimize(pressure=pressure)
    lmp.run()
    return lmp

This is a very typical example, in which I keep making the same mistake - I change the pressure value and the job doesn't run, because I forgot to include it in the job name. I'm sure I'm not the only one having this problem.

Based on the development in this PR, I would love to be able to set the job name afterwards, i.e.:

def get_job(structure, pressure=0, minimize=True, pr=pr):
    lmp = pr.create.job.Lammps()
    lmp.structure = structure
    if minimize:
        lmp.calc_minimize(pressure=pressure)
    lmp.job_name = 'lmp_' + lmp.input.get_hash()
    lmp.run()
    return lmp

With this approach, I can be sure that any change to job.input.control is reflected in the job name. job.run should then complain if the job name has not been set by the time it is called.
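The "complain when the job name is not set" behaviour could look roughly like the sketch below. `ToyJob` is a hypothetical stand-in written only for illustration; it is not the pyiron_base job class and makes no claim about how pyiron would implement the check.

```python
class ToyJob:
    """Hypothetical stand-in for a job object; not the pyiron_base class."""

    def __init__(self, job_name=None):
        # The name is optional at creation time and can be assigned later.
        self.job_name = job_name
        self.input = {}

    def run(self):
        # Refuse to run until a name has been assigned.
        if self.job_name is None:
            raise ValueError("job_name must be set before calling run()")
        return f"running {self.job_name}"


job = ToyJob()
job.input["pressure"] = 0

try:
    job.run()
except ValueError as err:
    message = str(err)

# Once the name is set, the job runs.
job.job_name = "lmp_example"
result = job.run()
```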

My final vision would be to get rid of the mandatory setting of job_name altogether, i.e. it should be generated automatically from the input. NB: I'm not saying we should get rid of it - I'm saying I'd like to make it optional. Do you guys see an inherent problem with this?
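To make the idea concrete without relying on any pyiron internals, here is a minimal sketch of deriving a name from the input. `job_name_from_input` is a hypothetical helper, and the input is assumed to be a plain dict of JSON-serializable parameters; the real job input is richer than that.

```python
import hashlib
import json


def job_name_from_input(parameters, prefix="lmp"):
    """Hypothetical helper: derive a job name from a dict of input parameters."""
    # Sort keys so that the same parameters always serialize to the same string.
    serialized = json.dumps(parameters, sort_keys=True, default=str)
    digest = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    # Truncate the digest to keep the name readable; 16 hex characters is an arbitrary choice.
    return f"{prefix}_{digest[:16]}"


name_a = job_name_from_input({"pressure": 0, "minimize": True})
name_b = job_name_from_input({"pressure": 10, "minimize": True})
name_c = job_name_from_input({"minimize": True, "pressure": 0})
```

Here `name_a` and `name_c` are identical because only the parameter order differs, while `name_b` differs because the pressure changed, which is exactly the case that is easy to forget when the name is written by hand.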

P.S.

With the current implementation of get_hash, what would be possible is:

from pyiron_base.storage.hash import PseudoHDF, get_hash

hdf = PseudoHDF()
lmp.input.to_hdf(hdf)
lmp.job_name = 'lmp_' + get_hash(hdf)

appassionate commented 1 year ago

Hi, I think this is a very exciting idea! Like custodian in Fireworks or CalcJob in aiida, all these calculation descriptions can be queried by some hash ID (maybe I'm wrong). I think pyiron stands out for its interactive job definition, which serves researchers rather than manufacturing, so every job is distinguished by a path such as /some/path/calc_hdf5. Because the path is hashed, it helps users locate the calculation, but I guess it is not good for large-scale workflows?

I'm just imagining that pyiron calculations could use the hash ID to locate the job as some_project_path/(hash_id_hdf5)/, backed by a robust database that can also quickly search for a specific job by job_name. In other words, job_name might be just a tag or description.
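A rough sketch of that layout, with everything in it assumed for illustration: `ToyJobIndex` is a hypothetical in-memory class standing in for the database backend, and the `<hash_id>_hdf5` directory name only mirrors the path pattern written above.

```python
from pathlib import PurePosixPath


class ToyJobIndex:
    """Hypothetical in-memory index; a real setup would use a database backend."""

    def __init__(self, project_path):
        self.project_path = PurePosixPath(project_path)
        self._tags = {}  # maps hash_id -> optional job_name tag

    def path_for(self, hash_id):
        # The storage location depends only on the hash, never on the name.
        return self.project_path / f"{hash_id}_hdf5"

    def register(self, hash_id, job_name=None):
        # job_name is stored as a searchable tag and may be left out.
        self._tags[hash_id] = job_name
        return self.path_for(hash_id)

    def find_by_name(self, job_name):
        # Return every hash ID carrying this tag; names need not be unique.
        return [h for h, name in self._tags.items() if name == job_name]


index = ToyJobIndex("some_project_path")
index.register("abc123", job_name="relax_Al")
index.register("def456")  # no name given, still locatable by hash
hits = index.find_by_name("relax_Al")
location = str(index.path_for("abc123"))
```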

Best wishes.