pyiron / pyiron_workflow

Graph-and-node based workflows
BSD 3-Clause "New" or "Revised" License

Include node containers where the actual node is chosen in the input #9

Open liamhuber opened 9 months ago

liamhuber commented 9 months ago

From @JNmpi in pyiron/pyiron_contrib#855

Include node containers where the actual node is chosen in the input

wf = Workflow('simple_lammps_calculation')

structure = wf.create.atomistics.Bulk(cubic=True, name="Cu", label='structure')

lammps_job = wf.create.lammps.LammpsStaticNode(label='lammps_job')
lammps_job.inputs.structure = structure
lammps_job.inputs.potential = '1995--Angelo-J-E--Ni-Al-H--LAMMPS--ipr1'  # potential does not contain Cu

lammps_job.inputs.calc.md(temperature=1000)
# or
lammps_job.inputs.calc.static()

Issue: In the present implementation I did not find a way to implement a set of calculator nodes such as calc_md, calc_static etc. Having selectable node containers would be very helpful also for other applications.

@JNmpi, I'm not 100% clear on what your definition of "container" is, but at least from the listed example I am uncomfortable with moving in this direction. What it looks like here is that lammps_job is a macro with some sort of branched internal structure where, depending on the input, the internal execution path is updated to use (or leave unused) different forks of the workflow. I don't like that this paradigm lugs around all that extra graph information -- it could slow down topology analysis (and definitely the visual representation), and would complicate things if we try to control the graph from a GUI.

I'm also not a fan of the particular implementation of this idea that relies on extending channels to....

I would suggest an alternative paradigm that looks like this:

wf = Workflow('simple_lammps_calculation')

structure = wf.create.atomistics.Bulk(cubic=True, name="Cu", label='structure')

lammps_job = wf.create.lammps.LammpsStaticNode(label='lammps_job')
lammps_job.inputs.structure = structure
lammps_job.inputs.potential = '1995--Angelo-J-E--Ni-Al-H--LAMMPS--ipr1'  # potential does not contain Cu

lammps_md = wf.create.atomistics.md(temperature=1000, engine=lammps_job)
lammps_static = wf.create.atomistics.static(engine=lammps_job)

My rough idea is that the static and md nodes might be aware of particular codes and have special instructions to do them in a computationally efficient way, or otherwise fall back on some manual python execution -- all provided that the engine node passed conforms to some specific interface (e.g. taking a structure: Atom and returning energy: float and forces: np.ndarray[float]). Something I like about this is then you can imagine engine-agnostic routines like atomistics.neb or atomistics.thermodynamic_integration.
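A rough sketch of what such an engine interface could look like in plain Python (names like `Engine`, `evaluate`, and the toy engine class are illustrative assumptions, not the pyiron_workflow API):

```python
from typing import Protocol


class Engine(Protocol):
    """Hypothetical engine interface: a structure in, energy and forces out."""

    def evaluate(self, structure) -> tuple[float, list]: ...


class ToyLammpsEngine:
    """Stand-in for a LAMMPS engine node; a real one would invoke LAMMPS."""

    def evaluate(self, structure):
        # Fake energy/forces so the example runs without LAMMPS installed.
        return -1.23 * len(structure), [[0.0, 0.0, 0.0] for _ in structure]


def static(engine: Engine, structure):
    """Engine-agnostic static calculation: any conforming engine works here."""
    energy, forces = engine.evaluate(structure)
    return {"energy": energy, "forces": forces}


result = static(ToyLammpsEngine(), structure=["Cu", "Cu"])
```

The point is that `static` never mentions LAMMPS: any engine satisfying the protocol could be dropped in, which is exactly what would enable engine-agnostic routines like neb.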

If this winds up being unfeasible for technical reasons, and we really need to keep the calculations living in the same space as the engine definitions, I would still prefer something like this:

wf = Workflow('simple_lammps_calculation')

structure = wf.create.atomistics.Bulk(cubic=True, name="Cu", label='structure')

###
lammps_job = wf.create.lammps.LammpsStaticNode(label='lammps_job')
# Or
lammps_job = wf.create.lammps.LammpsMdNode(label='lammps_job', temperature=1000)
###

lammps_job.inputs.structure = structure
lammps_job.inputs.potential = '1995--Angelo-J-E--Ni-Al-H--LAMMPS--ipr1'  # potential does not contain Cu

Where LammpsStaticNode and LammpsMdNode can both inherit from the same abstract macro/metanode parent and just swap out one of their internal nodes to get the right calc. This is basically what you've already done over in pyiron/pyiron_contrib#856 except with code duplication instead of inheritance, so it's a very minimal change.
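As a toy illustration of the inheritance idea (plain Python, not the real macro machinery; all class and attribute names here are hypothetical):

```python
class AbstractLammpsNode:
    """Shared Lammps setup; subclasses only swap out the internal calc step."""

    def __init__(self, label):
        self.label = label
        self.calc = self._make_calc()  # the one piece subclasses replace

    def _make_calc(self):
        raise NotImplementedError


class LammpsStaticNode(AbstractLammpsNode):
    def _make_calc(self):
        return {"mode": "static"}


class LammpsMdNode(AbstractLammpsNode):
    def __init__(self, label, temperature):
        self.temperature = temperature  # must exist before _make_calc runs
        super().__init__(label)

    def _make_calc(self):
        return {"mode": "md", "temperature": self.temperature}
```

Everything shared (structure, potential, output wiring) lives once in the parent; each child differs only in which calc node it builds, avoiding the code duplication mentioned above.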

Actually, looking again at pyiron/pyiron_contrib#856 we could even just change what's exposed in the job macro so it looks like the following:

wf = Workflow('simple_lammps_calculation')

structure = wf.create.atomistics.Bulk(cubic=True, name="Cu", label='structure')

lammps_job = wf.create.lammps.Lammps(label='lammps_job')
lammps_job.inputs.structure = structure
lammps_job.inputs.potential = '1995--Angelo-J-E--Ni-Al-H--LAMMPS--ipr1'  # potential does not contain Cu

###
md_calculator = wf.create.lammps.CalcMd(temperature=1000, n_ionic_steps=10000)
lammps_job.inputs.calculator = md_calculator
# Or
lammps_job.inputs.calculator = wf.create.lammps.CalcStatic(label='calc_static')
###

I'm worried I may be missing your intention here, because for me these last two solutions really don't feel inferior to your example, and both are extremely close to the existing tools in pyiron/pyiron_contrib#856.
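The calculator-as-input pattern could be mocked up like this (a minimal sketch; `commands`, `input_script`, and the class bodies are invented for illustration):

```python
class CalcStatic:
    """Calculator node producing the input-script lines for a static run."""

    def commands(self):
        return ["run 0"]


class CalcMd:
    """Calculator node producing the input-script lines for an MD run."""

    def __init__(self, temperature, n_ionic_steps):
        self.temperature = temperature
        self.n_ionic_steps = n_ionic_steps

    def commands(self):
        return [
            f"fix 1 all nvt temp {self.temperature} {self.temperature} 0.1",
            f"run {self.n_ionic_steps}",
        ]


class Lammps:
    """Job whose calculation type is set entirely by its calculator input."""

    def __init__(self):
        self.calculator = CalcStatic()  # sensible default

    def input_script(self):
        return "\n".join(self.calculator.commands())
```

Swapping `job.calculator = CalcMd(1000, 10000)` then changes the generated script without touching any other part of the job, which mirrors setting the calculator channel in the example above.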

On transferring existing users

At the meeting on Monday we discussed the idea of maintaining a backwards-compatible syntax for existing pyiron users. From this perspective, I could imagine extending the Lammps node with two(+) new methods, calc_md and calc_static such that (some part of the) backwards compatibility looks like this:

wf = Workflow('simple_lammps_calculation')

structure = wf.create.atomistics.Bulk(cubic=True, name="Cu", label='structure')

lammps_job = wf.create.lammps.Lammps(label='lammps_job')
lammps_job.inputs.structure = structure
lammps_job.inputs.potential = '1995--Angelo-J-E--Ni-Al-H--LAMMPS--ipr1'  # potential does not contain Cu

###
lammps_job.calc_md(temperature=1000)
# Or
lammps_job.calc_static()
###

Where behind the scenes these methods are really creating and connecting the underlying CalcX nodes, and giving them the same workflow parent as the lammps node (if any). The lammps node would also need to store a reference to these nodes so that if you call a calc_* method again it first deletes these quietly-created nodes to make room for the new ones. These would be deprecated from the moment they came into existence, and there might be some technical hiccups along the way (e.g. dynamically exposing lammps_job.inputs.temperature when really under the hood there is just some separate calc_md.inputs.temperature channel we need to modify), but it is at least possible in principle to go this route.
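The delete-and-recreate bookkeeping could look roughly like this (a plain-Python sketch of the idea; the class, method, and attribute names are assumptions, not the actual node API):

```python
class LammpsJob:
    """Toy macro: each calc_* call replaces the quietly-created calc node."""

    def __init__(self):
        self._calc_node = None  # reference kept so a later call can delete it

    def _swap_calc(self, new_node):
        # In the real thing we would disconnect and remove the old node from
        # the parent workflow before wiring in the new one.
        self._calc_node = new_node

    def calc_md(self, temperature):
        self._swap_calc({"type": "md", "temperature": temperature})

    def calc_static(self):
        self._swap_calc({"type": "static"})
```

Calling `calc_md` and then `calc_static` leaves only the static node behind, which is the "first deletes these quietly-created nodes" behavior described above.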

liamhuber commented 9 months ago

In the meeting today we discussed this issue, and we are content with the last solution (empowering macro nodes with methods which are able to change their internal structure and IO, e.g. lammps_job.calc_md(temperature=1000)).

We also discussed the philosophical difference: the current pyiron setup (and indeed most of the materials science computational infrastructure) first chooses a code (Lammps, Vasp, etc.) and then chooses the physics (minimization, MD, etc.). It doesn't have to be this way, of course -- we could choose the physics first and then choose our level of approximation/code! (AFAIK Julia's materials science infrastructure works this way.)

What's nice about the current pyiron_workflow setup is that the underlying workflow code is agnostic to our choice here; this is all done at the level of node libraries. So, we can build a set of node libraries where the user first defines a calculator and then attaches an engine to it, and we can also build a set of nodes where the user defines their code and then says what to do with it. E.g. something like CalcMD(Function) + LammpsEngine(Function) vs LammpsCalculation(Macro), where the latter has methods to modify its internal structure.

I wouldn't know for sure without first trying it, but I'm fairly confident that Macro already supports everything we need here; after building the children and IO, the calc_md and calc_static nodes would just need to go in, delete existing node(s), add new one(s), and wire everything up again.

I'll leave this open for a bit for visibility, but IMO this issue is closed now because we came to the philosophical agreement that we can support the existing pyiron syntax sufficiently well by adding methods to macros.