pyiron / pyiron_developer_tutorial

This repository is made for the tutorial session on July 26-27.

Idea: easily accessible workflows - pr.create.workflows() #1

Open jan-janssen opened 2 years ago

jan-janssen commented 2 years ago

It would be nice to have certain functionality available as single functions. While these are less flexible than the GenericMaster classes, users who mainly want to calculate specific properties would still benefit from such simple functions. Furthermore, we could enhance the GenericMaster class to accept a function as an input, use the function arguments as entries in the input DataContainer and store the output dictionary in the output DataContainer.

So the suggestion for the developer tutorials is to develop functions like this:

def calc_energy_volume_curve(ref_job, vol_range=0.1, num_points=11, fit_type="polynomial", fit_order=3):
    # Based on https://github.com/pyiron/pyiron_atomistics/blob/master/pyiron_atomistics/atomistics/master/murnaghan.py#L594
    # Wrap the existing Murnaghan master job in a single function call
    murn = ref_job.project.create.job.Murnaghan(job_name=ref_job.job_name + "_ev")
    murn.ref_job = ref_job
    murn.input["vol_range"] = vol_range
    murn.input["num_points"] = num_points
    murn.input["fit_type"] = fit_type
    murn.input["fit_order"] = fit_order
    murn.run()

    return murn.fit_dict
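
The second part of the idea, a master that accepts a plain function, could look roughly like the sketch below. FunctionWrapper is a hypothetical name (not an existing pyiron class), and it assumes the DataContainer API from pyiron_base:

import inspect

from pyiron_base import DataContainer

class FunctionWrapper:
    # Hypothetical sketch: expose a function's arguments as an input
    # DataContainer and store the returned dictionary in an output DataContainer
    def __init__(self, funct):
        self._funct = funct
        self.input = DataContainer(table_name="input")
        self.output = DataContainer(table_name="output")
        # pre-fill the input with the function's default arguments
        for name, par in inspect.signature(funct).parameters.items():
            if par.default is not inspect.Parameter.empty:
                self.input[name] = par.default

    def run(self):
        # call the wrapped function with the current input and keep its result
        self.output.update(self._funct(**self.input.to_builtin()))

Used together with the function above, this would be something like wrapper = FunctionWrapper(calc_energy_volume_curve); wrapper.input["ref_job"] = my_lammps_job; wrapper.run() (where my_lammps_job stands for any reference job the user has already set up).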

Possible examples to be converted into workflows would be:

samwaseda commented 2 years ago

My suggestion probably goes beyond this hackathon session, but for pyiron to become a truly useful tool for everyone, it should be able to answer more scientific questions, like "what's the lattice constant of Fe?" or "what's the melting temperature of Al?". And in this regard, I think pr.create.workflow is the right direction, but I might want to have something like:

workflow = pr.create.workflow.EnergyVolumeCurve('lattice_property')
workflow.input['structure'] = 'Al'
workflow.input['lattice_constant'] = True
workflow.input['bulk_modulus'] = True
workflow.run()

Here, the advantage is that there's absolutely no technical detail (except for the fact that the user probably still has to know that those values can be calculated via EnergyVolumeCurve). This would make it extremely attractive for people who are not really good at programming, like experimentalists for example.

niklassiemer commented 2 years ago

This touches on some ideas of automated workflow generation to obtain certain properties we are discussing in the ontology hackathon. So, this might become even more useful at some point :)

liamhuber commented 2 years ago

I like this direction. To my mind it points toward separating out workflows and models (for atomistics at least). I like @samwaseda's workflow object, and @jan-janssen's list of routines. I would actually argue that even minimization and MD are good workflows!

I would think something like this...

workflow = pr.atomistics.workflow.Minimize('my_min')  # or pr.create...
workflow.input.structure = some_struct
workflow.input.n_print = 1
workflow.input.set_engine.lammps()  # vasp(), etc also show up in tab completion

and

workflow = pr.atomistics.workflow.ElectronicDensityOfStates('my_dos') 
workflow.input.structure = some_struct
workflow.input.smearing.width = 0.1
workflow.input.smearing.set_style.fermi()
workflow.input.set_engine.vasp()  # here *only* electronic structure codes show up in tab completion

I am also open to leaving as many settings as possible on the raw job and having something like

workflow = pr.atomistics.workflow.MD('my_md') 
workflow.input.n_steps = 100
workflow.input.temperature = 300
workflow.input.job = some_job_where_I_already_set_all_the_usual_job_input

samwaseda commented 2 years ago

So now I did something super violent:

import inspect

import matplotlib.pylab as plt
from pyiron_atomistics import Project


class WorkFlow:

    def __init__(self, project: Project, name: str):
        self._project = project.create_group(name)

    def __getattr__(self, attr):
        # only reached when normal attribute lookup fails
        if attr.startswith('get_') or attr.startswith('_'):
            raise AttributeError("'{}' has no attribute '{}'".format(self.__class__.__name__, attr))
        try:
            # return a previously computed value if it was cached as '_<attr>'
            return getattr(self, '_' + attr)
        except AttributeError:
            # otherwise call 'get_<attr>', resolving its arguments the same way
            args = [getattr(self, a) for a in inspect.getfullargspec(self.__getattribute__('get_' + attr)).args[1:]]
            setattr(self, '_' + attr, self.__getattribute__('get_' + attr)(*args))
            return getattr(self, '_' + attr)

And the user can write something like (Edit: I changed the job names to reflect Liam's point below):

class ElasticConstants(WorkFlow):
    element = 'Fe'

    def get_lattice_constant(self, element):
        job = self.project.create.job.Lammps(('lmp_murn', element))
        job.structure = self.project.create.structure.bulk(element, cubic=True)
        job.interactive_open()
        murn = job.create_job('Murnaghan', job.job_name.replace('lmp_', ''))
        murn.run()
        return murn['output/equilibrium_volume']**(1 / 3)

    def get_elastic_constants(self, element, lattice_constant):
        job = self.project.create.job.Lammps(('lmp_elast', element, lattice_constant))
        job.structure = self.project.create.structure.bulk(element, a=lattice_constant, cubic=True)
        job.interactive_open()
        elast = job.create_job('ElasticTensor', job.job_name.replace('lmp_', ''))
        elast.run()
        return elast['output/elastic_tensor']

pr = Project('my_project')

wf = ElasticConstants(pr, 'elastic_tensor')

plt.imshow(wf.elastic_constants)

A bit of explanation: self.elastic_constants is not explicitly defined in the class, so WorkFlow looks for it via get_elastic_constants, and then it does the same thing for element and lattice_constant. I know that people won't like its implicitness (I don't, either), but I like its simplicity and the fact that there's nothing redundant in the code. By the way, since I know that it's not going to be the final format for our WorkFlow class, I didn't even bother with input and output storage. Anyway, I would love to have something this simple.
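
To see the lookup in isolation, a minimal toy subclass (hypothetical names, reusing the WorkFlow class above and an existing Project pr) behaves like this:

class Square(WorkFlow):
    base = 3

    def get_doubled(self, base):
        # resolved lazily and cached as self._doubled when first needed
        return 2 * base

    def get_result(self, doubled):
        # asking for self.result triggers get_result, which in turn
        # triggers get_doubled through its 'doubled' argument
        return doubled ** 2

wf = Square(pr, 'square_demo')  # pr: an existing pyiron Project
wf.result  # 36; get_doubled ran implicitly, and _doubled and _result are now cached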

liamhuber commented 2 years ago

Well, you're right that I don't like the implicitness 😝 I'm also not so convinced that it's simple? It's short, but I suspect that's also because it misses some kind of important functionality. For instance, what happens if I run the workflow, then call wf.element='Al' and rerun it? I'm pretty sure that I just silently get back the original Fe results. In fact, the workflow doesn't offer any way to rerun the job. For that we need extra input variables, which then need to be exploited in each of the functions (do we need the choice to rerun jobs in different functions individually?)... very quickly I worry this goes from being complex but short to being complex and long.
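
Concretely, following the WorkFlow sketch above, the sequence would presumably be:

wf = ElasticConstants(pr, 'elastic_tensor')
wf.elastic_constants  # runs the Fe jobs, result cached as wf._elastic_constants
wf.element = 'Al'     # only rebinds the public attribute
wf.elastic_constants  # the cached Fe tensor comes straight back, nothing reruns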

This is a rich enough topic that we might really benefit from listening to Joel and taking some time to write a spec.

For instance, from the perspective of a user I imagine a workflow being an object that has input, and after all the necessary input is specified I can call the output. E.g.

wf = pr.create.workflow.ElasticConstants('my_constants')
wf.input.element = 'Fe'
print(wf.output.lattice_constants)
plt.imshow(wf.output.elastic_constants)

Now, what happens if I change the input on an existing workflow? Should it complain and say this is not allowed? If it is allowed, the next time I query the output, it had better update itself! From an implementation side we then probably want a system that lets us re-run the minimal amount of calculations to update this output, which probably also means having a system to track what input was used to generate the current output and compare current input against that.
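
One rough sketch of such tracking (hypothetical helper, nothing pyiron-specific, assuming the input is JSON-serialisable) could be:

import hashlib
import json

def input_fingerprint(input_dict):
    # stable hash of the current input values
    payload = json.dumps(input_dict, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()

# inside a hypothetical workflow:
# if input_fingerprint(dict(self.input)) != self._fingerprint_of_last_run:
#     rerun (only) the affected steps, then store the new fingerprint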

What about the developer perspective? Code only has to be written once, but it can be read many, many times. I agree that your example above is fast to write, but it is hell to read. In this really simple example it is not too hard to see that asking for elastic_constants will, under the hood, run lattice_constant but...

I think we should lay out some spec guidelines first, for both user and dev perspectives. Summarizing what's above....

For the user I would suggest:

For the dev I would suggest:

This is definitely not complete...

samwaseda commented 2 years ago

what happens if I run the workflow, then call wf.element='Al' and rerun it?

I would simply add element to the job name. Essentially every job inside any function should contain all the function arguments in its job name. Anyway, I don't see how we could possibly include it in the current pyiron format.
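
In other words, something along these lines (hypothetical helper; the tuple job names in the example above achieve the same effect):

def job_name_from_args(prefix, *args):
    # encode every argument in the job name, replacing characters
    # that pyiron job names do not allow
    return '_'.join([prefix] + [str(a).replace('.', 'd').replace('-', 'm') for a in args])

job_name_from_args('lmp_murn', 'Al')          # 'lmp_murn_Al'
job_name_from_args('lmp_elast', 'Fe', 2.83)   # 'lmp_elast_Fe_2d83'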

I agree that your example above is fast to write, but it is hell to read.

Whaaaaaaat, the reason why I wrote it was that I found it super difficult to read your code

I think we should lay out some spec guidelines first, for both user and dev perspectives. Summarizing what's above....

I guess we are going apart at this point, because I don't think a workflow would be used by other people very often in the first place (each workflow is probably too specialized), so I wouldn't necessarily distinguish between "user" and "developer". At least, if writing a workflow is a lot more work than what @jan-janssen wrote in the first post or what I wrote, then it will be just a beautiful framework with little practical interest.

And actually, from my point of view, all the points that you named for the users are either already fulfilled or can be fulfilled with my implementation. So except for the fact that it's not explicit enough, I don't really see a problem.

liamhuber commented 2 years ago

Essentially every job inside any function should contain all the function arguments inside the job name.

I have been down this road. I wound up having to hash the input in order to keep the job name within allowable length limits. For routines with more complex input this really falls apart 😞
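
A sketch of that kind of fallback (hypothetical helper, not the code I actually used; the 50-character cut-off is only illustrative) would be:

import hashlib

def safe_job_name(prefix, *args, max_len=50):
    # fall back to a hash once the encoded arguments would make the name too long
    encoded = '_'.join([prefix] + [str(a).replace('.', 'd') for a in args])
    if len(encoded) <= max_len:
        return encoded
    return prefix + '_' + hashlib.md5(encoded.encode()).hexdigest()[:10]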

Whaaaaaaat the reason why I wrote it was because I found it super difficult to read your code

😆 uh oh! What I mean is that to understand your version I need to read all of the functions in their entirety while translating names implicitly (removing the 'get' part, etc.). Because everything is implicit, there is nowhere I can easily see the relationship between the different sub-functions; I have to actually look at their content. Of course, otherwise your code is well formatted with elegant variable names 😸

(aside: don't we need job.interactive_close(), or with job.interactive_open(): ...?)

I guess we are going apart at this point, because I don't think that a workflow would be in the first place used by other people very often, because each workflow is probably so special, ...

Ah, this is the opposite of the impression I got from Jan's original post, e.g. where there are massively broad and useful examples like calc_gamma_surface!

...so I wouldn't necessarily distinguish between "user" and "developer". At least if writing the workflow is a lot more work than what @jan-janssen wrote in the first post or what I wrote, then it will have just a beautiful framework with little practical interest.

I agree writing them needs to be rather straightforward, although IMO using OOP doesn't cross the threshold into making it not-straightforward. Regarding practical interest, look how much interest there is in job.calc_md(), which may or may not have an elegant back-end but has loads of users from the front-end. I definitely can see workflows being similarly adopted by users (provided the front-end is nice) regardless of how nice (or not!) the back-end is.

And actually, from my point of view, all the points that you named for the users are either already fulfilled or can be fulfilled with my implementation. So except for the fact that it's not explicit enough, I don't really see a problem.

Agreed! From the user perspective there is nothing that pops out to me as incompatible with your implementation -- although certain aspects may be easier or harder between class node vs function node implementations.

Then it comes back to 'going apart', and this is why I think it's clutch to nail down some spec -- without writing any code we already found out that you disagree with an entire subsection of my design spec 😂 Since you and I have each written two essays in the last 24 hrs I guess we should let it sit for a bit to give the rest of those interested a chance to share wisdom. It is just a very fun topic to talk about 😄