pyiron / pyiron_developer_tutorial

This repository is made for the tutorial session on July 26-27.

Idea: easily accessible workflows - pr.create.workflows() #1

Open jan-janssen opened 2 years ago

jan-janssen commented 2 years ago

It would be nice to have certain functionality available as single functions. While these are less flexible than the GenericMaster classes, users who mainly want to calculate specific properties would still benefit from such simple functions. Furthermore, we could enhance the GenericMaster class to accept a function as an input, use the function arguments as entries in the input DataContainer and store the output dictionary in the output DataContainer.

So the suggestion for the developer tutorials is to develop functions like this:

def calc_energy_volume_curve(ref_job, vol_range=0.1, num_points=11, fit_type="polynomial", fit_order=3):
    # Based on https://github.com/pyiron/pyiron_atomistics/blob/master/pyiron_atomistics/atomistics/master/murnaghan.py#L594
    # Wrap the existing Murnaghan master job in a single function call
    murn = ref_job.project.create.job.Murnaghan(job_name=ref_job.job_name + "_ev")
    murn.ref_job = ref_job
    murn.input["vol_range"] = vol_range
    murn.input["num_points"] = num_points
    murn.input["fit_type"] = fit_type
    murn.input["fit_order"] = fit_order
    murn.run()

    return murn.fit_dict
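
The second part of the idea, a master that accepts a plain function, could look roughly like the sketch below. FunctionWrapper is a hypothetical name (not an existing pyiron class), and it assumes the DataContainer API from pyiron_base:

import inspect

from pyiron_base import DataContainer

class FunctionWrapper:
    # Hypothetical sketch: expose a function's arguments as an input
    # DataContainer and store the returned dictionary in an output DataContainer
    def __init__(self, funct):
        self._funct = funct
        self.input = DataContainer(table_name="input")
        self.output = DataContainer(table_name="output")
        # pre-fill the input with the function's default arguments
        for name, par in inspect.signature(funct).parameters.items():
            if par.default is not inspect.Parameter.empty:
                self.input[name] = par.default

    def run(self):
        # call the wrapped function with the current input and keep its result
        self.output.update(self._funct(**self.input.to_builtin()))

Used together with the function above, this would be something like wrapper = FunctionWrapper(calc_energy_volume_curve); wrapper.input["ref_job"] = my_lammps_job; wrapper.run() (where my_lammps_job stands for any reference job the user has already set up).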

Possible examples to be converted into workflows would be:

samwaseda commented 2 years ago

My suggestion probably goes beyond this hackathon session, but for pyiron to become a truly useful tool for everyone, it should be able to answer more scientific questions, like "what's the lattice constant of Fe?" or "what's the melting temperature of Al?". And in this regard, I think pr.create.workflow is the right direction, but I might want to have something like:

workflow = pr.create.workflow.EnergyVolumeCurve('lattice_property')
workflow.input['structure'] = 'Al'
workflow.input['lattice_constant'] = True
workflow.input['bulk_modulus'] = True
workflow.run()

Here, the advantage is that there's absolutely no technical detail (except for the fact that the user probably still has to know that those values can be calculated via EnergyVolumeCurve). This would make it extremely attractive for people who are not really good at programming, like experimentalists for example.

niklassiemer commented 2 years ago

This touches on some ideas of automated workflow generation to obtain certain properties we are discussing in the ontology hackathon. So, this might become even more useful at some point :)

liamhuber commented 2 years ago

I like this direction. To my mind it points toward separating out workflows and models (for atomistics at least). I like @samwaseda's workflow object, and @jan-janssen's list of routines. I would actually argue that even minimization and MD are good workflows!

I would think something like this...

workflow = pr.atomistics.workflow.Minimize('my_min')  # or pr.create...
workflow.input.structure = some_struct
workflow.input.n_print = 1
workflow.input.set_engine.lammps()  # vasp(), etc also show up in tab completion

and

workflow = pr.atomistics.workflow.ElectronicDensityOfStates('my_dos') 
workflow.input.structure = some_struct
workflow.input.smearing.width = 0.1
workflow.input.smearing.set_style.fermi()
workflow.input.set_engine.vasp()  # here *only* electronic structure codes show up in tab completion

I am also open to leaving as many settings as possible on the raw job and having something like

workflow = pr.atomistics.workflow.MD('my_md') 
workflow.input.n_steps = 100
workflow.input.temperature = 300
workflow.input.job = some_job_where_I_already_set_all_the_usual_job_input

samwaseda commented 2 years ago

So now I did something super violent:

import inspect

import matplotlib.pylab as plt
from pyiron_atomistics import Project


class WorkFlow:

    def __init__(self, project: Project, name: str):
        self._project = project.create_group(name)

    def __getattr__(self, attr):
        # only reached when normal attribute lookup fails
        if attr.startswith('get_') or attr.startswith('_'):
            raise AttributeError("'{}' has no attribute '{}'".format(self.__class__.__name__, attr))
        try:
            # return a previously computed value if it was cached as '_<attr>'
            return getattr(self, '_' + attr)
        except AttributeError:
            # otherwise call 'get_<attr>', resolving its arguments the same way
            args = [getattr(self, a) for a in inspect.getfullargspec(self.__getattribute__('get_' + attr)).args[1:]]
            setattr(self, '_' + attr, self.__getattribute__('get_' + attr)(*args))
            return getattr(self, '_' + attr)

And the user can write something like (Edit: I changed the job names to reflect Liam's point below):

class ElasticConstants(WorkFlow):
    element = 'Fe'

    def get_lattice_constant(self, element):
        job = self.project.create.job.Lammps(('lmp_murn', element))
        job.structure = self.project.create.structure.bulk(element, cubic=True)
        job.interactive_open()
        murn = job.create_job('Murnaghan', job.job_name.replace('lmp_', ''))
        murn.run()
        return murn['output/equilibrium_volume']**(1 / 3)

    def get_elastic_constants(self, element, lattice_constant):
        job = self.project.create.job.Lammps(('lmp_elast', element, lattice_constant))
        job.structure = self.project.create.structure.bulk(element, a=lattice_constant, cubic=True)
        job.interactive_open()
        elast = job.create_job('ElasticTensor', job.job_name.replace('lmp_', ''))
        elast.run()
        return elast['output/elastic_tensor']

pr = Project('my_project')

wf = ElasticConstants(pr, 'elastic_tensor')

plt.imshow(wf.elastic_constants)

A bit of explanation: self.elastic_constants is not explicitly defined in the class, so WorkFlow looks for it via get_elastic_constants, and then it does the same thing for element and lattice_constant. I know that people won't like its implicitness (I don't, either), but I like its simplicity and the fact that there's nothing redundant in the code. By the way, since I know that it's not going to be the final format for our WorkFlow class, I didn't even bother with input and output storage. Anyway, I would love to have something this simple.
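
To see the lookup in isolation, a minimal toy subclass (hypothetical names, reusing the WorkFlow class above and an existing Project pr) behaves like this:

class Square(WorkFlow):
    base = 3

    def get_doubled(self, base):
        # resolved lazily and cached as self._doubled when first needed
        return 2 * base

    def get_result(self, doubled):
        # asking for self.result triggers get_result, which in turn
        # triggers get_doubled through its 'doubled' argument
        return doubled ** 2

wf = Square(pr, 'square_demo')  # pr: an existing pyiron Project
wf.result  # 36; get_doubled ran implicitly, and _doubled and _result are now cached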

liamhuber commented 2 years ago

Well, you're right that I don't like the implicitness 😝 I'm also not so convinced that it's simple? It's short, but I suspect that's also because it misses some kind of important functionality. For instance, what happens if I run the workflow, then call wf.element='Al' and rerun it? I'm pretty sure that I just silently get back the original Fe results. In fact, the workflow doesn't offer any way to rerun the job. For that we need extra input variables, which then need to be exploited in each of the functions (do we need the choice to rerun jobs in different functions individually?)... very quickly I worry this goes from being complex but short to being complex and long.
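
Concretely, following the WorkFlow sketch above, the sequence would presumably be:

wf = ElasticConstants(pr, 'elastic_tensor')
wf.elastic_constants  # runs the Fe jobs, result cached as wf._elastic_constants
wf.element = 'Al'     # only rebinds the public attribute
wf.elastic_constants  # the cached Fe tensor comes straight back, nothing reruns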

This is a rich enough topic that we might really benefit from listening to Joel and taking some time to write a spec.

For instance, from the perspective of a user I imagine a workflow being an object that has input, and after all the necessary input is specified I can call the output. E.g.

wf = pr.create.workflow.ElasticConstants('my_constants')
wf.input.element = 'Fe'
print(wf.output.lattice_constants)
plt.imshow(wf.output.elastic_constants)

Now, what happens if I change the input on an existing workflow? Should it complain and say this is not allowed? If it is allowed, the next time I query the output, it had better update itself! From an implementation side we then probably want a system that lets us re-run the minimal amount of calculations to update this output, which probably also means having a system to track what input was used to generate the current output and compare current input against that.
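
One rough sketch of such tracking (hypothetical helper, nothing pyiron-specific, assuming the input is JSON-serialisable) could be:

import hashlib
import json

def input_fingerprint(input_dict):
    # stable hash of the current input values
    payload = json.dumps(input_dict, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()

# inside a hypothetical workflow:
# if input_fingerprint(dict(self.input)) != self._fingerprint_of_last_run:
#     rerun (only) the affected steps, then store the new fingerprint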

What about the developer perspective? Code only has to be written once, but it can be read many, many times. I agree that your example above is fast to write, but it is hell to read. In this really simple example it is not too hard to see that asking for elastic_constants will, under the hood, run lattice_constant but...

I think we should lay out some spec guidelines first, for both user and dev perspectives. Summarizing what's above....

For the user I would suggest:

For the dev I would suggest:

This is definitely not complete...

samwaseda commented 2 years ago

what happens if I run the workflow, then call wf.element='Al' and rerun it?

I would simply add element to the job name. Essentially every job inside any function should contain all the function arguments in its job name. Anyway, I don't see how we could possibly include it in the current pyiron format.
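
In other words, something along these lines (hypothetical helper; the tuple job names in the example above achieve the same effect):

def job_name_from_args(prefix, *args):
    # encode every argument in the job name, replacing characters
    # that pyiron job names do not allow
    return '_'.join([prefix] + [str(a).replace('.', 'd').replace('-', 'm') for a in args])

job_name_from_args('lmp_murn', 'Al')          # 'lmp_murn_Al'
job_name_from_args('lmp_elast', 'Fe', 2.83)   # 'lmp_elast_Fe_2d83'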

I agree that your example above is fast to write, but it is hell to read.

Whaaaaaaat, the reason why I wrote it was that I found it super difficult to read your code

I think we should lay out some spec guidelines first, for both user and dev perspectives. Summarizing what's above....

I guess we are going apart at this point, because I don't think a workflow would be used by other people very often in the first place (each workflow is probably too specialized), so I wouldn't necessarily distinguish between "user" and "developer". At least, if writing a workflow is a lot more work than what @jan-janssen wrote in the first post or what I wrote, then it will be just a beautiful framework with little practical interest.

And actually, from my point of view, all the points that you named for the users are either already fulfilled or can be fulfilled with my implementation. So except for the fact that it's not explicit enough, I don't really see a problem.

liamhuber commented 2 years ago

Essentially every job inside any function should contain all the function arguments inside the job name.

I have been down this road. I wound up having to hash the input in order to keep the job name within allowable length limits. For routines with more complex input this really falls apart 😞
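
A sketch of that kind of fallback (hypothetical helper, not the code I actually used; the 50-character cut-off is only illustrative) would be:

import hashlib

def safe_job_name(prefix, *args, max_len=50):
    # fall back to a hash once the encoded arguments would make the name too long
    encoded = '_'.join([prefix] + [str(a).replace('.', 'd') for a in args])
    if len(encoded) <= max_len:
        return encoded
    return prefix + '_' + hashlib.md5(encoded.encode()).hexdigest()[:10]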

Whaaaaaaat the reason why I wrote it was because I found it super difficult to read your code

😆 uh oh! What I mean is that to understand your version I need to read all of the functions in their entirety while translating names implicitly (removing the 'get' part, etc.). Because everything is implicit, there is nowhere I can easily see the relationship between the different sub-functions; I have to actually look at their content. Of course, otherwise your code is well formatted with elegant variable names 😸

(aside: don't we need job.interactive_close(), or with job.interactive_open(): ...?)

I guess we are going apart at this point, because I don't think that a workflow would be in the first place used by other people very often, because each workflow is probably so special, ...

Ah, this is the opposite of the impression I got from Jan's original post, e.g. where there are massively broad and useful examples like calc_gamma_surface!

...so I wouldn't necessarily distinguish between "user" and "developer". At least if writing the workflow is a lot more work than what @jan-janssen wrote in the first post or what I wrote, then it will have just a beautiful framework with little practical interest.

I agree writing them needs to be rather straightforward, although IMO using OOP doesn't cross the threshold into making it not-straightforward. Regarding practical interest, look how much interest there is in job.calc_md(), which may or may not have an elegant back-end but has loads of users from the front-end. I definitely can see workflows being similarly adopted by users (provided the front-end is nice) regardless of how nice (or not!) the back-end is.

And actually, from my point of view, all the points that you named for the users are either already fulfilled or can be fulfilled with my implementation. So except for the fact that it's not explicit enough, I don't really see a problem.

Agreed! From the user perspective there is nothing that pops out to me as incompatible with your implementation -- although certain aspects may be easier or harder between class node vs function node implementations.

Then it comes back to 'going apart', and this is why I think it's clutch to nail down some spec -- without writing any code we already found out that you disagree with an entire subsection of my design spec 😂 Since you and I have each written two essays in the last 24 hrs I guess we should let it sit for a bit to give the rest of those interested a chance to share wisdom. It is just a very fun topic to talk about 😄