rethink some atomate organization

computron commented 6 years ago

It's hard for newcomers to figure out how to do certain things in atomate.

Examples:

the various "HSE" band structure options are not clear at all - some of them seem like they are going to do a full HSE band structure (everything in HSE) when really they are re-using PBE for most things.
the preset vs base vs YAML workflow specifications are not so clear, as well as the various config options
One can't set the input set in the YAML spec (seems mainly due to the fact that pymatgen input sets require knowledge of structure to serialize properly) which makes the YAML specs incomplete. e.g. the HSE band structure can't specify to use the HSE input set in it
the code can feel a bit like a labyrinth, especially without a guide on how to use it

So - some of the code organization and documentation likely requires a re-think.

shyuep commented 6 years ago

Well, there are ways to allow specification of input sets to the YAML spec, e.g., by passing a string and then selecting the input set accordingly. That said, I think we need to fundamentally decide what the aim of atomate is. It is basically a recipe booklet for HT workflows. There should be some flexibility allowed, but it should not be to the level that you allow all sorts of modifications to the recipe. Specifying a completely different input set is pretty major. I can understand if you want a simple selector, e.g., GGA vs HSE version of a relax or bandstructure set, in which case the simple string method would work. But it shouldn't allow someone to supply a MPNEBSet for a bandstructure calculation. That would defeat the purpose of being a recipe.

That's the reason why I designed the YAML spec to work at the level of fireworks. Allowable modifications should be specified at the FireWorks level. We can certainly go down to the FireTasks level, but that would make the yaml very detailed and unwieldly. I subscribe to Tim Berners-Lee's Principle of Least Power.

computron commented 6 years ago

The purpose of putting the input set in the YAML spec would actually be to restrict freedom in deciding, not to expand it or allow more freedom to the user.

Right now, none of the YAML specifies an input set just because it's not easy to do so. A pymatgen input set can't be serialized independent of the structure you will put into it, meaning you need a structure to serialize an input set. Yes, you can consider a "string" to be a serialization but it is (i) hacky and (ii) doesn't allow you to easily set other args and kwargs unless you again hack your own serialization.

Since the YAML spec doesn't give an input set, the user has to select one when loading the YAML. However, certain YAML specs should really be restricted to working with only a single input set (e.g., HSE band structures use special procedures intended for the scf nature of all HSE calcs). Other YAML specs (e.g. normal bandstructure) could be run with a variety of input sets (PBE, PBE sol, some vdW functional, etc...) and it's nice to retain the flexibility that's currently there.

shyuep commented 6 years ago

What's the problem with:

class MasterFirework:

    def __init__(self, structure, functional="PBE", input_set_kwargs=None):
        if functional == "HSE":
            self.input_set = MPHSESet(structure, **input_set_kwargs)
        elif functional == "PBEsol":
            ...

That would serialize properly in YAML, retain any argument you wish to pass to any input set. I certainly think it is preferable to:

class MasterFirework:

    def __init__(self, structure, input_set=MPRelaxSet):

even if we can somehow manage to serialize this properly like the old input sets. This mode essentially allows anyone to pass any input set to the Firework, even when the input set makes zero sense.

computron commented 6 years ago

Is MasterFirework intended to be a base class that other FWs (e.g. OptimizeFW) inherit from, or is it supposed to be a template for what the parameters of FWs like OptimizeFW should look like?

Some of the criteria for a good solution:

should be easy for the user to change functionals (check with above)
should also be easy to switch between "high-throughput" and "strict" optimization parameters - e.g., a high-throughput cutoff for EDIFF vs a "strict" cutoff for EDIFF. I imagined that the inputset might control this (e.g., MPRelaxSet vs StrictRelaxSet) but I'm not sure if that's the strategy you want to pursue. Ideally, we don't manually need to set input_set_kwargs but this is an option if really needed.
many FWs want to support a ton of functionals - LDA, GGA, PBEsol, SCAN, HSE, one of the million vdW functionals, and new functionals that might come out. For example, both OptimizeFW and StaticFW can essentially be run with any functional, and many workflows like bulk modulus can also be run with essentially any functional. It should be possible to support a lot of possible functional choices without repeating a lot of code like "if functional1 --> dothis, ...., if functional345 ---> dothis", especially having all those if-else statements are repeated for many different FWs.

I agree that currently, it's possible for someone to put in an NEBInputSet into an OptimizeFW, which is silly.

There are also weird hidden dependencies that I didn't realize. For example, StaticFW calls MPRelaxSet.from_previous_run() in order to update the relaxation parameters for static runs. For the most part, this from_previous_run() does a good job of not interfering with whatever parameters you had already set in the INCAR (e.g,. if you had started out HSE, you stay HSE). But the POTCARs get overridden to the MPRelaxSet POTCAR even if you were using something else. This kind of dependency is difficult even for me to debug when things go wrong.

computron commented 6 years ago

above it should read MPStaticSet.from_prev_calc not MPRelaxSet.from_previous_run() - but it's the same concept. MPStaticSet is subclassed from MPRelaxSet and ends up using the POTCAR from MPRelaxSet, even if you had the POTCARs set to something else.

bocklund commented 6 years ago

On one hand it's silly that we can pass NEBInputSet to an OptimizeFW, but on the other hand I've also found it useful to be able to override the input sets with some that our group agrees on internally so that we are consistent with each other. Maybe there's some amount of sanity checks that we could do like checking that NSW > 0 in OptimizeFW, etc. But I would err on the side of being more flexible personally.

The same goes for the MasterFirework (if it is indeed a base class) in that users could not override the behavior. Then users would have to propagate some keyword arguments through multiple layers (is this argument for a Firetask or the MasterFirework or the base Firework?), monkey patch the MasterFirework class or just override anything that inherits.

Can I ask who atomate supposed to be for?

Is it more for

developing the one true method that the community (at least the community of atomate users) agrees on?
a framework that users are expected to customize, build on top of, and (possibly) contribute back?
a really good piece of software tailored to the people who are using it? (no offense intended)

If this is too off topic, we can discuss it in another issue.

shyuep commented 6 years ago

@computron I am only using "MasterFireWork" as a dummy name, merely to illustrate what is being done.

More generally (to address some of @bocklund points), the point of having an atomate is to standardize. For functional changes, most changes can be handled via user_incar_settings. E.g., to change from PBE to PBEsol is just a a GGA = PS tag. HSE is slightly more involved, but still, it is completely doable via user_incar_settings. There is already plenty of flexibility built into the input sets of this kind of thing, and it is not difficult to support these easily in the relevant FireWorks. I am waiting for someone to give me a use case that cannot be handled with one or two conditional statements plus user_incar_settings (or a few other parameters). Over-engineering for flexibility is detrimental to usability.

I generally err on the side of least flexibility, and gradually open up as needed by the community. Design for the 80% and maybe have some flexibility for the remaining 19%. We can safely let the final 1% of ultra power users handle themselves, e.g., those who want to use an atomate workflow with their own customized version of VASP with a new functional that takes into a new input file in a different format from INCAR, KPOINTS.

shyuep commented 6 years ago

Also, checking for input is unpythonic. If you do something like if isinstance(myobject, MPRelaxSet), that is much worse than the if-else statement I wrote. Checking for NSW, etc. does not solve the problem. There are just too many things to check. You will end up with crazy if statements all over the place.

I would argue that the essence of software design should be such that the more likely a use case, the more streamlined and brainless the use should be; the more unlikely a use case (e.g., supplying NEBsets to an other optimize workflow), the more difficult (or impossible) it should be to do it.

bocklund commented 6 years ago

Those things are all fair. We are in agreement about the 80/19/1, however in my (relatively limited) experience, scientists want to do things their own way or there will always be someone who steps up with a special case.

Maybe I'm in the 19% (or even the smaller slice), but we use customized input sets everywhere. if isinstance(myobject, MPRelaxSet) is surely a pain, but

class MasterFirework:

    def __init__(self, structure, functional="PBE", input_set_kwargs=None):
        if functional == "HSE":
            self.input_set = MPHSESet(structure, **input_set_kwargs)
        elif functional == "PBEsol":
            ...

means I need to subclass that Firework and make my own versions of all workflow functions that use it unless I am ok with accepting MPHSESet.

Another example is that you cannot chain arbitrary VASP calculation Fireworks together, if I want to do an ISIF 2 -> ISIF 4 -> Static calculation.

See also: https://github.com/hackingmaterials/atomate/issues/218. A force convergence check is reasonable in some contexts, but not others. The issue stems from not being able to propagate parameters down through to Firetasks, which is too much flexibility, but illustrates a point.

It's possible that I don't have perspective or I'm not in the right slice of users, but I have already bumped up against some (minor) flexibility things. My honest question is that if most properties that atomate provides workflows for are a part of the Materials Project, why not just submit your structure(s) to the MP? To me, the answer is that users want to customize beyond it.

Maybe my concrete suggestion for this issue is to organize atomate such that there is a more defined line between the "atomate: the collection of community vetted recipes" and "atomate: the framework/library of Fireworks and Firetasks". The recipes of workflows should be sufficient for the 80% who just want to use PBEsol instead of PBE and might mistake passing an NEB set to an OptimizeFW. The Fireworks and Firetasks can be flexible enough for the 95% to piece together their own workflows.

shyuep commented 6 years ago

@bocklund I think it is easier if you provide a specific use case that is not covered by current solutions. I just looked at atomate code. Most of the core FireWorks do indeed have user_incar_settings as a parameter. What that tells me is that if you really want to do ISIF = 2 -> ISIF = 4 -> Static, all that needs to be done is something like:

fireworks:
- fw: atomate.vasp.fireworks.core.OptimizeFW
  params:
    user_incar_settings:
      ISIF: 2
- fw: atomate.vasp.fireworks.core.OptimizeFW
  params:
    user_incar_settings:
      ISIF: 4
    parents: 0
- fw: atomate.vasp.fireworks.core.StaticFW
  params:
    parents: 1

Nowadays, I seldom run workflows myself. So tell me if this doesn't work. The whole point of the YAML spec is such that you can arbitrarily chain workflows together. Note that OptimizeFW right now has some weird override_default_vasp_settings parameter, which is what I htink user_incar_settings should be. Most of the other FireWorks do have user_incar_settings. It is a minor matter to correct this for OptimizeFW so that the above yaml spec works.

shyuep commented 6 years ago

I should add that the ISIF=2 -> ISIF=4 -> Static is even possible just from Custodian's simpler yaml spec, much less atomate's much more powerful yaml spec.

bocklund commented 6 years ago

It actually doesn't work because OptimizeFW doesn't know to copy over the CONTCAR from the previous calculation if it has parents or calc_locs specified.

You might suggest to get the same behavior by overriding the input set for the StaticFW or by using user_incar_settings and using those. However current way StaticFW works is also kind of wonky because if there are parents with prev_calc_dir or calc_locs specified, it will actually throw away the input set you pass to StaticFW and force you to use MPStatic.from_prev_calc set with all those parameters applied for you and you get some control by passing vasp_input_set_params to the StaticFW.

Custodian is fine in this example, but if you want any control flow or the ability to handle ISIF 4 failing to write some intermediate result to a database without redoing ISIF 2 with Fireworks, then you're out of luck.

I know this seems like a contrived example, but it's a real thing that I implemented and worked around.

shyuep commented 6 years ago

I would suggest: i. Fix OptimizeFW to allow specification of prev_calc. ii. Do not allow vasp_input_set in StaticFW. iii. Add user_incar_settings for both classes. In fact, the argument signature for all core Fireworks should follow some standardized form, rather than having vasp_input_set_params for one class and user_incar_settings for another. Certainly we can use vasp_input_set_params if that is preferred. This would be more flexible in the sense that you have access to any argument in the input set, but it is more cumbersome when all you want is user_incar_settings={ISIF: 2}

What I mean is that these contrivances are mainly a matter of inefficiencies in design, not because that allowing people to throw in an arbitrary input set is the way to go. Right now, there is no guarantee that anyone passing in a vasp_input_set is not passing something with NSW>1 for the StaticFW, which would make that FW nonsensical.

computron commented 6 years ago

Ok glad to see this thread is alive!

@bocklund can you submit a separate issue about chaining of FWs? I think we all want to the FWs to be chain-able (i.e., have the inputs of one FW be consistent with the output of another). But, since we can identify this as a clear and separate issue, we can create one for it.

I want to clarify the purpose of atomate. Yes, we want to standardize. But we also want to make going off the beaten path simple and intuitive - and to make this a natural and expected thing for people to do. The design of atomate would be simple if it was just about calculating pre-defined recipes.

For example, here's the type of thing that I would expect atomate to do well:

Someone computes a GGA band structure without even thinking about it
Next, we realize that vdW corrections might be important. It should be easy to run the same workflow but with a vdW functional - ideally without knowing too much about exactly which INCAR parameters are associated with each vdW functional and just by listing your vdW functional as a String. Meaning, I want to use a vdw="optB88" type of syntax like MPRelaxSet has in pymatgen, not a user_incar_settings={"LUSE_VDW": True, "AGGAC": 0.0, "GGA": "BO", "PARAM1": 0.18333333, "PARAM2": 0.22} syntax. Obviously one needs to draw a line as to at what point things should be easy like a string versus custom; to me, this vdW example still falls under a 'don't make me think' operation.
Similarly, if we want to get an HSE approximation to the gap, it should be clear (i) what the different approximations are (e.g., gap correction only vs full HSE) and (ii) exactly how to run it without needing to pore over details of the code or again needing to use user_incar_settings. Again, I consider this under a 'don't make me think or know all the VASP parameters' example.
Where things become very custom IMO is if someone wants to mess with the HSE mixing parameter for HF. At that point, I think it's reasonable for someone to have to learn the VASP tag for it and override the user_incar_settings themselves.

Maybe what we need is to ditch the input sets like @shyuep says and just have atomate make a collection of preset incar dict overrides that represent switching functional or doing common "don't make me think" operations. These dict overrides maybe will read pymatgen data (e.g., so we don't recode all the various vdW INCAR parameters) but don't depend on the user creating an input set that represents what they want. Since people will also want to override kpoints and potcars we should probably prefer vasp_input_set_params instead of user_incar_settings.

Thus, given all this, the sketch of the code would be:

each FW follows a consistent signature (to the extent possible), is chainable (to the extent possible)
each FW hard-codes the input set that is used for that operation. If the FW wants to, it can have some parameter like "functional" that @shyuep proposes to switch between multiple input sets. Up for discussion still is whether such a parameter could also allow a custom input set rather than just toggle between a few presets. However, there is something to be said for having just a single correct way to do things rather than the option to either (i) always use the default vaspinputset and override to customize vs (ii) override the vaspinput set, plus possible customizations on top of that. Will need to think and discuss this.
if you want to do something other than the hard-coded input set, feel free to override with vasp_input_set_params.
Atomate does something to make exploring a little off the beaten path (e.g., switching to a vdW functional) less painless rather than having to always explicitly set VASP INCAR parameters. e.g., collections of common vasp_input_set_params for doing things like stricter convergence or switching the functional that aren't as painful as manually setting each VASP tag individually.

computron commented 6 years ago

Btw I am not so concerned about protecting people from doing the wrong thing. In any design, it's possible for someone to create nonsensical inputs.

I'm more concerned with making it clear on how to achieve certain things. e.g., I know how to run a bandstructure with atomate and now I want to switch to a vdW functional, or I want to switch out a POTCAR throughout the workflow - what is the best way to do this?

hackingmaterials / atomate

rethink some atomate organization #230