Duplicate defaults in atomate

shyamd commented 5 years ago

Currently atomate has a lot of default values for parameters in the input sets that are already defined in pymatgen. It would be good to remove those from atomate so that default for any calculation type fully defined in pymatgen reside there. This makes understanding where settings are coming from a bit more transparent.

computron commented 5 years ago

This by design - putting the settings in atomate leaves any changes to the default settings in atomate's control, not pymatgen's control.

e.g., if a user updates pymatgen (e.g., to get some new structure analysis feature, but which comes bundled with a change in pymatgen default settings) but leaves their atomate version alone, should their workflows change? It seems to me a user should need to explicitly update atomate in order to see new default settings.

shyamd commented 5 years ago

I think that sounds reasonable if we kept up in updating atomate. The issue is that we don't update the workflows that often since a lot of them are written by postdocs who come in for a short period of time, write a workflow, and then leave. We don't want to end up with a situation where theres tons of stale technical debt simply because defaults in atomate lag behind pymatgen which a lot more people use even for running VASP.

mkhorton commented 5 years ago

Seconding that this is causing confusion for new atomate users. Ideally there should only be one source of truth. It also seems like an easy way for a bug to creep in if the pymatgen input sets are updated to fix an issue but atomate then overrides a particular setting.

For example, it's not clear to me why atomate has changed the default reciprocal density for MPNonSCFSet in NonSCFFW from 100 to 1000, if the value in pymatgen is too low, why not change it there?

computron commented 5 years ago

If you guys are able to:

update pymatgen with all atomate defaults (and have it merged in officially) OR
update / retain pymatgen with better defaults than what's in atomate today (with solid backing that it's better)

then I'm happy to move forward with putting everything in pymatgen.

Note that our DOS is already pretty bad as per many discussions. I'm more concerned about atomate still being too low than I am about pymatgen being closer to the a good value.

shyuep commented 5 years ago

I would say that where atomate overlaps pymatgen, e.g., specifying parameters for MP-style calculations, these should be in pymatgen and atomate should avoid duplication. For other cases, there is no absolute truth, since it all depends what user requirements. I am fine if atomate has a different workflow called CapAmsRecipeForUltraAccurateDOS that is very different from MP defaults. But users who just want to perform a single calculation for comparison with MP should be able to do so without having to resort to atomate workflows.

bocklund commented 5 years ago

I agree with @shyuep. Further, I think that atomate calculations should be entirely driven by PMG input sets (or any object that implements the API). To me, it seems rare that a user would actually want to pass different reciprocal densities to something like WriteVaspNSCFFromPrev and be in a position to change that in a reasonable way. Instead, they probably want to set one and forget it (maybe it's the same as the MP set default, maybe not). Any customization outside of a suite of PMG input sets (such as the MP suite) should be achieved by subclassing PMG input sets and telling the Firetasks/Fireworks to use those.

We have been using completely different input sets more tuned for metals and we basically had to rewrite several Firetasks, e.g. WriteVaspStaticFromPrev that assume MPStaticSet and propagate those changes up the FT/FW/WF structure.

shyamd commented 5 years ago

@bocklund I see your point about being able to use input sets and having your own custom set. Basically if you want to use a non-MP input set it shouldn't be that hard to change it. That being said, i don't agree that people should have to make a new input set to make a change. The input sets should determine the defaults and the user should be able to then make other changes on top of that.

bocklund commented 5 years ago

@shyuep Yes, it should also be easy to start from the MPRelaxSet, make a minor modification and give that to atomate without creating an entirely new class. In my experience, it's not easy to make modifications to the PMG input sets without creating a new class, but that's probably more a PMG issue rather than atomate.

mkhorton commented 5 years ago

As an aside, the current way custom VASP input sets are handled seems dangerous also, e.g. if you have a FW(struct, custom_vasp_input_set=None) and then you supply your custom input set, it'll override whatever struct you're also supplying to that FW without raising any error if the structures don't match? On the input sets themselves, I also realized recently there's no way to delete an INCAR argument, e.g. if you want to remove the default ENCUT, without creating a new input set (indeed, there is an input set that has explicitly added this option as kwarg as a workaround). We definitely need to improve the system for modifying the input sets that make up an atomate workflow, without using hacks.

Broadly speaking, I agree with all above + of course we are (collectively) MP, so it doesn't make sense that we've ended up in this situation where vanilla MPSet calculations have different defaults between atomate and pymatgen. I'm reminded of Conway's Law here :)

computron commented 5 years ago

Ok I think this is a good discussion but I am not sure we are going to solve this problem over this thread, especially given that there are multiple concerns in the air now.

How about I suggest the following:

everyone at LBNL interested in this problem meets in person for a 2-3 hour meeting, with the goal of coming up with an action plan of all changes needed to the various codes. This should be a fairly detailed set of proposals that may affect both pymatgen and atomate. The written plan should include some sketch of who will do what, in addition to what will be done.
We send out the written plan and hold a videoconference call (e.g., 30 mins) with ALL parties (e.g., Brandon + Shyue) to discuss.
If all are in agreement by the end of the call, we make specific plans for implementation (e.g., maybe carving out a couple of specific days to do it like a focused Codebusters).
Otherwise, back to the drawing board...

shyuep commented 5 years ago

I will just have one final comment - I am confused what @bocklund means by saying that it is not easy to make a change to PMG input sets. The user_incar_settings, user_kpoints_settings, etc. all exist for that purpose.

@mkhorton The reason why it is dangerous is because the input set is supplied as an object, not a class/type or interpretable string. That is why the input set overrides structure in the atomate workflows. As for deleting parameters, we can easily add a feature that None => unsetting a parameter. E.g., a user_incar_setting = {"ENCUT": None} would unset that particular setting for an input set. As far as I know, this would be a minor change in the DictSet and should affect no other feature since None is a special keyword that is not used in any of the settings.

I will of course not be able to attend the meeting. Deputy Dictator @mkhorton can speak on pymatgen's behalf.

computron commented 5 years ago

I think we can continue the discussion here until the meeting actually happens

I like the None suggestion
If I remember correctly, one of the issues in "let's do everything with the MP input sets" is that many parameters (usually the structure) is not known in advance at the time of workflow creation. e.g., let's say the user has a workflow that is optimization -> static, and they want to update the parameters of the static workflow. If they want to throw in a custom input set for the "static" step, they need to throw in a dummy structure, e.g., the initial structure (??), since the actual structure is not known at the time of workflow creation before the optimization happens. This creates a few problems:

It is not super clear to the user what dummy structure to throw in - although the initial structure for the workflow might be a reasonable guess
It's not clear that throwing in a dummy structure will result in the correct behavior without a detailed code review. For example, let's say I want to simply use MPStaticSet but with some custom user_incar_parameters for the second step of my workflow. So, when creating the workflow, I throw in a custom MPStaticSet with my own user_incar_parameters and a dummy initial structure (since a structure is required). But, let's say the MPStaticSet figured out a k-point density in the init() based on the volume of the structure put in. Then, the k-point density (or perhaps other parameters) is being set at the time MPStaticSet is constructed based on the dummy structure being put in. If I throw in a dummy structure in the beginning but the structure changes as a result of the optimization, it's not clear that my custom MPStaticSet will reflect the current structure - unless I fully reinitialize the custom MPStaticSet with the new structure and know that user_incar_parameters (and perhaps 5 other parameters of the input set) were also overriden. It's not super clear how to do this.

@shyamd @mkhorton what is your solution?

bocklund commented 5 years ago

I have addressed some of these challenges in dfttk where we have rewritten parts of atomate that don't play well with custom input sets, such as the wrong structure problem for instances of InputSets that @mkhorton pointed out (though that "solution" is more of a hack).

IMO, a core part of this issue comes from how the WriteVasp* tasks are implemented and I think changing those addresses most of it and the rest is just propagating the changes up the chain. If we're considering rewriting those, I'd also ask that https://github.com/hackingmaterials/atomate/issues/238 be considered if everyone at the in person meeting decides it's in scope for fixing this issue.

I'm happy to participate on any calls and offer my perspective. In my ideal world, this can reduce the amount of reimplemented code in dfttk, but I understand the difference in priorities.

@shyuep TLDR: user_XYZ_settings works, but it's a pain if you want to make the same modifications every time.

You can imagine that ultimately a user will develop custom settings are not one-offs that they want to apply to every run of atomate (e.g. I always want ENCUT to be 400 eV). Then you either have to apply user_XYZ_settings all over the place or automate it in some way. Automating it, you end up either

writing a wrapper function to apply the settings of interest to the input set. If different input sets have different custom options, you end up having a 1:1 map of wrapper function to PMG input set. At that point, you might as well do option 2.
subclass DictSet. IIRC you can't actually subclass MPRelax set (I think because of the YAML loading?) so you have to make a whole new DictSet to make even a simple modification.

computron commented 5 years ago

To give a real example of how this could be dangerous - the init() of the DictSet might sort and reduce the structure. If the structure is updated from the time an input set is created, there is no quick and easy way to make sure these sortings / reductions are properly updated without re-initializing the input set again. I can't just take the input set as given by the user and use it

shyamd commented 5 years ago

So it sounds like some of the issues may be when the InputSet is being constructed. Currently, it happens twice in a lot of Fireworks, once on the construction of the Firework, and once on running the Firetask. Deciding when that should happen should make this a bit clearer. This could also depend on whats given resulting in a different firetask.

shyuep commented 5 years ago

@bocklund This is a fundamental philosophical difference. The raison d'etre for having an input set is to standardize. By definition, that means modifications should be rare and exceptional. So I don't see a problem with a user writing a simple:

def get_custom_input_set(structure, *args, **kwargs):
    return MPRelaxSet(structure, , *args, user_incar_settings={"ENCUT": 400}, **kwargs)

To claim this is as complex as subclassing DictSet is inaccurate.

As for sorting/reducing structures, the input set is by definition a manipulation of the structure to generate a set of inputs. Most of the time, the structure processing is done in accordance to some predefined rules, e.g., conventional/primitive standardization. By default, the only thing that DictSet does is to sort the structure, which is what you'd want in most cases to avoid having atoms of different species spread all around the input, with POSCAR species like Fe P O Fe O.

shyuep commented 5 years ago

The None to unset incar setting has been implemented. https://github.com/materialsproject/pymatgen/commit/111f44ffc450e2db8c30a923dfb664c309c76194

bocklund commented 5 years ago

I have continued to give some thought on what it means (to me) to standardize. Keep in mind I am coming at this from outside the core MP group -- I have finite computer time, different projects with different accuracy needs, and I need to automate no more than a few hundred to a few thousand calculations for each. I see atomate as a way to communicate with the community and develop consensus on what the standard is for a type of calculation (e.g. a phonon calculation) in a code-agnostic fashion. To this extent, we should focus on implementing methods and let users determine the accuracy.

We can develop a guiding principle for calculation Fireworks that can be used to check a PR or proposed change against this principle. I propose the following: "Calculation Fireworks should only enforce minimally required calculation settings for the calculation type".

In more words: Each Firework has some core task where a calculation setting is critical to that task. For example, a static calculation should always have NSW=0. We should provide defaults for settings that control calculation accuracy, but we should never forcibly override them.

What are some concrete examples?

A static calculation can forcibly set NSW=0 and ISMEAR=-5
A relaxation should not override a user-provided ISMEAR or EDIFFG, but can provide a reasonable default of ISMEAR=0 and EDIFFG=-0.05.
NSCF calculations can force ICHARG=11, but should not override user-supplied kpoints (#278).

I think we can also use this as a guiding principle for designing a calculation superclass, where one of the main common features of the superclass is to provide an inheritable set of arguments to control calculation accuracy.

shyamd commented 5 years ago

@bocklund Isn't that what the InputSets are for? If we get the defaults out of Atomate, then as long you define an inputSet you get your settings propagated through.

bocklund commented 5 years ago

Right now there's not a 1:1 map between input sets and calculations (see DFPTFW, workflows that use get_wf_deformations). To me, a 1:1 map would be appropriate and fall under the design principle, but some of the stuff is hidden through custom modifications in atomate of MPStatic set and so on.

shyuep commented 5 years ago

@bocklund I would modify a few things with regards to principles:

All atomate Fireworks have a 1:1 correspondence to a PMG input set, unless it is for a type of run that is not of interest to the general public. This means that Atomate should refrain from internal modifications to PMG parameters, unless it is explicit. E.g., a OptimizeFirework should not modify PMG parameters, but a OptimizeVeryAccurateForcesFirework can.
There should be no forcible anything. User wishes are paramount. The PMG Input Sets set reasonable defaults, but if a user wishes to use the MPStaticSet and set NSW to be non-zero, that is up to him. Maybe he has a good reason to. Certainly, we can implement warnings but the user is the ultimate arbiter.
User overrides should propogate to Atomate Fireworks. E.g., things like user_incar_settings, user_kpoints_settings, user_potcar_settings should be supported and passed through to the relevant PMG input set.

shyuep commented 5 years ago

@bocklund I have a quick question. It is clear that ISMEAR = -5 is good for accurate energies in metals (no relaxation). What about forces (no relaxation)? Is ISMEAR=1 still preferred or is ISMEAR=-5 fine?

bocklund commented 5 years ago

@shyuep In my experience, ISMEAR=-5 is not good enough for accurate forces.

IMO, if you don't know beforehand whether you have a metal or non-metal, you can't do one calculation for both accurate energy and accurate forces. You need to pick one or the other:

I need total energy - I use ISMEAR=-5 in all cases (given enough kpoints)
I need forces - I use ISMEAR=0 if I have no other information.

If I know a priori whether I have a metal or a not, I can specialize the forces to either use ISMEAR=-5 or ISMEAR=1 for non-metals/metals, respectively.

shyuep commented 5 years ago

@bocklund What about non-metals that are metals in DFT? i.e., should have small gap but ends up having zero gap in DFT.

bocklund commented 5 years ago

If DFT predicts occupied states at the Fermi energy, then the settings should reflect the behavior of that calculated electronic structure, even if we know that it's not metallic in reality

shyuep commented 5 years ago

@bocklund Ok, thanks. I implemented a warning for obvious situations. https://github.com/materialsproject/pymatgen/commit/8c882110b66649dec029f674feb93a332b0baa86

hackingmaterials / atomate

Duplicate defaults in atomate #268