jacksund / simmate

The Simulated Materials Ecosystem (Simmate) is a toolbox and framework for computational materials research.
https://simmate.org
BSD 3-Clause "New" or "Revised" License
30 stars 8 forks

Improve symmetry reductions, transformations, and convergence #235

Open scott-materials opened 2 years ago

scott-materials commented 2 years ago

Describe the desired feature

I'd like the best structures in an evolutionary search to be symmetrized. For example, if a search produces a cell with 119.8 degree angles and virtually identical lattice vectors, we know it should be symmetrized and relaxed to see if the new cell is better. Presumably, a generic symmetry operation with spglib and a 0.1 Angstrom (?) displacement would do the job. Because we generally assume the symmetric structure is the ground state, this additional step might better zero in on the ground state. I would imagine that the relaxation and static calculation would be performed at a slightly higher quality than the highest quality relaxation (in terms of EDIFF or EDIFFG) in an effort to get a cell with the lowest energy possible.
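
As a rough sketch of the kind of "snap to symmetry" check described above (the helper, the tolerances, and the snap-by-averaging approach are all hypothetical; a real implementation would hand the cell to spglib's standardize_cell instead):

```python
# Hypothetical sketch: idealize a nearly-symmetric lattice before re-relaxing.
# A 119.8-degree angle gets snapped to 120, and nearly-equal lattice vector
# lengths get averaged together. Tolerances (0.1 A, 0.5 deg) are guesses.
SPECIAL_ANGLES = [60.0, 90.0, 120.0]

def snap_lattice(lengths, angles, length_tol=0.1, angle_tol=0.5):
    """Return (lengths, angles) idealized within the given tolerances."""
    snapped_lengths = list(lengths)
    # Average lattice vector lengths that are nearly equal.
    for i in range(3):
        for j in range(i + 1, 3):
            if abs(snapped_lengths[i] - snapped_lengths[j]) < length_tol:
                avg = (snapped_lengths[i] + snapped_lengths[j]) / 2
                snapped_lengths[i] = snapped_lengths[j] = avg
    # Snap angles that sit close to a high-symmetry value.
    snapped_angles = []
    for angle in angles:
        special = min(SPECIAL_ANGLES, key=lambda s: abs(s - angle))
        snapped_angles.append(special if abs(special - angle) < angle_tol else angle)
    return snapped_lengths, snapped_angles
```

For example, `snap_lattice([4.01, 3.99, 5.2], [90.0, 90.0, 119.8])` yields a hexagonal-looking cell with two equal 4.0 A vectors and a 120-degree angle, ready for a follow-up relaxation.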

Although this adds 'unnecessary' calculations, the effect would be to shorten the search by quickly transforming a good structure into the best structure. After all, the calculation is prolonged each time that a minimally improved structure appears (e.g., <1 meV better) even if the structure is essentially identical. My thought is that this calculation should be run on the 2? best structures in the search after 15 structures have been calculated.

An additional benefit -- the final output structures would be fully symmetrized, which would be nice for subsequent usage.

Current behavior and/or alternatives

I can symmetrize manually and do a subsequent geometry optimization, etc., and manually compare this structure to the best structure of the search. This should be automated.

Example Use

No response

Additional context

No response

jacksund commented 2 years ago

Thanks! I agree - it'd be nice to have more high-symmetry structures in the results. Right now, all transformations look to reduce the symmetry, and I've been thinking about a SymmetryReduction transformation. It would take structures and run them through spglib at increasing cutoff values until we get a new high-sym structure(s). This would be really easy to add.
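
A minimal sketch of that loop (control flow only; the spglib call is abstracted behind a caller-supplied `detect_spacegroup` function, and treating a larger spacegroup number as "higher symmetry" is a simplification):

```python
# Sketch of the proposed SymmetryReduction idea (not Simmate's actual API):
# walk through increasingly loose symmetry cutoffs until the detected
# spacegroup is higher than the input structure's.

def find_higher_symmetry(structure, current_spacegroup, detect_spacegroup,
                         symprecs=(0.01, 0.05, 0.1, 0.5, 1.0)):
    """Return (symprec, spacegroup) for the first cutoff that raises symmetry.

    `detect_spacegroup(structure, symprec)` stands in for a call like
    spglib.get_spacegroup; it should return an int spacegroup number.
    """
    for symprec in symprecs:
        spacegroup = detect_spacegroup(structure, symprec)
        if spacegroup > current_spacegroup:
            return symprec, spacegroup
    return None  # no higher-symmetry match within the cutoffs tried
```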

Because we generally assume the symmetric structure is the ground state, this additional step might better zero in on the ground state.

I might misunderstand what you're saying here, but I'm not sure this is true. I think lower symmetry structures will always be better (in terms of energy and search progress) -- they're just not as pretty of an output. They'll have equal or lower energy compared to high-symmetry equivalents because there are more degrees of freedom for the atoms/forces/lattice to relax. The extra degrees of freedom will also better enable the structure to 'jump' to another local minimum that is better, which will help the search. We'd have to do some advanced benchmarking to check this though.

I would imagine that the relaxation and static calculation would be performed at a slightly higher quality than the highest quality relaxation (in terms of EDIFF or EDIFFG) in an effort to get a cell with the lowest energy possible.

Allowing separate workflows for each creator/transformer would be difficult to implement. However, we could instead run a unique workflow after the search completes -- i.e. once the search ends, take the top N structures and submit them to a "clean-up" workflow that enforces high symmetry and high-quality relaxation/energy calcs. This sounds in line with your "additional benefit" comment.

the calculation is prolonged each time that a minimally improved structure appears

I think an improved StopCondition subclass is the solution to this. One that inspects the history of best structures and checks for "how much" structures are improving. We could have advanced logic for stopping the search through this. For example, if the structure fingerprints are within X distance and energies are within Y eV, then treat these as the same structure when determining the stop condition.
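
A minimal sketch of that duplicate test, with hypothetical names and cutoffs:

```python
# Hypothetical duplicate check for the improved StopCondition: two "best"
# structures count as the same if their fingerprints are within X distance
# and their energies within Y eV. The default cutoffs here are guesses.

def is_effectively_same(fingerprint_distance, energy_difference,
                        distance_cutoff=0.001, energy_cutoff=0.001):
    """True if two structures should be treated as one for the stop condition."""
    return (
        fingerprint_distance < distance_cutoff
        and abs(energy_difference) < energy_cutoff
    )
```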

jacksund commented 2 years ago

To-do items for this issue:

scott-materials commented 2 years ago

I might misunderstand what you're saying here, but I'm not sure this is true. I think lower symmetry structures will always be better (in terms of energy and search progress) -- they're just not as pretty of an output. They'll have equal or lower energy compared to high-symmetry equivalents because there are more degrees of freedom for the atoms/forces/lattice to relax. The extra degrees of freedom will also better enable the structure to 'jump' to another local minimum that is better, which will help the search. We'd have to do some advanced benchmarking to check this though.

I think we're talking about different situations. You're correct that allowing relaxation without symmetry constraints will always give the correct local minimum. But I'm talking about those times when the structure is relaxing towards a high symmetry structure from a low symmetry starting point. This happens quite often. For example, in my most recent test, I ended up with these best structures:

|   | id | energy_per_atom | updated_at |
|---|---|---|---|
| 0 | 2262 | -7.28851 | 2022-08-19 03:43:32 |
| 1 | 1379 | -7.28785 | 2022-08-18 22:37:56 |
| 2 | 1361 | -7.28777 | 2022-08-18 22:35:10 |
| 3 | 1219 | -7.28773 | 2022-08-18 22:05:11 |

Structures 0-3 are basically identical, within a tenth of an Angstrom or a few tenths of a degree. If run 1219 had been symmetrized and relaxed with higher convergence settings, it would have immediately given the correct ground state. Instead, it stopped at 'almost' the ground state, which then allowed three more randomly initiated structures to eventually beat 1219. Because run id 2262 beat 1219, we had to churn through an additional 1000 calculations before hitting the stop condition.

In my experience, this is pretty common: a typical run ends up with ~5 identical best structures. Obviously a waste of calculation effort.

I would imagine that the relaxation and static calculation would be performed at a slightly higher quality than the highest quality relaxation (in terms of EDIFF or EDIFFG) in an effort to get a cell with the lowest energy possible.

Allowing separate workflows for each creator/transformer would be difficult to implement. However, we could instead run a unique workflow after the search completes -- i.e. once the search ends, take the top N structures and submit them to a "clean-up" workflow that enforces high symmetry and high-quality relaxation/energy calcs. This sounds in line with your "additional benefit" comment.

Running it afterward would give us nice unit cells but wouldn't speed up the search. Maybe this could be implemented as part of the default workflow: relax_0, relax_1, relax_2, relax_3, static_1, E=lowest?, symmetrize, relax_4, static_2, save the lowest of static_1 or static_2.
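
The branching at the end of that pipeline can be sketched as follows, with each stage abstracted as a callable (these are stand-ins, not Simmate workflows):

```python
# Sketch of the "symmetrize, re-relax, keep the lower static energy" step.
# `relax`, `static`, and `symmetrize` are placeholders for real calculations;
# `static` is assumed to return an (energy, structure) pair.

def run_with_symmetrized_branch(structure, relax, static, symmetrize):
    """Relax + static, then try a symmetrized branch; keep the lower energy."""
    relaxed = relax(structure)
    energy_1, structure_1 = static(relaxed)

    # Symmetrized branch: idealize the cell, re-relax, and re-evaluate.
    energy_2, structure_2 = static(relax(symmetrize(relaxed)))

    # Save whichever static result is lower in energy.
    if energy_2 <= energy_1:
        return energy_2, structure_2
    return energy_1, structure_1
```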

Or it could be implemented as a separate worker task. E.g.:

workflow_name: structure-prediction.python.symmetrize

This would just be launched independently and simultaneously, monitoring the database for the best structures of the same composition that haven't yet gone through the additional symmetry/relax/static/check process.

the calculation is prolonged each time that a minimally improved structure appears

I think an improved StopCondition subclass is the solution to this. One that inspects the history of best structures and checks for "how much" structures are improving. We could have advanced logic for stopping the search through this. For example, if the structure fingerprints are within X distance and energies are within Y eV, then treat these as the same structure when determining the stop condition.

Yes, or this. The combination of energy change + fingerprint would do it.

scott-materials commented 2 years ago

I do like the idea of doing a relaxation and static after the symmetry transformation. It is possible that the symmetry operation makes it worse, in which case the transformation should be rejected and possibly rerun with stricter distance tolerances.

jacksund commented 2 years ago

relax_0, relax_1, relax_2, relax_3, static_1, E=lowest?, symmetrize, relax_4, static_2, save the lowest of static_1 or static_2

I can do this. This would just move the idea of a subworkflow_cleanup into the NewIndividual workflow. I'm guessing the idea of a workflow series (i.e. several workflows with validation between each) will become increasingly important as we optimize the search. On top of the "E=lowest?, symmetrize", I think the most useful step would be to check the validator again. By the time we've completed relax_3, the structure will closely resemble the final structure and our fingerprint/structure matcher will be much better at catching duplicates. In theory it'd be a simple loop like this:

from simmate.workflows.utilities import get_workflow

subflow_names = [
    "relaxation.vasp.staged",
    "relaxation.vasp.mit",
    "static-energy.vasp.mit",
]

for subflow_name in subflow_names:
    subflow = get_workflow(subflow_name)  # look up the workflow object by name
    result = subflow.run(...)
    check = validator.check(result)  # gives True if it passes the check
    if not check:
        break  # exit the for-loop and skip the remaining flows

In practice, this would complicate how we read individuals from the database. I won't know until I try implementing it.

jacksund commented 2 years ago

structure-prediction.python.symmetrize

A separate symmetrize workflow might not be necessary. All relaxations already support pre_sanitize_structure and pre_standardize_structure parameters -- and the latter symmetrizes the structure using a cutoff matching the SYMPREC setting. It's part of the base VaspWorkflow class that all workflows inherit from:

https://github.com/jacksund/simmate/blob/0631495601482981e5c8d8e17588b9cfe8a41fe4/src/simmate/calculators/vasp/workflows/base.py#L167-L184

I use this parameter a bunch, but wasn't sure if anyone in our lab was aware of it.

I can always add a pre_symmetrize_structure parameter too.

jacksund commented 2 years ago

There's also the chance we're overcomplicating this. By default, VASP uses a SYMPREC of 1e-5, and I keep this default throughout relaxation.vasp.staged. Perhaps increasing this value to something like 1e-2 for the final stages will help. Other evolutionary searches have gone down this rabbit hole too, so we need to be careful. Straight from the USPEX manual:

To get full space group, either increase symmetry tolerances (but this can be dangerous), or re-relax your structure with increased precision.

EDIT: here's the full quote

Caution: When looking at space groups in the file Individuals, keep in mind that USPEX often underdetermines space group symmetries, because of finite precision of structure relaxation and relatively tight space group determination tolerances. You should visualize the predicted structures. To get full space group, either increase symmetry tolerances (but this can be dangerous), or re-relax your structure with increased precision.

scott-materials commented 2 years ago

Interesting suggestions. Some thoughts:

(1) SYMPREC won't shift the unit cell origin (or unit cell shape), whereas spglib would. So the spglib transformation would still be necessary.

(2) I think the USPEX quote is only partially correct. I agree that you can either (a) increase symmetry tolerances or (b) re-relax with increased precision. But there's a third, better option: symmetrize, relax, run a static calculation, and compare energies. This removes the danger of (a) while avoiding the full computational cost of (b), since the symmetrized calculation is clearly much faster, even if performed at higher tolerances. I think it's this third option that I'm after.

(3) I like the idea that most structures that are registered in our database are properly symmetrized. If these are symmetrized, it allows far more efficient searching of the database... e.g., for extracting periodic trends. But no one should really care about those structures very far from the Hull. So perhaps instead of E=lowest? this becomes E_above_hull < 0.25 eV/atom to determine whether the later stages are engaged.
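
A minimal sketch of that gate (the names and the 0.25 eV/atom default follow the suggestion above; the hull energy is assumed to be known):

```python
# Hypothetical gate for the "clean-up" stages: only symmetrize/re-relax
# structures near the hull, per the suggested 0.25 eV/atom cutoff.

def needs_cleanup(energy_per_atom, hull_energy_per_atom, cutoff=0.25):
    """True if the structure is close enough to the hull to justify refinement."""
    e_above_hull = energy_per_atom - hull_energy_per_atom
    return e_above_hull < cutoff
```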

jacksund commented 2 years ago

Awesome, these are some good ideas to get me started. I imagine it will take a full day to organize our thoughts and test out different suggestions. I'll work on this tomorrow and post an update.

jacksund commented 2 years ago

I added a convergence_limit parameter for the stop conditions. It's used alongside the limit_best_survival parameter and absorbs minor energy changes between equivalent structures -- preventing them from prolonging a search. I set the default value to 1 meV/atom, which fixes all the cases I've tested so far.
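
My reading of how these two parameters interact, as a hedged sketch (the real Simmate implementation may differ):

```python
# Sketch only -- parameter names follow the comment above, behavior is my
# interpretation: an improvement smaller than convergence_limit does NOT
# reset the best-survival counter, so near-duplicate "bests" can't prolong
# the search.

def should_stop(energies, limit_best_survival=25, convergence_limit=0.001):
    """Stop once the best energy hasn't improved by more than
    convergence_limit (eV/atom) for limit_best_survival individuals."""
    best = float("inf")
    survival = 0
    for energy in energies:
        if energy < best - convergence_limit:
            # A meaningful improvement: reset the counter.
            best = energy
            survival = 0
        else:
            # Minor (or no) improvement: absorb it and keep counting.
            best = min(best, energy)
            survival += 1
        if survival >= limit_best_survival:
            return True
    return False
```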

As for the symmetry, I merged the pre_standardize_structure and pre_sanitize_structure functionality into a single standardize_structure parameter that accepts different modes. Then I added symmetry_tolerance and angle_tolerance parameters to let us tune how the standardization is done at each relaxation stage. So for the relaxation.vasp.staged workflow, we could benchmark what the best values are. Currently, I'm using 0.1A + 5deg tolerances past relaxation 3.

For "extreme" symmetry reduction, I'm leaving it to a new transformation that uses large tolerances -- as much as 1.5A + 25deg. I think the more atoms in a composition, the more important this transformation will be. In my quick testing, anything >0.1A + >5deg is prone to significantly changing the structure -- and therefore I'd like to keep these as transformations.

I also increased the tolerances used when registering a structure to the database (from 0.01A to 0.1A). Even if there is slight disorder in the structure, the database column for spacegroup will be more lenient and help analysis.

But there's a third, better option. Symmetrize, relax, static, and compare energies. ...... But no one should really care about those structures very far from the Hull. So perhaps instead of E=lowest? this becomes E_above_hull < 0.25 eV/atom to determine whether the later stages are engaged.

This is still out of reach. I still plan to do this with a list of subworkflows like I described before. That will take more time.

scott-materials commented 2 years ago

Very cool!

I've added some comments in the pull request. Recapping here: after relaxation, a tolerance of 5 degrees is quite large. From my data, it should be at most 1 degree, and 0.5 degrees is probably better.

In my testing, I used spglib directly, e.g.,

from spglib import get_spacegroup, standardize_cell

standardize_cell(cell, to_primitive=True, no_idealize=False, symprec=2e-3, angle_tolerance=0.01)

As input, I used Simmate output for a 15-atom cell. My very worst cell could still transform to the correct symmetry with the above settings, while a lower symprec or angle_tolerance prevented the transformation. Although my findings are limited, I'd say a symprec of 0.01 Angstroms and an angle_tolerance of 0.5 degrees should be preferred.

jacksund commented 2 years ago

Huh... Your arrival at 0.01A is actually surprising for several reasons. Before I pull that thread, I do want to say that the default is in fact 0.01Ang: https://github.com/jacksund/simmate/blob/3b54ea9a4264a955edbfb66cd6e968bb0becdb26/src/simmate/calculators/vasp/workflows/base.py#L147

But for why 0.01A is surprising:

  1. 0.01Ang + 5.0deg is the default for v0.9.0. That means this issue was opened under the premise that these tolerances were too loose, yet your data suggests that even stricter tolerances are optimal. It therefore looks like you debunked the original issue. Just to show that this was indeed the default before:

  2. pymatgen actually recommends 0.1A for structures relaxed with DFT (shown here)

For structures with slight deviations from their proper atomic positions (e.g., structures relaxed with electronic structure codes), a looser tolerance of 0.1 (the value used in Materials Project) is often needed.

  3. My results agree with yours -- but only if two conditions are met:
    • we look at the final structure after all relaxations
    • we only look at structures from RandomSymStructure (no transformations)

I'm looking at a slightly different case, which is probably why we are arriving at different conclusions

My results with 0.01A show that the spacegroup isn't being determined properly (spacegroups are all 1 or 2):

| rank | id | energy_per_atom | updated_at | source | spacegroup | parent_ids |
|---|---|---|---|---|---|---|
| 0 | 493 | -4.82264 | 2022-08-21 20:12:31 | from_ase.CoordinatePerturbation | 1 | 334 |
| 1 | 193 | -4.82252 | 2022-08-21 19:29:23 | from_ase.MirrorMutation | 1 | 85 |
| 2 | 205 | -4.82244 | 2022-08-21 19:31:17 | from_ase.LatticeStrain | 2 | 85 |
| 3 | 432 | -4.82242 | 2022-08-21 19:59:04 | from_ase.MirrorMutation | 1 | 205 |
| 4 | 484 | -4.82238 | 2022-08-21 20:11:37 | from_ase.CoordinatePerturbation | 1 | 205 |
| 5 | 367 | -4.82236 | 2022-08-21 19:49:58 | from_ase.Heredity | 1 | [193, 205] |
| 6 | 334 | -4.82222 | 2022-08-21 19:45:57 | from_ase.Heredity | 1 | [85, 217] |
| 7 | 355 | -4.82198 | 2022-08-21 19:48:46 | from_ase.Heredity | 1 | [331, 196] |
But 0.1A works much better with transformed structures (gives higher spacegroups):

| rank | id | energy_per_atom | updated_at | source | spacegroup | parent_ids |
|---|---|---|---|---|---|---|
| 0 | 207 | -4.82265 | 2022-08-21 23:34:03 | from_ase.LatticeStrain | 12 | 171 |
| 1 | 127 | -4.82185 | 2022-08-21 23:25:30 | from_ase.Heredity | 12 | [37, 102] |
| 2 | 171 | -4.82045 | 2022-08-21 23:28:14 | from_ase.SoftMutation | 12 | 144 |
| 3 | 144 | -4.82014 | 2022-08-21 23:26:44 | from_ase.Heredity | 164 | [121, 102] |
| 4 | 3 | -4.82005 | 2022-08-21 23:13:56 | RandomSymStructure | 164 | |
| 5 | 192 | -4.81977 | 2022-08-21 23:31:32 | from_ase.MirrorMutation | 164 | 121 |
| 6 | 121 | -4.81971 | 2022-08-21 23:24:26 | from_ase.Heredity | 164 | [63, 3] |
| 7 | 102 | -4.8197 | 2022-08-21 23:20:54 | from_ase.Heredity | 164 | [3, 54] |
| 8 | 163 | -4.81969 | 2022-08-21 23:27:55 | from_ase.Heredity | 12 | [121, 121] |
jacksund commented 2 years ago

I started a separate issue for the topic of subflows. That way we can limit this issue to just symmetry and the tolerances used.

I think everything is done for the symmetry issue, but I'm leaving this open through the next release -- so that we can test things more and be sure of our defaults.