scott-materials opened this issue 2 years ago

**Describe the desired feature**

I'd like the best structures in an evolutionary search to be symmetrized. For example, if a search produces a cell with 119.8-degree angles and virtually identical lattice vectors, we know it should be symmetrized and relaxed to see if the new cell is better. Presumably, a generic symmetry operation with spglib and a 0.1 Angstrom (?) displacement would do the job. Because we generally assume the symmetric structure is the ground state, this additional step might better zero in on the ground state. I would imagine that the relaxation and static calculation would be performed at a slightly higher quality than the highest-quality relaxation (in terms of EDIFF or EDIFFG) in an effort to get a cell with the lowest energy possible.

Although this adds 'unnecessary' calculations, the effect would be to shorten the search by quickly transforming a good structure into the best structure. After all, the calculation is prolonged each time that a minimally improved structure appears (e.g., <1 meV better) even if the structure is essentially identical. My thought is that this calculation should be run on the 2? best structures in the search after 15 structures have been calculated.

An additional benefit -- the final output structures would be fully symmetrized, which would be nice for subsequent usage.

**Current behavior and/or alternatives**

I can symmetrize manually, do a subsequent geometry optimization, etc., and manually compare this structure to the best structure of the search. This should be automated.

**Example Use**

No response

**Additional context**

No response
Thanks! I agree -- it'd be nice to have more high-symmetry structures in the results. Right now, all transformations tend to reduce the symmetry, and I've been thinking about a `SymmetryReduction` transformation. It would take structures and run them through spglib at increasing cutoff values until we get a new high-symmetry structure. This would be really easy to add.
> Because we generally assume the symmetric structure is the ground state, this additional step might better zero in on the ground state.
I might misunderstand what you're saying here, but I'm not sure this is true. I think lower-symmetry structures will always be better (in terms of energy and search progress) -- they're just not as pretty of an output. They'll have equal or lower energy compared to high-symmetry equivalents because there are more degrees of freedom for the atoms/forces/lattice to relax. The extra degrees of freedom will also better enable the structure to 'jump' to another local minimum that is better, which will help the search. We'd have to do some advanced benchmarking to check this though.
> I would imagine that the relaxation and static calculation would be performed at a slightly higher quality than the highest-quality relaxation (in terms of EDIFF or EDIFFG) in an effort to get a cell with the lowest energy possible.
Allowing separate workflows for each creator/transformer would be difficult to implement. However, we could instead run a unique workflow after the search completes -- i.e. once the search ends, take the top N structures and submit them to a "clean-up" workflow that enforces high symmetry and high-quality relaxation/energy calcs. This sounds in line with your "additional benefit" comment.
> the calculation is prolonged each time that a minimally improved structure appears
I think an improved `StopCondition` subclass is the solution to this -- one that inspects the history of best structures and checks how *much* structures are improving. We could have advanced logic for stopping the search through this. For example, if the structure fingerprints are within X distance and energies are within Y eV, then treat these as the same structure when determining the stop condition.
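To make that concrete, here's a minimal sketch of such a comparison. The `fingerprint_fn` featurizer and the threshold names are hypothetical placeholders, not Simmate's actual `StopCondition` API:

```python
import numpy as np

def effectively_same(structure_a, structure_b, energy_a, energy_b,
                     fingerprint_fn, max_distance=0.1, max_energy_diff=0.001):
    """Treat two 'best' structures as duplicates when their fingerprints are
    within max_distance and their energies within max_energy_diff (eV/atom).
    fingerprint_fn stands in for whatever featurizer the search already uses."""
    fp_a = np.asarray(fingerprint_fn(structure_a))
    fp_b = np.asarray(fingerprint_fn(structure_b))
    distance = float(np.linalg.norm(fp_a - fp_b))
    return distance < max_distance and abs(energy_a - energy_b) < max_energy_diff
```

A stop condition built on this would simply skip resetting its "best structure survival" counter whenever the new best is effectively the same as the old one.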
To-do items for this issue:

- [ ] `pre_symmetrize_structure` kwarg to workflows
- [ ] `SymmetryReduction` transformation (and make a `steadystate_source`)
- [ ] `StopCondition` that accounts for insignificant/minor improvements in `best_structure`
> I might misunderstand what you're saying here, but I'm not sure this is true. I think lower-symmetry structures will always be better (in terms of energy and search progress) -- they're just not as pretty of an output. They'll have equal or lower energy compared to high-symmetry equivalents because there are more degrees of freedom for the atoms/forces/lattice to relax. The extra degrees of freedom will also better enable the structure to 'jump' to another local minimum that is better, which will help the search. We'd have to do some advanced benchmarking to check this though.
I think we're talking about different situations. You're correct that allowing relaxation without symmetry constraints will always give the correct local minimum. But I'm talking about those times when the structure is relaxing towards a high symmetry structure from a low symmetry starting point. This happens quite often. For example, in my most recent test, I ended up with these best structures:
| rank | id | energy_per_atom | updated_at |
|---|---|---|---|
| 0 | 2262 | -7.28851 | 2022-08-19 03:43:32 |
| 1 | 1379 | -7.28785 | 2022-08-18 22:37:56 |
| 2 | 1361 | -7.28777 | 2022-08-18 22:35:10 |
| 3 | 1219 | -7.28773 | 2022-08-18 22:05:11 |
Structures 0-3 are basically identical, to within a tenth of an Angstrom and a few tenths of a degree. If run id 1219 had been symmetrized and relaxed with higher convergence settings, it would have immediately given the correct ground state. Instead, it stopped at 'almost' the ground state, which then allowed three more randomly initiated structures to eventually beat 1219. Because run id 2262 beat 1219, we had to churn through an additional 1000 calculations before hitting the stop condition.
In my experience, this is a pretty common occurrence: a typical run ends up with ~5 essentially identical best structures. Obviously a waste of calculation effort.
> I would imagine that the relaxation and static calculation would be performed at a slightly higher quality than the highest-quality relaxation (in terms of EDIFF or EDIFFG) in an effort to get a cell with the lowest energy possible.
> Allowing separate workflows for each creator/transformer would be difficult to implement. However, we could instead run a unique workflow after the search completes -- i.e. once the search ends, take the top N structures and submit them to a "clean-up" workflow that enforces high symmetry and high-quality relaxation/energy calcs. This sounds in line with your "additional benefit" comment.
Running it afterwards would give us nice unit cells but wouldn't speed up the search. Maybe this could be implemented as part of the default workflow: relax_0, relax_1, relax_2, relax_3, static_1, E=lowest?, symmetrize, relax_4, static_2, save the lowest of static_1 or static_2.
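In pseudocode, that proposal might look something like the sketch below. The `run_relax`, `run_static`, `symmetrize`, and `is_best_so_far` callables are placeholders for the actual subworkflows, not real Simmate APIs:

```python
def staged_with_symmetry(structure, run_relax, run_static, symmetrize, is_best_so_far):
    """Sketch: staged relaxations + static, then -- only if this looks like the
    search's best candidate -- symmetrize, re-relax, and keep the lower energy."""
    for stage in range(4):                     # relax_0 ... relax_3
        structure = run_relax(structure, stage)
    struct_1, e_1 = run_static(structure)      # static_1
    if not is_best_so_far(e_1):                # the "E=lowest?" gate
        return struct_1, e_1
    sym = symmetrize(struct_1)                 # e.g., spglib standardize_cell
    sym = run_relax(sym, 4)                    # relax_4
    struct_2, e_2 = run_static(sym)            # static_2
    # save the lowest of static_1 or static_2
    return (struct_1, e_1) if e_1 <= e_2 else (struct_2, e_2)
```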
Or it could be implemented as a separate worker task. E.g.:

```yaml
workflow_name: structure-prediction.python.symmetrize
```
This would just be launched independently / simultaneously and monitor the database for the best structures of the same composition that haven't gone through the additional symmetry/relax/static/check process.
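A minimal sketch of that monitor loop, with `fetch_unsymmetrized_best` and `submit_cleanup` as hypothetical stand-ins for the database query and workflow submission:

```python
import time

def symmetrize_monitor(fetch_unsymmetrized_best, submit_cleanup, poll_seconds=300):
    """Poll the results table for top-ranked structures of a composition that
    haven't been through the symmetrize/relax/static check, and submit them."""
    while True:
        for structure_id in fetch_unsymmetrized_best():
            submit_cleanup(structure_id)
        time.sleep(poll_seconds)
```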
> the calculation is prolonged each time that a minimally improved structure appears
> I think an improved `StopCondition` subclass is the solution to this -- one that inspects the history of best structures and checks how *much* structures are improving. We could have advanced logic for stopping the search through this. For example, if the structure fingerprints are within X distance and energies are within Y eV, then treat these as the same structure when determining the stop condition.
Yes, or this. The combination of energy change + fingerprint would do it.
I do like the idea of doing a relaxation and static after the symmetry transformation. It is possible that the symmetry operation makes it worse, in which case the transformation should be rejected and possibly rerun with stricter distance tolerances.
> relax_0, relax_1, relax_2, relax_3, static_1, E=lowest?, symmetrize, relax_4, static_2, save the lowest of static_1 or static_2
I can do this. This would just move the idea of a `subworkflow_cleanup` into the `NewIndividual` workflow. I'm guessing the idea of a workflow series (i.e., several workflows with validation between each) will become increasingly important as we optimize the search. On top of the "E=lowest?, symmetrize" step, I think the most useful addition would be to check the validator again. By the time we've completed relax_3, the structure will closely resemble the final structure, and our fingerprint/structure matcher will be much better at catching duplicates. In theory, it'd be a simple loop like this:
```python
subflows = [
    "relaxation.vasp.staged",
    "relaxation.vasp.mit",
    "static-energy.vasp.mit",
]
for subflow in subflows:
    result = subflow.run(...)
    check = validator.check(result)  # gives True if it passes the check
    if not check:
        break  # exit the for-loop and skip the remaining flows
```
In practice, this would complicate how we read individuals from the database. I won't know until I try implementing it.
> `structure-prediction.python.symmetrize`
A separate symmetrize workflow might not be necessary. All relaxations already support `pre_sanitize_structure` and `pre_standardize_structure` parameters -- and the latter symmetrizes the structure using a cutoff matching the `SYMPREC` setting. It's part of the base `VaspWorkflow` class that all workflows inherit from.
I use this parameter a bunch, but wasn't sure if anyone in our lab was aware of it.
I can always add a `pre_symmetrize_structure` parameter too.
There's also the chance we're overcomplicating this. By default, VASP uses a `SYMPREC` of `1e-5`, and I keep this default throughout `relaxation.vasp.staged`. Perhaps increasing this value to something like `1e-2` for the final stages would help. Other evolutionary searches have gone down this rabbit hole too, so we need to be careful. Straight from the USPEX manual, they say the following:
> To get full space group, either increase symmetry tolerances (but this can be dangerous), or re-relax your structure with increased precision.
EDIT: here's the full quote
> Caution: When looking at space groups in the file Individuals, keep in mind that USPEX often underdetermines space group symmetries, because of finite precision of structure relaxation and relatively tight space group determination tolerances. You should visualize the predicted structures. To get full space group, either increase symmetry tolerances (but this can be dangerous), or re-relax your structure with increased precision.
Interesting suggestions. Some thoughts:
(1) `SYMPREC` won't shift the unit cell origin (or unit cell shape), whereas spglib would. So the spglib transformation would still be necessary.
(2) I think the USPEX quote is only partially correct. I agree that you can (a) increase symmetry tolerances or (b) re-relax with increased precision. But there's a third, better option: symmetrize, relax, static, and compare energies. This removes the danger of (a) while avoiding the full computational cost of (b), since the symmetrized calculation is clearly much faster, even if performed at tighter tolerances. I think it's this third option that I'm after (sketched below, after these points).
(3) I like the idea that most structures registered in our database are properly symmetrized. If they are, it allows far more efficient searching of the database -- e.g., for extracting periodic trends. But no one should really care about structures very far from the hull. So perhaps instead of `E=lowest?` this becomes `E_above_hull < 0.25 eV/atom` to determine whether the later stages are engaged.
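To illustrate the third option from point (2), here's a hedged sketch using pymatgen's `SpacegroupAnalyzer` for the symmetrize step. The `relax_and_static` callable is a placeholder for the actual relax + static workflows, not a real API:

```python
from pymatgen.symmetry.analyzer import SpacegroupAnalyzer

def symmetrize_and_compare(structure, relax_and_static, symprec=0.1, angle_tolerance=5.0):
    """Symmetrize, re-relax, re-run a static calc, and keep whichever result
    has the lower energy -- rejecting the symmetrized branch if it's worse."""
    original, e_original = relax_and_static(structure)
    analyzer = SpacegroupAnalyzer(original, symprec=symprec, angle_tolerance=angle_tolerance)
    symmetrized = analyzer.get_refined_structure()
    relaxed_sym, e_sym = relax_and_static(symmetrized)
    # compare energies: only accept the symmetrized version if it actually improves
    return (relaxed_sym, e_sym) if e_sym < e_original else (original, e_original)
```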
Awesome, these are some good ideas to get me started. I imagine it will take a full day to organize our thoughts and test out different suggestions. I'll work on this tomorrow and post an update.
I added a `convergence_limit` parameter for the stop conditions. It's used alongside the `limit_best_survival` parameter. It will absorb minor changes in energy between equivalent structures -- and prevent prolonging a search. I set the default value to 1 meV/atom, which fixes all the cases I've tested so far.
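For illustration, the logic might look like this sketch. The names mirror the parameters above, but the default counts and the actual implementation in Simmate may differ:

```python
def should_stop(best_energy_history, limit_best_survival=100, convergence_limit=0.001):
    """Stop once the best energy hasn't improved by more than convergence_limit
    (eV/atom, i.e. 1 meV/atom by default) for limit_best_survival new individuals."""
    survival = 0
    best = best_energy_history[0]
    for energy in best_energy_history[1:]:
        if energy < best - convergence_limit:  # a *real* improvement resets the count
            best = energy
            survival = 0
        else:
            survival += 1                      # sub-threshold improvements are absorbed
    return survival >= limit_best_survival
```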
As for the symmetry, I merged the `pre_standardize_structure` and `pre_sanitize_structure` functionality into a single `standardize_structure` parameter that accepts different modes. Then I added `symmetry_tolerance` and `angle_tolerance` parameters to let us tune how the standardization is done at each relaxation stage. So for the `relaxation.vasp.staged` workflow, we could benchmark what the best values are. Currently, I'm using 0.1A + 5deg tolerances past relaxation 3.
For "extreme" symmetry reduction, I'm leaving it to a new transformation that uses large tolerances -- as much as 1.5A + 25deg. I think the more atoms in a composition, the more important this transformation will be. In my quick testing, anything >0.1A + >5deg is prone to significantly changing the structure -- and therefore I'd like to keep these as transformations.
I also increased the tolerances used when registering a structure to the database (from 0.01A to 0.1A). Even if there is slight disorder in the structure, the spacegroup column in the database will be more lenient, which helps analysis.
> But there's a third, better option: symmetrize, relax, static, and compare energies. [...] But no one should really care about structures very far from the hull. So perhaps instead of `E=lowest?` this becomes `E_above_hull < 0.25 eV/atom` to determine whether the later stages are engaged.
This is still out of reach. I still plan to do this with a list of subworkflows like I described before. That will take more time.
Very cool!
I've added some comments in the pull request. Recapping here: after relaxation, a tolerance of 5 degrees is quite large. From my data, it should be at most 1 degree, and probably closer to 0.5 degrees.
In my testing, I used spglib directly, e.g.,

```python
from spglib import get_spacegroup, standardize_cell

standardize_cell(cell, to_primitive=True, no_idealize=False, symprec=2e-3, angle_tolerance=0.01)
```
As input, I used Simmate output on a 15-atom cell. My very worst cell could still transform to the correct symmetry with the above settings, while a lower `symprec` or `angle_tolerance` prevented the transformation. Although my findings are limited, I'd say that a `symprec` of 0.01 Angstroms and an `angle_tolerance` of 0.5 degrees should be preferred.
Huh... Your arrival at 0.01A is actually surprising for several reasons. Before I pull that thread, I do want to say that the default is in fact 0.01Ang: https://github.com/jacksund/simmate/blob/3b54ea9a4264a955edbfb66cd6e968bb0becdb26/src/simmate/calculators/vasp/workflows/base.py#L147
But for why 0.01A is surprising:

1. 0.01Ang + 5.0deg is the default for v0.9.0. That means this issue was opened with the premise that these tolerances were too loose, but your data suggests that even stricter tolerances are optimal. It therefore looks like you've debunked the original issue. Just to show that this was indeed the default before:
2. pymatgen actually recommends 0.1A for structures relaxed with DFT (shown here):
   > For structures with slight deviations from their proper atomic positions (e.g., structures relaxed with electronic structure codes), a looser tolerance of 0.1 (the value used in Materials Project) is often needed.
3. `RandomSymStructure` (no transformations)

I'm looking at a slightly different case, which is probably why we are arriving at different conclusions.
My results with 0.01A show that the spacegroup isn't being determined properly (spacegroups are all 1 or 2):

| rank | id | energy_per_atom | updated_at | source | spacegroup | parent_ids |
|---|---|---|---|---|---|---|
| 0 | 493 | -4.82264 | 2022-08-21 20:12:31 | from_ase.CoordinatePerturbation | 1 | 334 |
| 1 | 193 | -4.82252 | 2022-08-21 19:29:23 | from_ase.MirrorMutation | 1 | 85 |
| 2 | 205 | -4.82244 | 2022-08-21 19:31:17 | from_ase.LatticeStrain | 2 | 85 |
| 3 | 432 | -4.82242 | 2022-08-21 19:59:04 | from_ase.MirrorMutation | 1 | 205 |
| 4 | 484 | -4.82238 | 2022-08-21 20:11:37 | from_ase.CoordinatePerturbation | 1 | 205 |
| 5 | 367 | -4.82236 | 2022-08-21 19:49:58 | from_ase.Heredity | 1 | [193, 205] |
| 6 | 334 | -4.82222 | 2022-08-21 19:45:57 | from_ase.Heredity | 1 | [85, 217] |
| 7 | 355 | -4.82198 | 2022-08-21 19:48:46 | from_ase.Heredity | 1 | [331, 196] |
But 0.1A works much better with transformed structures (gives higher spacegroups):

| rank | id | energy_per_atom | updated_at | source | spacegroup | parent_ids |
|---|---|---|---|---|---|---|
| 0 | 207 | -4.82265 | 2022-08-21 23:34:03 | from_ase.LatticeStrain | 12 | 171 |
| 1 | 127 | -4.82185 | 2022-08-21 23:25:30 | from_ase.Heredity | 12 | [37, 102] |
| 2 | 171 | -4.82045 | 2022-08-21 23:28:14 | from_ase.SoftMutation | 12 | 144 |
| 3 | 144 | -4.82014 | 2022-08-21 23:26:44 | from_ase.Heredity | 164 | [121, 102] |
| 4 | 3 | -4.82005 | 2022-08-21 23:13:56 | RandomSymStructure | 164 | |
| 5 | 192 | -4.81977 | 2022-08-21 23:31:32 | from_ase.MirrorMutation | 164 | 121 |
| 6 | 121 | -4.81971 | 2022-08-21 23:24:26 | from_ase.Heredity | 164 | [63, 3] |
| 7 | 102 | -4.8197 | 2022-08-21 23:20:54 | from_ase.Heredity | 164 | [3, 54] |
| 8 | 163 | -4.81969 | 2022-08-21 23:27:55 | from_ase.Heredity | 12 | [121, 121] |
I started a separate issue for the topic of subflows. That way we can limit this issue to just symmetry and the tolerances used.
I think everything is done for the symmetry issue, but I'm leaving this open through the next release -- so that we can test things more and be sure of our defaults.