materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org
Other
1.48k stars 855 forks source link

Bug in processing entries of GGA/GGA+U/R2SCAN scheme #3113

Open peikai opened 1 year ago

peikai commented 1 year ago

I found that the entry mp-1181334-GGA+U for Cu-Fe-Li-O-Te chemical system can wrongly pass the mixing process of GGA/GGA+U/R2SCAN scheme. What is weird is that this entry disappears and reappears in the processed entries result occasionally, when the mixing process is conducted multiple times -- about every third run in my test -- although the codes below been tested in the latest pymatgen package.

from pymatgen.ext.matproj import MPRester
from pymatgen.entries.mixing_scheme import MaterialsProjectDFTMixingScheme

with MPRester() as mpr:
    entryList = mpr.get_entries_in_chemsys('Cu-Fe-Li-O-Te', additional_criteria={'thermo_types':['GGA_GGA+U', 'R2SCAN']})
    entryList = MaterialsProjectDFTMixingScheme().process_entries(entryList, verbose=False)

    print(len(entryList))
    print('mp-1181334-GGA+U' in [e.entry_id for e in entryList])

There may be either 637 or 638 entries in the processed entryList. The latter case would contain an entry mp-1181334-GGA+U, which actually contains no energy adjustment related to the GGA/GGA+U/R2SCAN scheme, as shown in the exported JSON file below.

mp-1181334-GGA+U ComputedStructureEntry - Fe4 O13      (Fe4O13)
Energy (Uncorrected)     = -92.6898  eV (-5.4523  eV/atom)
Correction               = -11.1170  eV (-0.6539  eV/atom)
Energy (Final)           = -103.8068 eV (-6.1063  eV/atom)
Energy Adjustments:
  MP2020 anion correction (superoxide): -2.0930   eV (-0.1231  eV/atom)
  MP2020 GGA/GGA+U mixing correction (Fe): -9.0240   eV (-0.5308  eV/atom)
Parameters:
  potcar_spec            = [{'titel': 'PAW_PBE Fe_pv 06Sep2000', 'hash': '5963411539032ec3298fa675a32c5e64'}, {'titel': 'PAW_PBE O 08Apr2002', 'hash': '7a25bc5b9a5393f46600a4939d357982'}]
  is_hubbard             = True
  hubbards               = {'Fe': 5.3, 'O': 0.0}
  run_type               = GGA+U
Data:
  oxide_type             = superoxide
  aspherical             = True
  last_updated           = 2021-02-05 09:26:25.316000
  task_id                = mp-1829691
  material_id            = mp-1181334
  oxidation_states       = {}
  run_type               = GGA+U

The potential bug needs to be found, as the reliability of local processing algorithms has become critical for the generation of GGA/GGA+U/R2SCAN phase diagrams^note.

janosh commented 1 year ago

I can't repro this. Tried about 7 times. I get 638 entries every time.

which actually contains no energy adjustment related to the GGA/GGA+U/R2SCAN scheme

I'm not very familiar with our r2SCAN/GGA mixing scheme but I think what might be happening here is that we place r2SCAN energies onto the GGA(+U) hull? If so, you wouldn't expect an adjustment on a GGA+U entry? Pinging @rkingsbury @munrojm to correct me in case I'm wrong.

rkingsbury commented 1 year ago

I'm not very familiar with our r2SCAN/GGA mixing scheme but I think what might be happening here is that we place r2SCAN energies onto the GGA(+U) hull? If so, you woudln't expect an adjustment on a GGA+U entry? Pinging @rkingsbury @munrojm to correct me in case I'm wrong here.

That's correct, some GGA/GGA+U entries will pass through the mixing scheme unmodified, if the hull is still being built with GGA/GGA+U entries. That would occur any time there are insufficient R2SCAN entries to build the hull. In such a high dimensional chemical system as this, I'm almost certain the hull would be built in GGA/GGA+U, so getting the entry back unmodified is exactly what should happen. Please refer to the mixing scheme publication for a more thorough explanation.

As for the non-reproducibility / I'm not sure. The mixing scheme does rely on structure matching, so perhaps due to some quirk in your computer system the matching occasionally fails to consider two structures "the same"? That's a stretch but it's the best thought I have.

Also, I note that you're using the old MPRester (pymatgen.ext.matproj). If you want to use r2SCAN data I think you would be better off using the rester from the mp-api package, but @munrojm would know better.

peikai commented 1 year ago

Also, I note that you're using the old MPRester (pymatgen.ext.matproj). If you want to use r2SCAN data I think you would be better off using the rester from the mp-api package, but @munrojm would know better.

@rkingsbury Oh, it is a typo here. Actually in my test codes, the MPRester method was imported by from mp_api.client import MPRester.

I tested it on a different computer and network, but the inconsistency issue still happened. I will try to run them with a fresh environment in a container next.

peikai commented 1 year ago

The same issue happened, even though I ran the codes in a fresh set-up container of python3.10-alpine. I think this is a rare case (the chemical system and entry) that induces this issue whereas would not happen in other chemical systems, but the potential bug may still result in entry loss or excess, thereby possibly changing the shape of the convex hull.

janosh commented 1 year ago

Can you pinpoint the bug?

peikai commented 1 year ago

@janosh Yes, it is the the structure_matcher.group_structures() methods in mixing_scheme.py 564L, that results in inconsistency when matching the structures of the entry mp-1181411-GGA+U Fe4O13 with the entry mp-1181334-GGA+U Fe4O13. I found that the codes sometimes match the two entries together in one group, whereas sometimes divide them into two individual groups. In the latter case, there would be 638 processed entries, otherwise, there would be 637 entries.

rkingsbury commented 1 year ago

@janosh Yes, it is the the structure_matcher.group_structures() methods in mixing_scheme.py 564L, that results in inconsistency when matching the structures of the entry mp-1181411-GGA+U Fe4O13 with the entry mp-1181334-GGA+U Fe4O13. I found that the codes sometimes match the two entries together in one group, whereas sometimes divide them into two individual groups. In the latter case, there would be 638 processed entries, otherwise, there would be 637 entries.

@peikai it sounds to me like you are running into an edge case related to the numerical tolerances of StructureMatcher, where tiny numerical noise is causing those structures to sometimes match and sometimes not. I suggest 2 things:

  1. Can you reproduce the problem outside of the mixing scheme, by just running StructureMatcher on your list of input structures?
  2. If yes, then try adjusting the tolerance kwargs of StructureMatcher. If you slightly change the ltol, stol, or angle_tol, you can hopefully get reproducible results.