KeyError when setting precursors

materialsproject / reaction-network

Reaction Network is a Python package for predicting likely inorganic chemical reaction pathways using graph theoretical methods. Project led by @mattmcdermott (formerly at Berkeley Lab).

https://materialsproject.github.io/reaction-network/

Other

81 stars 16 forks source link

KeyError when setting precursors #248

Closed ancarnevali closed 1 year ago

ancarnevali commented 1 year ago

Hi! I am trying to explore a synthesis network based on Ti, following this notebook as guideline.

I am encountering an error in the set_precursors step. While it seems to work properly for some formulas, it i snot able to add TiO2 as a precursor. Here is the traceback:

KeyError                                  Traceback (most recent call last)
c:... in line 2
      80 #%%
----> 81 rn.set_precursors(["TiO2"])

File c:...\miniconda3\envs\synth\lib\site-packages\rxn_network\network\network.py:159, in ReactionNetwork.set_precursors(self, precursors)
    156 if not g:
    157     raise ValueError("Must call build() before setting precursors!")
--> 159 precursors = {
    160     p
    161     if isinstance(p, (Entry, ExperimentalReferenceEntry))
    162     else self.entries.get_min_entry_by_formula(p)
    163     for p in precursors
    164 }
    166 if precursors == self.precursors:
    167     return

File c:...\miniconda3\envs\synth\lib\site-packages\rxn_network\network\network.py:162, in (.0)
    156 if not g:
    157     raise ValueError("Must call build() before setting precursors!")
    159 precursors = {
    160     p
    161     if isinstance(p, (Entry, ExperimentalReferenceEntry))
--> 162     else self.entries.get_min_entry_by_formula(p)
...
    214         Ground state computed entry object.
    215     """
--> 216     return self.min_entries_by_formula[Composition(formula).reduced_formula]

KeyError: 'TiO2'

It seems strange to me that no entries have formula TiO2, am I missing something? What is going on?

Thank you!

EDIT: The MP website reports plenty of TiO2-labelled calculations, many of which under the energy above the hull I am passing to filter_by_stability.

mattmcdermott commented 1 year ago

Hi @ancarnevali, funny enough, I have run into this problem many times myself!

It does have to do with the energy above hull cutoff, and particularly, weirdness with TiO2. It turns out that if you have the flag include_nist_data=True (which is True by default), the experimental entry for TiO2 is much higher in formation energy than some of the MP entries.

Plus, if you are using GibbsEntrySet to create the objects, TiO2 will be greatly destabilized at high temperature. Combine both of these things, and you'll see that TiO2 has a wildly high energy above hull sometimes (It can be on the order of like 0.12 eV/atom!).

I already addressed this in the development branch by adding an option to ignore certain experimental data for solids (it's better really for liquids/gases). But you can also fix it in the meantime by either using a lower temperature or disabling nist data entirely. It may also be better when using the newest mp-api to access the mixed R2SCAN data. Hope this helps.

mattmcdermott commented 1 year ago

^ just note if you use new R2SCAN data, there may be some weirdness with how GibbsEntrySet works; it was not designed with the new mixing scheme in mind. It turns out that the mixing scheme can differ when applied to chemical subsystems. So just a word of caution!

ancarnevali commented 1 year ago

Thank you! In order to verify this I'd calculate the energies above the hull of some single entries (like TiO2) at different temperatures, but I'm finding some issues. I run the Gibbs energy calculation with GibbsEntrySet.from_computed_entries(entries), but am not sure how to call the .get_e_above_hull method. Moreover, should I explicitly use all the available entries for a given chemical system?

ancarnevali commented 1 year ago

Hi! I tried giving a look to this problem again, and am finding something unexpected (at least for me). I am manually retrieving rutile-type TiO2 MP entries, together with the usual chemsys based query.

with MPRester(key) as mpr:
    entries = mpr.get_entries_in_chemsys("Ti-Mn-O-C-Li")
    rutile_entries = mpr.get_entry_by_material_id("mp-2657", compatible_only=True)
# rutile_entries contains two GGA calculations and two R2SCAN ones for the same material
# the R2SCAN calculations seem to coincide, so I remove one of them by 
del rutile_entry[1]

Flag them by a new data key "is_rutile" and merge.

rutile_entries = entry.data["is_rutile"] = True for entry in rutile_entries]
entries = [entry.data["is_rutile"] = False for entry in entries]
complete_entries = entries + rutile_entry

Run the deltaGf prediction and calculate their energies above the hull (not sure whether the syntax is right, but it seems to do the job).

temp = 1000  # Kelvin
gibbs_entries = GibbsEntrySet.from_computed_entries(
    complete_entries, temp, include_nist_data=False
)
entry_set_hull = GibbsEntrySet(gibbs_entries, calculate_e_above_hulls=True)

and finally check on the rutile entries e_above_hull values:

for entry in entry_set_hull:
    if entry.data["is_rutile"] == True:
        print(entry.composition, entry.data["e_above_hull"])

that returns

Ti4 O8 0.0 #R2SCAN
Ti2 O4 0.023814034746950874 #GGA
# seemingly OK values - haven't cross-checked with an experimental phase diagram tho

It seems like the deltaG predictor is working ok here. Also, cross-checking material IDs and e_above_hull at temp = 1000K

for entry in entry_set_hull:
    if entry.composition == Composition("TiO2"):
        print(entry.composition, entry.data["e_above_hull"], entry.data["material_id"])
    if entry.data["is_rutile"] == True:
        print(entry.composition, entry.data["e_above_hull"])

returns as follows:

Ti1 O2 2.2898657620975946 mp-25433
Ti4 O8 0.0 # rutile
Ti2 O4 0.023814034746950874 # rutile
Ti1 O2 2.482554720614216 mp-1008677
Ti1 O2 2.2257351639837477 mp-1426806

It seems like the rutile ones are behaving OK even at T = 1000 K, unlike the other polymorphs.

Not sure what is going on here, nor why the NIST data about TiO2 (most likely rutile, although I haven't checked) behaved weirdly and its e_above_hull went so high with increasing T. But using specific entries that correspond to well-known phases might be a solution to our TiO2 issue perhaps?

mattmcdermott commented 1 year ago

Hi @ancarnevelli, sorry to hear this is still behaving weirdly for you.

Regarding this line:

rutile_entries = mpr.get_entry_by_material_id("mp-2657", compatible_only=True)

We need to be careful here not to mix R2SCAN and GGA entries without additional corrections. By default, entries acquired this way from Materials Project do not feature any mixing scheme correction.

To get the entries with mixing scheme corrections applied, do this:

with MPRester() as mpr:
    ents = mpr.get_entries_in_chemsys("Ti-Mn-O-C-Li", additional_criteria={"thermo_types":['GGA_GGA+U_R2SCAN']})

However, I strongly recommend not using mixed GGA and R2SCAN data with the reaction-network package due to how the mixing scheme is designed. Long story short, the scheme is not consistently applied for every entry (there is no specific correction value that every entry will always have). It depends on the chemical system one is looking at. This breaks some of my code.

With regards to TiO2 in particular, I did fix this issue a while back by introducing a new flag in GibbsEntrySet called ignore_nist_solids. The idea is to prefer DFT data for compounds with high melting points. Put simply, we only want to use NIST data when we think that the high-temperature effects will dramatically decrease the free energy of the phase (e.g., it melts or sublimates). This addition fixes the problem with TiO2 and some other entries that are similar.

I am going to release the new version of the package very soon (likey this week), but if you want to use the new code in the meantime, check out the dev branch.

ancarnevali commented 1 year ago

Thank you for your help, using the dev code worked out! I'll close the issue with this.

I'm not sure whether this is the right place to ask this, so please feel free to redirect me, but I was wondering: why does the pathfinding algo seem to identify intermediate reactions with deltaG>0 even though you pass intermediate_rxn_energy_cutoff=0.0 in ps.solve (talking about the 2_networks.ipynb, entries [21] and [22])?

Also, what kind of implications does this have on the feasibility of a path that passes through a deltaG>0 reaction? I'd assume that such a reaction path would be impossible, but I might be misinterpreting the dG value that is printed.

Thank you again for your detailed help!

mattmcdermott commented 1 year ago

@ancarnevali Great! FYI, in the last few days, I pulled the code from the dev branch into the main branch and released a new package version.

The pathway solver seeks to do the opposite: identify all intermediate reactions with dG < 0 (or more generally, with energies below the user-provided cutoff, which could be higher too, such as +0.01 or +0.1 eV/atom). Hopefully, I stated this clearly in the documentation!

In theory, it is forbidden for an endergonic (dG > 0) reaction to occur. In reality, there is uncertainty to our energy predictions and we can't always accurately discriminate between positive and negative. This particularly applies to reactions with energies near zero (dG = +0.001 eV/atom), or reactions that produce higher entropy phases (e.g., gases like CO2). For these phases, we may not have accurate values of their chemical potentials (e.g., partial pressure of CO2).

Thus it tends to be a good idea to keep endergonic reactions around in case they're involved in an important pathway -- this is one of the justifying reasons for using a cost function approach (the softplus doesn't necessarily discriminate between negative and positive dG except to scale them differently).

ancarnevali commented 1 year ago

I see, that makes sense! Coming back to the runtype of the data one provides to GibbsEntrySet: you mentioned that the mixing scheme on GGA calculations is to be avoided (is this marked as GGA+U?), together with the R2SCAN calculations.

I resorted to quering as:

with MPRester() as mpr:
    entries = mpr.get_entries_in_chemsys(
        "Y-Mn-O-Li-Cl-C", additional_criteria={"thermo_types": ["GGA"]}
    )

but "GGA" is not accepted as a thermo type. I am a bit confused about how to query only reaction-network-compatible data, and how to manually add specific entries by material ID, that too are rxn-net-compatible. Also, could GGA calculations external to MP be added to an entry list to pass to GibbsEntrySet?

Lastly, would the following make sense or is there a difference I am overlooking between 'thermo_types' and "run_type"?

with MPRester() as mpr:
    entries= mpr.get_entries_in_chemsys("Y-Mn-O-Li-Cl-C")
gga_entries = []
for entry in entries:
    if entry.data["run_type"] == "GGA":
        gga_entries.append(entry)

Thank you!

mattmcdermott commented 1 year ago

By default, the MPRester.get_entries_in_chemsys method returns only compatible GGA/GGA+U data. No need to specify additional criteria. This is all you need in general for almost every application:

chemsys = "..."  # e.g., Li-Mn-O
with MPRester() as mpr:
    entries = mpr.get_entries_in_chemsys(chemsys)

The mixing scheme to avoid for the reaction network calculations is not the GGA/GGA+U scheme (that one is the standard), but rather R2SCAN mixed with GGA/GGA+U. The full GGA/GGA+U/R2SCAN mixing scheme seems to work for phase diagrams, but some of the assumptions in this code lead to weird results.

You can add your own entries by either 1) running new DFT calculations with Materials Project compatible settings (see atomate) and mixing those with the downloaded MP entries, or 2) creating ComputedEntry objects in pymatgen where the energies are Gibbs free energies of formation ΔGf. Note that in this second approach, the entries must only be added to GibbsComputedEntry objects created with GibbsEntrySet.from_computed_entries (this ensures that all entries use the Gibbs free energies of formation rather than energy output from DFT, which is on its own scale).

Lastly, I wouldn't recommend filtering by run_type, although it should theoretically work. If you do this, ensure you also allow GGA+U (that mixing is fine).