janosh / matbench-discovery

An evaluation framework for machine learning models simulating high-throughput materials discovery.
https://matbench-discovery.materialsproject.org
MIT License

`data/wbm/compile_wbm_test_set.py`: various errors #121

Closed by pbenner 3 months ago

pbenner commented 3 months ago

I've attached a patch that fixes most issues for me. However, two asserts fail:

assert n_corrected == 100_930, f"{n_corrected=} expected 100,930"
assert df_summary.e_correction_per_atom_mp2020.mean().round(4) == -0.1069

Also I get:

Traceback (most recent call last):
  File "/home/pbenner/.local/opt/anaconda3/envs/mace/lib/python3.11/site-packages/pymatgen/analysis/phase_diagram.py", line 1776, in get_decomposition
    pd = self.get_pd_for_entry(comp)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pbenner/.local/opt/anaconda3/envs/mace/lib/python3.11/site-packages/pymatgen/analysis/phase_diagram.py", line 1763, in get_pd_for_entry
    raise ValueError(f"No suitable PhaseDiagrams found for {entry}.")
ValueError: No suitable PhaseDiagrams found for Ac6 U2.

And:

  File "/home/pbenner/.local/opt/anaconda3/envs/mace/lib/python3.11/site-packages/pymatgen/analysis/phase_diagram.py", line 1782, in get_decomposition
    return _get_slsqp_decomp(comp, competing_entries)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pbenner/.local/opt/anaconda3/envs/mace/lib/python3.11/site-packages/pymatgen/analysis/phase_diagram.py", line 2088, in _get_slsqp_decomp
    Es = np.array([comp_entry.energy_per_atom for comp_entry in competing_entries])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pbenner/.local/opt/anaconda3/envs/mace/lib/python3.11/site-packages/pymatgen/analysis/phase_diagram.py", line 2088, in <listcomp>
    Es = np.array([comp_entry.energy_per_atom for comp_entry in competing_entries])
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pbenner/.local/opt/anaconda3/envs/mace/lib/python3.11/site-packages/pymatgen/entries/__init__.py", line 87, in energy_per_atom
    return self.energy / self.composition.num_atoms
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pbenner/.local/opt/anaconda3/envs/mace/lib/python3.11/site-packages/pymatgen/core/composition.py", line 497, in num_atoms
    return self._n_atoms
           ^^^^^^^^^^^^^
AttributeError: 'Composition' object has no attribute '_n_atoms'. Did you mean: '_natoms'?

> pip list | grep pymatgen
pymatgen                  2024.7.18

janosh commented 3 months ago

thanks for reporting @pbenner. that script is surprisingly high-entropy, stuff keeps breaking. that `_n_atoms` error in particular is strange. i'll rerun the script with your patch applied in a bit to see which errors i can reproduce.
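
A note on the `_n_atoms` error: the "Did you mean: '_natoms'?" hint in the traceback shows that the object does carry an attribute named `_natoms`. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the object was deserialized (e.g. unpickled) from data written by an older release that used a different private attribute name. The toy sketch below uses a stand-in class (not pymatgen's `Composition`) and a hypothetical `restore` helper that mimics pickle's default restore step, which copies the saved `__dict__` without calling `__init__`.

```python
class Composition:
    """Toy stand-in (NOT pymatgen's class) for the 'new' release of a library."""

    def __init__(self, amounts):
        self.amounts = dict(amounts)
        self._n_atoms = sum(self.amounts.values())  # attribute name in the new release

    @property
    def num_atoms(self):
        return self._n_atoms


def restore(cls, state):
    """Hypothetical helper: rebuild an instance the way pickle does by default,
    i.e. without calling __init__, copying the saved __dict__ as-is."""
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


# State as an older release might have saved it (assumed old name: `_natoms`).
stale_state = {"amounts": {"Ac": 6, "U": 2}, "_natoms": 8}
stale = restore(Composition, stale_state)

try:
    stale.num_atoms
    error = None
except AttributeError as exc:
    error = str(exc)  # mentions the missing `_n_atoms` attribute

# Rebuilding through __init__ sets the attribute under its new name.
fresh = Composition(stale.amounts)
```

If that is what happened here, regenerating the serialized object with the installed version (rather than loading an old file) would be the thing to try.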

janosh commented 3 months ago

@pbenner have a look at 20c752a and a2d7add in https://github.com/janosh/matbench-discovery/pull/122. those should fix both PatchedPhaseDiagram errors you encountered.

re the formerly 100,930 WBM computed entries of which only 99k now receive MP2020 corrections: I added the code comment linked below, which might explain the discrepancy, but i'm not sure. maybe @mkhorton can comment on whether there are any edge cases where `strict_anions="no_check"` wouldn't reproduce the previous MP2020 behavior. see https://github.com/materialsproject/pymatgen/pull/3803

https://github.com/janosh/matbench-discovery/blob/a2d7add73fe027b59ae6c6d8b10bae131fe204d3/data/wbm/compile_wbm_test_set.py#L548-L555
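
To narrow down which entries account for the drop from 100,930 to about 99k, one option is to diff the sets of entries that receive a nonzero correction under the old and new behavior. This is a minimal sketch, assuming per-entry corrections are available as plain dicts keyed by material ID and that "corrected" means a nonzero correction. The function name `diff_corrected`, the IDs, and the values are all made up for illustration and do not come from the script.

```python
def diff_corrected(old, new, tol=1e-8):
    """Return (lost, gained): IDs with a nonzero correction only in `old`,
    and IDs with a nonzero correction only in `new`."""
    old_ids = {key for key, val in old.items() if abs(val) > tol}
    new_ids = {key for key, val in new.items() if abs(val) > tol}
    return sorted(old_ids - new_ids), sorted(new_ids - old_ids)


# Made-up example data: energy correction per entry (eV), old vs. new run.
old_corrections = {"id-1": -0.5, "id-2": -0.2, "id-3": 0.0}
new_corrections = {"id-1": -0.5, "id-2": 0.0, "id-3": 0.0}

lost, gained = diff_corrected(old_corrections, new_corrections)
print(f"{len(lost)} entries lost their correction: {lost}")
print(f"{len(gained)} entries gained a correction: {gained}")
```

Inspecting the chemistry of the `lost` entries could then show whether they share an anion or oxidation-state pattern.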