materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org
Other
1.49k stars 857 forks source link

Parsing cp2k output fails after v2024.8.9 release. #4079

Open msiron-entalpic opened 4 days ago

msiron-entalpic commented 4 days ago

Python version

3.12.5

Pymatgen version

v2024.9.17.1

Operating system version

MacOSX

Current behavior

When parsing cp2k output using

from pymatgen.io.cp2k.outputs import Cp2kOutput
o = Cp2kOutput('cp2k.out', auto_load=True)

With any version of Pymatgen after v2024.8.9 stacktrace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 o = Cp2kOutput('cp2k.out', auto_load=True)

File [~/dev/pymatgen/src/pymatgen/io/cp2k/outputs.py:78](http://localhost:8888/lab/tree/job_2024-09-25-08-56-39-543707-66393/~/dev/pymatgen/src/pymatgen/io/cp2k/outputs.py#line=77), in Cp2kOutput.__init__(self, filename, verbose, auto_load)
     75 self.ran_successfully()
     76 self.convergence()
---> 78 self.parse_structures()
     79 self.parse_energies()
     80 self.parse_forces()

File [~/dev/pymatgen/src/pymatgen/io/cp2k/outputs.py:300](http://localhost:8888/lab/tree/job_2024-09-25-08-56-39-543707-66393/~/dev/pymatgen/src/pymatgen/io/cp2k/outputs.py#line=299), in Cp2kOutput.parse_structures(self, trajectory_file, lattice_file)
    298 gs = self.initial_structure.site_properties.get("ghost")
    299 if not self.is_molecule:
--> 300     for mol, latt in zip(mols, lattices, strict=True):
    301         self.structures.append(
    302             Structure(
    303                 lattice=latt,
   (...)
    309             )
    310         )
    311 else:

ValueError: zip() argument 2 is shorter than argument 1

After testing the lattices variable has one less entry than the mols variable. Adding one more entry to lat or removing first entry from mols leads to errors downstream.

Expected Behavior

Expect to be able to parse output without errors with auto_load as in v2024.8.9

Minimal example

from pymatgen.io.cp2k.outputs import Cp2kOutput
o = Cp2kOutput('cp2k.out', auto_load=True)

Relevant files to reproduce this bug

Archive 2.zip

DanielYang59 commented 4 days ago

Hi @msiron-entalpic thanks for reporting this, I could confirm this issue.

The reason being we added strict=True to ensure the two iterators (mols and lattices) to unpack are of the same length: https://github.com/materialsproject/pymatgen/blob/e9ea813107ea223d5ce693bdf5e4f74fe29f76db/src/pymatgen/io/cp2k/outputs.py#L300


I'm not a CP2K user, so I would need your help to confirm this (also ping the original author @nwinner ):


Also the total_energy entry is longer than structures by 1 (SCF run at the start I assume?): https://github.com/materialsproject/pymatgen/blob/e9ea813107ea223d5ce693bdf5e4f74fe29f76db/src/pymatgen/io/cp2k/outputs.py#L523

If so, should we skip the first energy to make the energy-structure match each other (zip(self.structures, self.data.get("total_energy")[1:], strict=True))?


Is it possible for you to provide a unsuccessful run such that we could enhance the unit test for unsuccessful runs as well?