materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org
Other
1.52k stars 864 forks source link

Round trip potcar writing/reading #2267

Open JosephMontoya-TRI opened 3 years ago

JosephMontoya-TRI commented 3 years ago

Describe the bug Round trip Potcar parsing results in a warning that's not present when writing input.

To Reproduce

import unittest
from pymatgen.io.vasp.sets import MPRelaxSet
from pymatgen.io.vasp.inputs import VaspInput
from pymatgen.core.lattice import Lattice
from pymatgen.core.structure import Structure
from pymatgen.core.periodic_table import Element
from monty.tempfile import ScratchDir

class PotcarCompatTest(unittest.TestCase):
    def test_round_trip(self):
        with ScratchDir('.'):
            structure = Structure(Lattice.cubic(1.5), ['C'], [[0, 0, 0,]])
            relax_set = MPRelaxSet(structure)
            # This should complain if md5 hashes are wrong
            print("Write set")
            relax_set.write_input('.')
            print("Read")
            vasp_input = VaspInput.from_directory('.') # This results in a warning

if __name__ == "__main__":
    unittest.main()

Expected behavior No warning if input set is written without a warning.

Desktop (please complete the following information):

Additional context Believe this is related to the regex parsing of PotcarSingle from a multiple potcar file, or perhaps on the write itself, which may leave out a trailing line break present in the original file.

On the write - the PotcarData contains a trailing line break, but not on the read.


test_potcar.py Write set

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB set_trace >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /root/miniconda3/lib/python3.9/site-packages/pymatgen/io/vasp/inputs.py(1780)__init__()
-> self.hash = self.get_potcar_hash()
(Pdb) self.data[-20:]
'+01\n End of Dataset\n'
(Pdb) c

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB continue >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Read

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB set_trace >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /root/miniconda3/lib/python3.9/site-packages/pymatgen/io/vasp/inputs.py(1780)__init__()
-> self.hash = self.get_potcar_hash()
(Pdb) self.data[-20:]
'E+01\n End of Dataset'
(Pdb)
mkhorton commented 3 years ago

I think the reason for this is simply that the hash database is incomplete.

There are two sets of hashes, a "file" hash (checks the POTCAR is byte-to-byte identical) and a "data" hash (which is a bit more permissive, e.g. if there are changes in comment strings etc.)

The data hash matches, but the file match does not, hence it is in error. The warning message should probably be less extreme, i.e. acknowledge the possibility that the pymatgen database is incomplete.

The warning does not appear on set construction since only the "data" hash is checked.

Suggested remedy here is:

  1. Only check the data hash in both places to avoid inconsistency.
  2. Supplement the hash file with additional hashes.

A separate issue here is that PotcarSingle.identify_potcar() returns (['PBE'], ['runelements', 'C']) for reasons that are not clear to me.

mkhorton commented 3 years ago

Hm, I think the runelements is simply because there was an older VASP POTCAR that identified itself as such (or perhaps a parsing bug).

mkhorton commented 3 years ago

Ok, I mis-spoke previously. I think this is a round-trip issue due to some logic in Potcar.from_file. The file itself is correct.

I have a fix which is to hash the str(PotcarSingle) rather than PotcarSingle.data. The failing file hash is because PotcarSingle.data removes a trailing newline.