lsmo-epfl / EQeq

Charge equilibration method for crystal structures
GNU General Public License v2.0
3 stars 2 forks source link

CIF parsing assumes particular format, breaks for pymatgen-generated CIF files #24

Open johannbrehmer opened 5 days ago

johannbrehmer commented 5 days ago

I believe there is a bug in the parsing of CIF files.

The LoadCIFData() function assumes that atoms are described in lines of the following form:

atom_label  atom_type   [entries that do not involve a dot] x   y   z   [...]

However, the CIF format is extremely generic, and allows several other forms. For instance, CIF files generated by pymatgen can look like this:

...
loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  C  C0  1  0.92693230  0.09413810  0.20146000  1
  C  C1  1  0.88283400  0.00951251  0.09697130  1
  C  C2  1  0.87383020  0.05616420  0.08434580  1
  C  C3  1  0.91268250  0.07610070  0.05408850  1
  C  C4  1  0.96251430  0.05028320  0.03666980  1
...

Running EQeq on such a CIF file leads to EQeq taking the atom labels for the types, and the types for the labels. This is visible for instance from its std output:

==================================================
========= Atom types - X & J values used =========
==================================================
Co  Z: 27   Ch. Cent: 2 X: -7.6 J: 16.44    
H2  Z: 1    Ch. Cent: 0 X: 7.1761   J: 12.8438  
H3  Z: 1    Ch. Cent: 0 X: 7.1761   J: 12.8438  
C0  Z: 1    Ch. Cent: 0 X: 7.1761   J: 12.8438  
C1  Z: 1    Ch. Cent: 0 X: 7.1761   J: 12.8438  
C2  Z: 1    Ch. Cent: 0 X: 7.1761   J: 12.8438  
C3  Z: 1    Ch. Cent: 0 X: 7.1761   J: 12.8438  
C4  Z: 1    Ch. Cent: 0 X: 7.1761   J: 12.8438  
C5  Z: 1    Ch. Cent: 0 X: 7.1761   J: 12.8438  
C6  Z: 1    Ch. Cent: 0 X: 7.1761   J: 12.8438  
C7  Z: 1    Ch. Cent: 0 X: 7.1761   J: 12.8438  
C8  Z: 1    Ch. Cent: 0 X: 7.1761   J: 12.8438  
C9  Z: 1    Ch. Cent: 0 X: 7.1761   J: 12.8438  
N3  Z: 1    Ch. Cent: 0 X: 7.1761   J: 12.8438  
O4  Z: 1    Ch. Cent: 0 X: 7.1761   J: 12.8438  
==================================================
===== Calculating charges... please wait. ========
...

If I'm not mistaken, this affects for instance mofchecker (@kjappelbaum), since that uses pymatgen to store CIF files and then EQeq to analyze the charges.

kjappelbaum commented 5 days ago

seems like a good catch!

kjappelbaum commented 5 days ago

seems like there should be a check or at least a clearer warning about this in the docs

kjappelbaum commented 5 days ago

I think https://github.com/danieleongari/manage_crystal was quite commonly used in our group to "clean" CIF files

johannbrehmer commented 5 days ago

Thanks, I didn't know this one yet!

Plugging a call to manage_crystal into mofchecker (after creating the CIF file here) or pyeqeq (in the beginning of run_on_cif()) should indeed solve my issue. The latter seems like the right thing to do to me.

Would you be interested in me opening a merge request for this?

johannbrehmer commented 5 days ago

I can confirm that including a call to manage_crystal in EQeq fixes the problem.

kjappelbaum commented 3 days ago

A PR for that would be very much appreciated! 🙏🏼