materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org

CIF files not parsed correctly #637

Closed paulfons closed 7 years ago

paulfons commented 7 years ago

System

Summary

Example code

A CIF file can specify site occupancy. For example, the cubic phase of Ge2Sb2Te5 has one Wyckoff site occupied by Te, while the other site is occupied by 40% Sb, 40% Ge, and 20% vacancies. The relevant part of the (correct) CIF file for this structure reads:

loop_
_atom_site_label
_atom_site_type_symbol
_atom_site_occupancy
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
Ge    Ge 0.4000      0.5000    0.5000    0.5000
Ge    Sb 0.4000      0.5000    0.5000    0.5000
Te    Te 1.0000      0.0000    0.0000    0.0000

The first column is a label, the second is the type symbol (the species that actually occupies the site), and the third is the fractional occupancy. Below are the results of reading in the CIF file excerpted above.

In [50]: gst = mg.Structure.from_file('Ge2Sb2Te5_Fm3m.cif')

In [51]: gst
Out[51]:
Structure Summary
Lattice
    abc : 6.0309999999999997 6.0309999999999997 6.0309999999999997
 angles : 90.0 90.0 90.0
 volume : 219.36532779099997
      A : 6.0309999999999997 0.0 3.6929224228288435e-16
      B : 9.6985877001997945e-16 6.0309999999999997 3.6929224228288435e-16
      C : 0.0 0.0 6.0309999999999997
PeriodicSite: Ge:0.800 (3.0155, 3.0155, 3.0155) [0.5000, 0.5000, 0.5000]
PeriodicSite: Ge:0.800 (3.0155, 0.0000, 0.0000) [0.5000, 0.0000, 0.0000]
PeriodicSite: Ge:0.800 (0.0000, 0.0000, 3.0155) [0.0000, 0.0000, 0.5000]
PeriodicSite: Ge:0.800 (0.0000, 3.0155, 0.0000) [0.0000, 0.5000, 0.0000]
PeriodicSite: Te (0.0000, 0.0000, 0.0000) [0.0000, 0.0000, 0.0000]
PeriodicSite: Te (0.0000, 3.0155, 3.0155) [0.0000, 0.5000, 0.5000]
PeriodicSite: Te (3.0155, 3.0155, 0.0000) [0.5000, 0.5000, 0.0000]
PeriodicSite: Te (3.0155, 0.0000, 3.0155) [0.5000, 0.0000, 0.5000]

Note that there is no evidence of Sb in the structure at all! It appears that the Structure.from_file method treats the site labels as if they were the species, which is clearly incorrect. I came across this when I received a (different) CIF file exported from Materials Studio (CASTEP among other codes) and found that the user had changed the composition but not the labels. Materials Studio had no trouble with this, as it correctly used the composition for calculations based on the structure. The exported CIF file kept the older labels alongside the correct compositions, which strictly speaking is still a valid CIF.
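One possible reading of the output above (an assumption on my part, not something confirmed from the pymatgen source) is that the two rows sharing the label Ge were merged and their occupancies added, which would give 0.4 + 0.4 = 0.8. The sketch below uses only the three rows from the CIF excerpt and a hypothetical helper, site_occupancies, to compare keying by label with keying by type symbol.

```python
from collections import defaultdict

# Rows copied from the CIF excerpt: (label, type symbol, occupancy, fractional coords).
ROWS = [
    ("Ge", "Ge", 0.4, (0.5, 0.5, 0.5)),
    ("Ge", "Sb", 0.4, (0.5, 0.5, 0.5)),
    ("Te", "Te", 1.0, (0.0, 0.0, 0.0)),
]


def site_occupancies(rows, key_column):
    """Sum occupancies per site, keyed by column 0 (label) or column 1 (type symbol).

    Hypothetical helper for illustration only; it is not part of pymatgen.
    """
    sites = defaultdict(lambda: defaultdict(float))
    for row in rows:
        occupancy, coords = row[2], row[3]
        sites[coords][row[key_column]] += occupancy
    return {coords: dict(species) for coords, species in sites.items()}


by_label = site_occupancies(ROWS, key_column=0)
by_symbol = site_occupancies(ROWS, key_column=1)

print("keyed by label:      ", by_label[(0.5, 0.5, 0.5)])
print("keyed by type symbol:", by_symbol[(0.5, 0.5, 0.5)])
```

Keying by label leaves only Ge on the mixed site, which matches the Ge:0.800 entries in the output, while keying by type symbol keeps Ge and Sb as separate species.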

Correct behavior:

The Structure.from_file method should use the _atom_site_type_symbol field, together with _atom_site_occupancy, to determine the species on each site, not the _atom_site_label field. I have included the CIF file excerpted above for reference.

Suggested solution (if any)

The Structure.from_file() method should read the type-symbol and occupancy fields, not the label field, to determine the contents of a given site in a structure.
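To make the suggestion concrete, here is a minimal standard-library sketch of reading the atom-site loop by tag name and building each site's species from _atom_site_type_symbol and _atom_site_occupancy. The functions parse_atom_site_loop and species_by_site are hypothetical names for illustration, the parser handles only a single simple loop_ block like the one in this file, and none of this is meant to represent pymatgen's actual CIF parser.

```python
from collections import defaultdict

# The atom-site loop from the attached CIF file.
CIF_LOOP = """\
loop_
_atom_site_label
_atom_site_type_symbol
_atom_site_occupancy
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
    Ge    Ge 0.4000      0.5000    0.5000    0.5000
    Ge    Sb 0.4000      0.5000    0.5000    0.5000
    Te    Te 1.0000      0.0000    0.0000    0.0000
"""


def parse_atom_site_loop(text):
    """Return one {tag: value} dict per data row of a single, simple loop_ block."""
    tags, rows = [], []
    for line in text.splitlines():
        token = line.strip()
        if not token or token == "loop_":
            continue
        if token.startswith("_"):
            tags.append(token)  # column header
        else:
            rows.append(dict(zip(tags, token.split())))  # whitespace-separated values
    return rows


def species_by_site(rows):
    """Group rows by fractional coordinates; species come from the type symbol."""
    sites = defaultdict(dict)
    for row in rows:
        coords = tuple(float(row[f"_atom_site_fract_{axis}"]) for axis in "xyz")
        symbol = row["_atom_site_type_symbol"]  # the label column is ignored
        occupancy = float(row["_atom_site_occupancy"])
        sites[coords][symbol] = sites[coords].get(symbol, 0.0) + occupancy
    return dict(sites)


sites = species_by_site(parse_atom_site_loop(CIF_LOOP))
for coords, species in sites.items():
    vacancy = 1.0 - sum(species.values())  # remaining fraction of the site is vacant
    print(coords, species, f"vacancy={vacancy:.2f}")
```

Looking columns up by tag rather than by position means the result does not depend on the order of the loop headers or on what the label column happens to contain.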

Files (if any)

Ge2Sb2Te5_(Fm3m).cif

data_Ge2Sb2Te5_(Fm3m)
_audit_creation_method         'generated by CrystalMaker 9.2.9'
_publ_section_comment
;
Ge2.005Sb2.095Te5,RT,Vb=40.1Ge41.9Sb,Yamada&Matsunaga(2000)R=2.3%(XPD) Ge
2.005Sb2.095Te5,RT,Vb=40.1Ge41.9Sb,Yamada&Matsunaga(2000)R=2.3%(XPD) 
;
_cell_length_a                   6.0310(0)
_cell_length_b                   6.0310(0)
_cell_length_c                   6.0310(0)
_cell_angle_alpha               90.0000(0)
_cell_angle_beta                90.0000(0)
_cell_angle_gamma               90.0000(0)

_symmetry_space_group_name_H-M     'F m -3 m'
_symmetry_Int_Tables_number         225
_symmetry_cell_setting             cubic
loop_
_symmetry_equiv_pos_as_xyz
'+x,+y,+z'
'+x,1/2+y,1/2+z'
'1/2+x,1/2+y,+z'
'1/2+x,+y,1/2+z'
'+z,+x,+y'
'+z,1/2+x,1/2+y'
'1/2+z,1/2+x,+y'
'1/2+z,+x,1/2+y'
'+y,+z,+x'
'+y,1/2+z,1/2+x'
'1/2+y,1/2+z,+x'
'1/2+y,+z,1/2+x'
'+x,+y,-z'
'+x,1/2+y,1/2-z'
'1/2+x,1/2+y,-z'
'1/2+x,+y,1/2-z'
'+z,+x,-y'
'+z,1/2+x,1/2-y'
'1/2+z,1/2+x,-y'
'1/2+z,+x,1/2-y'
'+y,+z,-x'
'+y,1/2+z,1/2-x'
'1/2+y,1/2+z,-x'
'1/2+y,+z,1/2-x'
'-x,+y,+z'
'-x,1/2+y,1/2+z'
'1/2-x,1/2+y,+z'
'1/2-x,+y,1/2+z'
'-z,+x,+y'
'-z,1/2+x,1/2+y'
'1/2-z,1/2+x,+y'
'1/2-z,+x,1/2+y'
'-y,+z,+x'
'-y,1/2+z,1/2+x'
'1/2-y,1/2+z,+x'
'1/2-y,+z,1/2+x'
'-x,+y,-z'
'-x,1/2+y,1/2-z'
'1/2-x,1/2+y,-z'
'1/2-x,+y,1/2-z'
'-z,+x,-y'
'-z,1/2+x,1/2-y'
'1/2-z,1/2+x,-y'
'1/2-z,+x,1/2-y'
'-y,+z,-x'
'-y,1/2+z,1/2-x'
'1/2-y,1/2+z,-x'
'1/2-y,+z,1/2-x'
'+y,+x,+z'
'+y,1/2+x,1/2+z'
'1/2+y,1/2+x,+z'
'1/2+y,+x,1/2+z'
'+x,+z,+y'
'+x,1/2+z,1/2+y'
'1/2+x,1/2+z,+y'
'1/2+x,+z,1/2+y'
'+z,+y,+x'
'+z,1/2+y,1/2+x'
'1/2+z,1/2+y,+x'
'1/2+z,+y,1/2+x'
'+y,+x,-z'
'+y,1/2+x,1/2-z'
'1/2+y,1/2+x,-z'
'1/2+y,+x,1/2-z'
'+x,+z,-y'
'+x,1/2+z,1/2-y'
'1/2+x,1/2+z,-y'
'1/2+x,+z,1/2-y'
'+z,+y,-x'
'+z,1/2+y,1/2-x'
'1/2+z,1/2+y,-x'
'1/2+z,+y,1/2-x'
'+y,-x,+z'
'+y,1/2-x,1/2+z'
'1/2+y,1/2-x,+z'
'1/2+y,-x,1/2+z'
'+x,-z,+y'
'+x,1/2-z,1/2+y'
'1/2+x,1/2-z,+y'
'1/2+x,-z,1/2+y'
'+z,-y,+x'
'+z,1/2-y,1/2+x'
'1/2+z,1/2-y,+x'
'1/2+z,-y,1/2+x'
'+y,-x,-z'
'+y,1/2-x,1/2-z'
'1/2+y,1/2-x,-z'
'1/2+y,-x,1/2-z'
'+x,-z,-y'
'+x,1/2-z,1/2-y'
'1/2+x,1/2-z,-y'
'1/2+x,-z,1/2-y'
'+z,-y,-x'
'+z,1/2-y,1/2-x'
'1/2+z,1/2-y,-x'
'1/2+z,-y,1/2-x'
'-x,-y,-z'
'-x,1/2-y,1/2-z'
'1/2-x,1/2-y,-z'
'1/2-x,-y,1/2-z'
'-z,-x,-y'
'-z,1/2-x,1/2-y'
'1/2-z,1/2-x,-y'
'1/2-z,-x,1/2-y'
'-y,-z,-x'
'-y,1/2-z,1/2-x'
'1/2-y,1/2-z,-x'
'1/2-y,-z,1/2-x'
'-x,-y,+z'
'-x,1/2-y,1/2+z'
'1/2-x,1/2-y,+z'
'1/2-x,-y,1/2+z'
'-z,-x,+y'
'-z,1/2-x,1/2+y'
'1/2-z,1/2-x,+y'
'1/2-z,-x,1/2+y'
'-y,-z,+x'
'-y,1/2-z,1/2+x'
'1/2-y,1/2-z,+x'
'1/2-y,-z,1/2+x'
'+x,-y,-z'
'+x,1/2-y,1/2-z'
'1/2+x,1/2-y,-z'
'1/2+x,-y,1/2-z'
'+z,-x,-y'
'+z,1/2-x,1/2-y'
'1/2+z,1/2-x,-y'
'1/2+z,-x,1/2-y'
'+y,-z,-x'
'+y,1/2-z,1/2-x'
'1/2+y,1/2-z,-x'
'1/2+y,-z,1/2-x'
'+x,-y,+z'
'+x,1/2-y,1/2+z'
'1/2+x,1/2-y,+z'
'1/2+x,-y,1/2+z'
'+z,-x,+y'
'+z,1/2-x,1/2+y'
'1/2+z,1/2-x,+y'
'1/2+z,-x,1/2+y'
'+y,-z,+x'
'+y,1/2-z,1/2+x'
'1/2+y,1/2-z,+x'
'1/2+y,-z,1/2+x'
'-y,-x,-z'
'-y,1/2-x,1/2-z'
'1/2-y,1/2-x,-z'
'1/2-y,-x,1/2-z'
'-x,-z,-y'
'-x,1/2-z,1/2-y'
'1/2-x,1/2-z,-y'
'1/2-x,-z,1/2-y'
'-z,-y,-x'
'-z,1/2-y,1/2-x'
'1/2-z,1/2-y,-x'
'1/2-z,-y,1/2-x'
'-y,-x,+z'
'-y,1/2-x,1/2+z'
'1/2-y,1/2-x,+z'
'1/2-y,-x,1/2+z'
'-x,-z,+y'
'-x,1/2-z,1/2+y'
'1/2-x,1/2-z,+y'
'1/2-x,-z,1/2+y'
'-z,-y,+x'
'-z,1/2-y,1/2+x'
'1/2-z,1/2-y,+x'
'1/2-z,-y,1/2+x'
'-y,+x,-z'
'-y,1/2+x,1/2-z'
'1/2-y,1/2+x,-z'
'1/2-y,+x,1/2-z'
'-x,+z,-y'
'-x,1/2+z,1/2-y'
'1/2-x,1/2+z,-y'
'1/2-x,+z,1/2-y'
'-z,+y,-x'
'-z,1/2+y,1/2-x'
'1/2-z,1/2+y,-x'
'1/2-z,+y,1/2-x'
'-y,+x,+z'
'-y,1/2+x,1/2+z'
'1/2-y,1/2+x,+z'
'1/2-y,+x,1/2+z'
'-x,+z,+y'
'-x,1/2+z,1/2+y'
'1/2-x,1/2+z,+y'
'1/2-x,+z,1/2+y'
'-z,+y,+x'
'-z,1/2+y,1/2+x'
'1/2-z,1/2+y,+x'
'1/2-z,+y,1/2+x'

loop_
_atom_site_label
_atom_site_type_symbol
_atom_site_occupancy
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
    Ge    Ge 0.4000      0.5000    0.5000    0.5000
    Ge    Sb 0.4000      0.5000    0.5000    0.5000
    Te    Te 1.0000      0.0000    0.0000    0.0000

shyuep commented 7 years ago

I have just pushed a fix. Thanks for reporting this.