materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org

Failed to read CIF files downloaded from ICSD-Desktop #510

Closed zbwang closed 8 years ago

zbwang commented 8 years ago

Example code

from pymatgen import Structure

s = Structure.from_file('MyBaseFileNameCollCode59959.cif')
print(s)

Error message

/Users/wzb/.pyenv/versions/3.5.2/Python.framework/Versions/3.5/lib/python3.5/site-packages/pymatgen-4.5.0-py3.5-macosx-10.11-x86_64.egg/pymatgen/io/cif.py:703: UserWarning: could not convert string to float: '.'
  warnings.warn(str(exc))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-74-b53b5635bdc2> in <module>()
      1 from pymatgen import Structure
      2 
----> 3 s = Structure.from_file('MyBaseFileNameCollCode59959.cif')

/Users/wzb/.pyenv/versions/3.5.2/Python.framework/Versions/3.5/lib/python3.5/site-packages/pymatgen-4.5.0-py3.5-macosx-10.11-x86_64.egg/pymatgen/core/structure.py in from_file(cls, filename, primitive, sort, merge_tol)
   1450             return cls.from_str(contents, fmt="cif",
   1451                                 primitive=primitive, sort=sort,
-> 1452                                 merge_tol=merge_tol)
   1453         elif fnmatch(fname, "*POSCAR*") or fnmatch(fname, "*CONTCAR*"):
   1454             s = cls.from_str(contents, fmt="poscar",

/Users/wzb/.pyenv/versions/3.5.2/Python.framework/Versions/3.5/lib/python3.5/site-packages/pymatgen-4.5.0-py3.5-macosx-10.11-x86_64.egg/pymatgen/core/structure.py in from_str(cls, input_string, fmt, primitive, sort, merge_tol)
   1390         if fmt == "cif":
   1391             parser = CifParser.from_string(input_string)
-> 1392             s = parser.get_structures(primitive=primitive)[0]
   1393         elif fmt == "poscar":
   1394             s = Poscar.from_string(input_string, False).structure

/Users/wzb/.pyenv/versions/3.5.2/Python.framework/Versions/3.5/lib/python3.5/site-packages/pymatgen-4.5.0-py3.5-macosx-10.11-x86_64.egg/pymatgen/io/cif.py in get_structures(self, primitive)
    703                 warnings.warn(str(exc))
    704         if len(structures) == 0:
--> 705             raise ValueError("Invalid cif file with no structures!")
    706         return structures
    707 

ValueError: Invalid cif file with no structures!

Files (if any)

cifs.zip

Two example CIF files from ICSD-Desktop and one from an ABINIT calculation (prtcif = 1).

shyuep commented 8 years ago

I have fixed the parsing for all of the ICSD files. I cannot fix it for the ABINIT file because that file has serious problems. First, the symmetry positions are not properly preceded by a _. Second, the reported atomic positions are already replicated over the symmetrically equivalent sites, but the symmetry operations are not those of P1. As a result, each site ends up with an occupancy of 4, which is ridiculous. The ABINIT developers (@gmatteo) should fix their CIF format.
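
A minimal sketch of why the occupancy comes out as 4, written in plain Python and not taken from pymatgen. It assumes a hypothetical group of four symmetry operations (identity, inversion, a two-fold axis along b, a mirror perpendicular to b) and a made-up general position; the actual ABINIT file may use a different group. Because the listed sites are already the full orbit, applying the operations a second time lands on each position four times.

```python
from collections import Counter

# Hypothetical group of order 4, written as sign flips on (x, y, z):
# identity, inversion, two-fold axis along b, mirror perpendicular to b.
OPS = [(1, 1, 1), (-1, -1, -1), (-1, 1, -1), (1, -1, 1)]


def apply_op(op, frac):
    """Apply a sign-flip operation and wrap the result into [0, 1)."""
    return tuple(round((s * c) % 1.0, 6) % 1.0 for s, c in zip(op, frac))


# What a well-formed CIF would list: one site of the asymmetric unit
# (coordinates are made up for illustration).
asymmetric_site = (0.1, 0.2, 0.3)

# What the problematic file lists: the site already replicated by every operation.
replicated_sites = [apply_op(op, asymmetric_site) for op in OPS]

# A parser that trusts the symmetry block applies the operations again
# and counts how often each position is generated.
hits = Counter(apply_op(op, site) for site in replicated_sites for op in OPS)

for position, count in sorted(hits.items()):
    print(position, "occupancy", count)
```

Each of the four positions is generated four times. Listing only the asymmetric unit, or declaring the replicated positions with P1 symmetry, would avoid the double counting.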

nbobbitt commented 7 years ago

I am also encountering this issue with cifs from the CSD. Can you please elaborate on how to fix it?

shyuep commented 7 years ago

Please post an example CIF that cannot be parsed.

nbobbitt commented 7 years ago

Here is an example file. I'm trying to parse it using pymatgen and I get the following error:

The lines that call the parser (lines 74-76 of my script):

# Read the coordinates from the cif file using pymatgen
f1=CifParser(cif_file_name)
struct=f1.get_structures()[0]

The error:

Traceback (most recent call last):
  File "../../SRC/myscript.py", line 76, in <module>
    struct=f1.get_structures()[0]
  File "/home/nsb4513/.local/lib/python3.4/site-packages/pymatgen/io/cif.py", line 708, in get_structures
    raise ValueError("Invalid cif file with no structures!")
ValueError: Invalid cif file with no structures!

ABACUF.MOF_subset.zip

shyuep commented 7 years ago

There is a warning that tells you that the CIF file has occupancies greater than 1. In your file, there are overlapping atoms. For example, Ba1 and Ba1* have the same fractional coordinates. I am not sure about the format of the CSD files. Are you supposed to ignore the labels with a *?
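
One way to check a file for this kind of overlap is to group the atom sites by fractional coordinates. The sketch below is plain Python with a hypothetical helper `find_overlaps`; the labels and coordinates are made up for illustration and are not read from the attached file. Binning by a rounded grid is a simplification that can miss pairs sitting on either side of a bin edge.

```python
from collections import defaultdict


def find_overlaps(sites, tol=1e-4):
    """Return groups of labels whose fractional coordinates coincide.

    `sites` is a list of (label, (x, y, z)) pairs. Coordinates are wrapped
    into [0, 1) and binned on a grid of width `tol`.
    """
    groups = defaultdict(list)
    for label, frac in sites:
        key = tuple(round((c % 1.0) / tol) for c in frac)
        groups[key].append(label)
    return [labels for labels in groups.values() if len(labels) > 1]


# Made-up example: two labels on the same position, one site on its own.
sites = [
    ("Ba1", (0.25, 0.50, 0.75)),
    ("Ba1*", (0.25, 0.50, 0.75)),
    ("O1", (0.10, 0.20, 0.30)),
]

for labels in find_overlaps(sites):
    print("overlapping labels:", labels)
```

Whether one label of each overlapping pair should be dropped depends on the CSD conventions, which this sketch does not try to decide.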

shyuep commented 7 years ago

In any case, you can also get around the problem by setting a high occupancy tolerance. E.g.,

from pymatgen.io.cif import CifParser
s = CifParser("ABACUF.MOF_subset.cif", occupancy_tolerance=100).get_structures()[0]
print(s)

The occupancy tolerance accepts a site whose total occupancy lies between 1 and the tolerance value, and rescales its occupancies so that they sum to 1.
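
The rule can be sketched in plain Python as follows. `rescale_occupancies` is a hypothetical helper written for illustration and is not pymatgen's implementation; in particular, raising a ValueError when the tolerance is exceeded and using 1.0 as the default tolerance are assumptions of this sketch.

```python
def rescale_occupancies(occupancies, occupancy_tolerance=1.0):
    """Scale the occupancies of one site so that they sum to 1.

    `occupancies` maps a species name to its occupancy on a single site.
    Totals at or below 1 are returned unchanged; totals above the
    tolerance are rejected.
    """
    total = sum(occupancies.values())
    if total <= 1.0:
        return dict(occupancies)
    if total > occupancy_tolerance:
        raise ValueError(
            "total occupancy %g exceeds tolerance %g" % (total, occupancy_tolerance)
        )
    return {species: occ / total for species, occ in occupancies.items()}


# Two overlapping Ba atoms give a total occupancy of 2 on one site.
print(rescale_occupancies({"Ba": 2.0}, occupancy_tolerance=100))

# With a tolerance of 1 the same site is rejected.
try:
    rescale_occupancies({"Ba": 2.0}, occupancy_tolerance=1.0)
except ValueError as exc:
    print("rejected:", exc)
```

A large tolerance makes the file parse, but it also hides the duplicated sites, so it is worth checking the resulting composition against the expected formula.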

nbobbitt commented 7 years ago

I am unsure about the asterisk issue, but I will look into it. Meanwhile, the tolerance workaround appears to be working. Thanks for your help.