materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org

UnicodeEncodeError in IStructure.from_file #331

Closed: hlyang1992 closed this issue 8 years ago.

hlyang1992 commented 8 years ago

When I use IStructure.from_file to read some CIF files, such as this test file (https://drive.google.com/open?id=0B7DapXLbWsRKcWdyVDk3REJPTEU), it raises a UnicodeEncodeError:

    si = mg.IStructure.from_file(filename)
  File "/home/vagrant/opt/anaconda2/lib/python2.7/site-packages/pymatgen/core/structure.py", line 1531, in from_file
    primitive=primitive, sort=sort)
  File "/home/vagrant/opt/anaconda2/lib/python2.7/site-packages/pymatgen/core/structure.py", line 1477, in from_str
    parser = CifParser.from_string(input_string)
  File "/home/vagrant/opt/anaconda2/lib/python2.7/site-packages/pymatgen/io/cif.py", line 292, in from_string
    stream = cStringIO(cif_string)
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)

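I suspect the error comes from monty.io.zopen. The following code raises UnicodeEncodeError:

    from monty.io import zopen
    from six.moves import cStringIO  # cStringIO.StringIO on Python 2
    with zopen("test.cif", "r") as f:
        data = f.read()
    stream = cStringIO(data)  # raises UnicodeEncodeError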

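However, the following code works fine, with no error:

    from six.moves import cStringIO  # cStringIO.StringIO on Python 2
    with open("test.cif", "r") as f1:
        data = f1.read()  # a byte string on Python 2
    stream = cStringIO(data)  # no error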
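A likely explanation for the difference: on Python 2, the builtin open() in "r" mode returns a byte string, while the traceback's u'\ufeff' shows that the zopen path handed CifParser a unicode string. The Python 2 documentation notes that cStringIO cannot accept unicode strings that cannot be encoded as plain ASCII, and U+FEFF is not ASCII. The sketch below reproduces that implicit ASCII encode step using Python 3 and the standard library only; `ascii_encode_or_error` is a hypothetical helper, not part of pymatgen or monty.

```python
import codecs

# Raw bytes as they might appear on disk: a UTF-8 byte-order mark, then CIF text.
raw = codecs.BOM_UTF8 + b"data_test\n"

# Plain "utf-8" decoding keeps the BOM as the character U+FEFF at position 0,
# matching the u'\ufeff' reported in the traceback.
text = raw.decode("utf-8")


def ascii_encode_or_error(s):
    """Hypothetical helper: mimic the implicit ASCII encode that Python 2's
    cStringIO performs on a unicode argument. Returns the error message,
    or None if the string is pure ASCII."""
    try:
        s.encode("ascii")
        return None
    except UnicodeEncodeError as exc:
        return str(exc)


print(repr(text[0]))
print(ascii_encode_or_error(text))
```

The byte string that open() returns on Python 2 never goes through this encode step, which is consistent with the second snippet working.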

xhqu1981 commented 8 years ago

This looks like an error related to environment variable (locale) settings. Try setting:

    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8

Let us know whether it works. Thanks!

(Email reply to lzuyanghl's report of Wednesday, April 13, 2016.)
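To check whether the exports are actually picked up, you can ask Python which encodings it sees. This is a minimal sketch using only the standard library. `encoding_report` is a hypothetical helper name. `locale.getpreferredencoding(False)` reports the locale encoding that text-mode open() falls back to when no encoding is passed, and `sys.getdefaultencoding()` reports the codec used for implicit str/unicode conversions, which is "ascii" by default on Python 2.

```python
import locale
import os
import sys


def encoding_report():
    """Hypothetical helper: collect the settings relevant to text decoding."""
    return {
        "LC_ALL": os.environ.get("LC_ALL"),
        "LANG": os.environ.get("LANG"),
        # Locale encoding used by text-mode open() when no encoding is given.
        "preferred": locale.getpreferredencoding(False),
        # Codec for implicit str/unicode conversions ("ascii" by default on Python 2).
        "default": sys.getdefaultencoding(),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(key, "=", value)
```

Run it in the same shell after the two `export` lines. LC_ALL and LANG should show en_US.UTF-8.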

shyuep commented 8 years ago

The file itself also has a binary symbol at the beginning (the byte-order mark U+FEFF shown in the traceback). This is a bad file.
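If you need to read such files anyway, one workaround is to detect and drop the byte-order mark before parsing. This is a sketch under stated assumptions, not pymatgen's own handling. The helper names `has_utf8_bom` and `read_cif_text` are hypothetical, and the code relies only on Python's "utf-8-sig" codec, which drops a leading UTF-8 BOM if one is present and otherwise decodes exactly like "utf-8". The resulting string could then be passed to a string-based parser such as CifParser.from_string. On Python 2 the result is a unicode string, so this only avoids the cStringIO error when the rest of the file is ASCII.

```python
import codecs
import io


def has_utf8_bom(path):
    """Hypothetical helper: return True if the file starts with a UTF-8 BOM."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def read_cif_text(path):
    """Hypothetical helper: read a CIF file as text, dropping a leading BOM.

    The "utf-8-sig" codec strips a UTF-8 BOM at the start of the stream
    and otherwise decodes like plain "utf-8".
    """
    with io.open(path, "r", encoding="utf-8-sig") as f:
        return f.read()


if __name__ == "__main__":
    import os
    import tempfile

    # Write a tiny file that begins with a BOM, like the file in this issue.
    fd, path = tempfile.mkstemp(suffix=".cif")
    with os.fdopen(fd, "wb") as f:
        f.write(codecs.BOM_UTF8 + b"data_test\n")
    try:
        print(has_utf8_bom(path))              # True
        print(repr(read_cif_text(path)[:9]))   # 'data_test'
    finally:
        os.remove(path)
```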