saijananiganesan closed 3 years ago
Short answer: add encoding='latin1' to your open call.
Long answer: mmCIF files are supposed to be ASCII (7-bit), and every ASCII file is also valid UTF-8 by construction. This file, however, contains non-ASCII (8-bit) characters that are not valid UTF-8; the error refers to a degree symbol in _flr_inst_setting.details. There is no easy way to determine programmatically and unambiguously what the encoding is supposed to be. If you don't care about these symbols (I'm guessing you don't), latin1 (also known as ISO-8859-1) is a superset of ASCII too and will accept any 8-bit character, although the result might not match what the original author intended. Alternatively, you can open the file in binary mode; python-ihm handles that the same way, as if the file were latin1-encoded.
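To make the difference concrete, here is a minimal sketch using only the standard library. The sample line and the temporary file are made up for illustration (the real file's contents are not shown in this thread); the only thing taken from the report is the byte 0xb0, which is how latin1 stores a degree sign.

```python
import os
import tempfile

# Hypothetical stand-in for the problem line: a degree sign stored as the
# single byte 0xb0, which is how latin1 encodes it.
raw = b"_flr_inst_setting.details 'sample held at 20\xb0C'\n"

# Strict UTF-8 decoding rejects the lone 0xb0 byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as err:
    utf8_ok = False
    bad_offset = err.start  # byte offset of the offending byte

# latin1 maps every byte value 0-255 to a code point, so decoding never fails.
text = raw.decode("latin1")

# The same thing through open(), as in the short answer.
with tempfile.NamedTemporaryFile(suffix=".cif", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name
try:
    with open(path, encoding="latin1") as fh:
        from_file = fh.read()
finally:
    os.remove(path)
```

With a real file you would pass the handle opened this way to the reader instead of calling fh.read() yourself. Whether the decoded character is the one the author meant still depends on the file really being latin1.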
Thanks Ben!
I don't see anything unusual in the file, not sure why I am getting this error.
Exact line in code:
Error:
XX/ihm/reader.py", line 3173, in read
    more_data = r.read_file()
XX/ihm/format.py", line 566, in read_file
    return self._read_file_c()
XX/hm/format.py", line 616, in _read_file_c
    eof, more_data = _format.ihm_read_file(self._c_format)
XX/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 24322: invalid start byte
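If nothing looks unusual in an editor, scanning the raw bytes is one way to find the character the traceback is complaining about. The sketch below is only an illustration: find_non_ascii is a hypothetical helper, not part of python-ihm, and the sample bytes are invented.

```python
def find_non_ascii(data: bytes):
    """Return (line_number, column, byte_value) for each byte above 0x7f."""
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:  # anything outside 7-bit ASCII
                hits.append((lineno, col, byte))
    return hits


# Invented sample; for a real file use: data = open(path, "rb").read()
sample = b"data_example\n_flr_inst_setting.details 'rotated 90\xb0'\n"
hits = find_non_ascii(sample)
for lineno, col, byte in hits:
    print(f"line {lineno}, column {col}: byte 0x{byte:02x}")
```

The position reported in the error message may be relative to the chunk being decoded rather than to the start of the file, so a scan like this is likely a more dependable way to locate the byte.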