jjhelmus / nmrglue

A module for working with NMR data in Python
BSD 3-Clause "New" or "Revised" License

UnicodeDecodeError when opening up UCSF files #116

Closed: ostannick closed this issue 4 years ago

ostannick commented 4 years ago

Trying to open a 3D spectrum (UCSF format):

>>> import nmrglue as ng
>>> dic,data = ng.pipe.read("data/HNCO.ucsf")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\kaneo\AppData\Local\Programs\Python\Python38-32\lib\site-packages\nmrglue\fileio\pipe.py", line 525, in read
    dic = fdata2dic(fdata)
  File "C:\Users\kaneo\AppData\Local\Programs\Python\Python38-32\lib\site-packages\nmrglue\fileio\pipe.py", line 1540, in fdata2dic
    dic["FDSRCNAME"] = _unpack_str('16s', fdata[286:290])
  File "C:\Users\kaneo\AppData\Local\Programs\Python\Python38-32\lib\site-packages\nmrglue\fileio\pipe.py", line 1533, in _unpack_str
    return struct.unpack(fmt, d)[0].decode().strip('\x00')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte

Any ideas how to fix this?

JLVarjo commented 4 years ago

There is an extended ASCII character somewhere in your file (apparently È, byte 0xc8). As a quick fix, try replacing it. In the code, this can be fixed by adding encoding="utf-8", errors="replace" to all non-binary file opens and to the relevant decode calls. Funny coincidence: I just ran into the same issue and fixed these throughout the fileio code, so there will be a pull request soon :)
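To illustrate the errors="replace" idea, here is a minimal sketch of a lenient version of the header-string unpacking seen in the traceback. The name unpack_str_lenient is hypothetical and this is not nmrglue's actual code; it only shows how strict UTF-8 decoding fails on a 0xC8 byte followed by NUL padding, while errors="replace" substitutes U+FFFD and Latin-1 recovers the È.

```python
import struct


def unpack_str_lenient(fmt: str, d: bytes) -> str:
    """Hypothetical lenient counterpart of the string unpacking in the traceback."""
    raw = struct.unpack(fmt, d)[0]  # '16s' yields a 16-byte bytes object
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising
    return raw.decode("utf-8", errors="replace").strip("\x00")


# A 16-byte field starting with 0xC8 ('È' in Latin-1), then NUL padding
field = b"\xc8" + b"\x00" * 15

try:
    field.decode()  # strict UTF-8, as in the traceback
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)  # invalid continuation byte

print("lenient:", repr(unpack_str_lenient("16s", field)))
print("latin-1:", field.decode("latin-1").strip("\x00"))
```

Note that replacing bytes silently loses information; it avoids the crash but does not tell you what the original character was.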

Edit: I now noticed that in the Bruker procs file reading, this issue has been fixed in a slightly different way (#101), using encoding=locale.getpreferredencoding(). @jjhelmus, would that be the suggested way to fix this instead?
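A minimal sketch contrasting the two approaches for text files. The helper read_text is hypothetical and is not how #101 is implemented; in particular, applying errors="replace" in the locale branch as well is our own assumption. The locale encoding varies by platform (often cp1252 on Windows, where 0xC8 decodes to È, and usually UTF-8 on Linux), which is the main trade-off versus a fixed UTF-8 choice.

```python
import locale
import os
import tempfile


def read_text(path: str, use_locale: bool = False) -> str:
    """Hypothetical helper contrasting the two fixes discussed in this thread."""
    if use_locale:
        # Platform-dependent: whatever the OS reports as its preferred encoding
        encoding = locale.getpreferredencoding(False)
    else:
        # Fixed UTF-8, with undecodable bytes replaced by U+FFFD
        encoding = "utf-8"
    with open(path, encoding=encoding, errors="replace") as f:
        return f.read()


# Demo: a small text file containing the problematic 0xC8 byte
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"TITLE \xc8\n")

print(repr(read_text(tmp.name)))                   # UTF-8 with replacement
print(repr(read_text(tmp.name, use_locale=True)))  # result depends on platform
os.remove(tmp.name)
```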

kaustubhmote commented 4 years ago

@ostannick, it seems you are trying to read a Sparky file with the pipe.read function, which works only for NMRPipe files. You should instead use:

>>> dic, data = ng.sparky.read("data/HNCO.ucsf")
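To catch this kind of mix-up before reading, one could sniff the file header. This is a minimal sketch assuming, as we understand the Sparky format, that UCSF files begin with the ASCII identifier "UCSF NMR"; the helper guess_reader and its return values are hypothetical and not part of nmrglue.

```python
import os
import tempfile

UCSF_MAGIC = b"UCSF NMR"  # assumed leading identifier of Sparky .ucsf files


def guess_reader(path: str) -> str:
    """Hypothetical helper: name the nmrglue module suited to a file, by header."""
    with open(path, "rb") as f:
        head = f.read(len(UCSF_MAGIC))
    return "sparky" if head == UCSF_MAGIC else "unknown"


# Demo with a fake UCSF header (identifier plus NUL padding)
with tempfile.NamedTemporaryFile("wb", suffix=".ucsf", delete=False) as tmp:
    tmp.write(UCSF_MAGIC + b"\x00" * 172)

print(guess_reader(tmp.name))
os.remove(tmp.name)
```

With nmrglue installed, usage would look like `if guess_reader(path) == "sparky": dic, data = ng.sparky.read(path)`, falling back to checking the format by hand otherwise.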

ostannick commented 4 years ago

Thank you both, apologies for my idiocy!