Closed by jkauramaki 10 years ago
Thanks for reporting, @jkauramaki. I think it should be straightforward to add a test that sets UTF-8 names at runtime and saves a temp file, to reproduce and ultimately fix this issue.
I can tackle this next week sometime if nobody else wants to.
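For reference, the shape of such a round-trip test could look roughly like the sketch below. It deliberately does not use the MNE API: `roundtrip_text` and `test_unicode_roundtrip` are hypothetical names, and a plain binary file stands in for a fiff file, so this only illustrates the write-then-read-back pattern with non-ASCII text.

```python
import os
import tempfile


def roundtrip_text(text, encoding="utf-8"):
    """Write text to a temporary binary file and read it back (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tempdir:
        fname = os.path.join(tempdir, "roundtrip.bin")
        # Encode explicitly on the way out ...
        with open(fname, "wb") as fid:
            fid.write(text.encode(encoding))
        # ... and decode explicitly on the way back in.
        with open(fname, "rb") as fid:
            return fid.read().decode(encoding)


def test_unicode_roundtrip():
    """Non-ASCII names must survive a save/load cycle unchanged."""
    for name in ("äöé", "Jää", "plain ascii"):
        assert roundtrip_text(name) == name
```

A real test would replace the helper with saving and re-loading a raw file whose info fields contain the non-ASCII text.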
I wrote a test locally which basically reproduces the reading and writing errors related to Unicode.
https://github.com/dengemann/mne-python/commit/7a48e7422518edbd61b11eaf116c92c38d4dbb56
produces:
Overwriting existing file.
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-3-662bcbb403de> in <module>()
----> 1 test_raw.test_io_raw()
/Users/denisaengemann/anaconda/lib/python2.7/site-packages/mne/fiff/tests/test_raw.py in test_io_raw()
360 raw.info['description'] = text_type('äöé')
361 temp_file = op.join(tempdir, 'raw.fif')
--> 362 raw.save(temp_file, overwrite=True)
363 raw = Raw(tmp_file)
364
/Users/denisaengemann/anaconda/lib/python2.7/site-packages/mne/fiff/raw.pyc in save(self, fname, picks, tmin, tmax, buffer_size_sec, drop_small_buffer, proj, format, overwrite, verbose)
/Users/denisaengemann/anaconda/lib/python2.7/site-packages/mne/utils.pyc in verbose(function, *args, **kwargs)
385 return ret
386 else:
--> 387 ret = function(*args, **kwargs)
388 return ret
389
/Users/denisaengemann/anaconda/lib/python2.7/site-packages/mne/fiff/raw.pyc in save(self, fname, picks, tmin, tmax, buffer_size_sec, drop_small_buffer, proj, format, overwrite, verbose)
986
987 outfid, cals = start_writing_raw(fname, info, picks, type_dict[format],
--> 988 reset_range=reset_dict[format])
989 #
990 # Set up the reading parameters
/Users/denisaengemann/anaconda/lib/python2.7/site-packages/mne/fiff/raw.pyc in start_writing_raw(name, info, sel, data_type, reset_range)
1908 cals.append(info['chs'][k]['cal'] * info['chs'][k]['range'])
1909
-> 1910 write_meas_info(fid, info, data_type=data_type, reset_range=reset_range)
1911
1912 #
/Users/denisaengemann/anaconda/lib/python2.7/site-packages/mne/fiff/meas_info.pyc in write_meas_info(fid, info, data_type, reset_range)
503 write_string(fid, FIFF.FIFF_EXPERIMENTER, info['experimenter'])
504 if info.get('description') is not None:
--> 505 write_string(fid, FIFF.FIFF_DESCRIPTION, info['description'])
506 if info.get('proj_id') is not None:
507 write_int(fid, FIFF.FIFF_PROJ_ID, info['proj_id'])
/Users/denisaengemann/anaconda/lib/python2.7/site-packages/mne/fiff/write.pyc in write_string(fid, kind, data)
74 """Writes a string tag"""
75 data_size = 1
---> 76 _write(fid, str(data), kind, data_size, FIFF.FIFFT_STRING, '>c')
77
78
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
I know this is writing, not reading, but the culprit should be the same.
@Eric89GXL we should think about a central fix which tackles this issue for all io related functions.
I guess we will see things like this wherever we write to fiff files...
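To make the idea of a central fix concrete: in Python 2, calling `str()` on a unicode object implicitly encodes it with the ASCII codec, which is the failure in the traceback above. One possible approach is a single pair of helpers that every string writer and reader goes through. The sketch below is an assumption, not the actual MNE code: `to_bytes`, `to_text`, `write_string_sketch`, `read_string_sketch`, and `KIND_DESCRIPTION` are hypothetical, the two-integer header is a simplification rather than the real fiff tag layout, and UTF-8 is assumed as the on-disk encoding.

```python
import io
import struct

KIND_DESCRIPTION = 1  # hypothetical tag id, not a real FIFF constant


def to_bytes(data, encoding="utf-8"):
    """Return bytes for a string payload, encoding text explicitly."""
    if isinstance(data, bytes):
        return data
    return str(data).encode(encoding)


def to_text(raw, encoding="utf-8"):
    """Return text for a bytes payload, decoding explicitly."""
    return raw.decode(encoding)


def write_string_sketch(fid, kind, data):
    """Write a simplified (kind, size, payload) record."""
    payload = to_bytes(data)
    # The size is the number of encoded bytes, not the number of characters.
    fid.write(struct.pack(">ii", kind, len(payload)))
    fid.write(payload)


def read_string_sketch(fid):
    """Read back one record written by write_string_sketch."""
    kind, size = struct.unpack(">ii", fid.read(8))
    return kind, to_text(fid.read(size))


buf = io.BytesIO()
write_string_sketch(buf, KIND_DESCRIPTION, "äöé")
buf.seek(0)
print(read_string_sketch(buf))
```

Keeping the conversion in one place means the choice of encoding only has to be made (and changed) once for all I/O functions.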
Yeah. We had to tweak the writing a bit for Python3 support, which would explain why this issue didn't exist before but does now. Python3 requires explicit encoding/decoding for bytes<->string conversions, and I assume the problem is with how we did that.
It seems that there is a small problem in decoding the experimenter name tag from raw fiff files, at least with the latest mne-python; I remember succeeding some months ago with files from the same dataset (with the latest mne-python back then). It seems that my full name, with the Scandinavian letter "ä", was used when the user account was created on the MEG acquisition computer (personally I would have simply used "a", but I guess the admin had a Finnish keyboard). This letter, however, seems to be stored in a non-UTF-8 format (ISO-8859-1/Latin-1 is my best guess, or simply a broken encoding). The end result is that raw fiff file loading now fails.
Small code change in fiff/tag.py (line 347) to simply
is not enough, as it results in the error "UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 13: invalid continuation byte". A similar error comes up when forcing 'ISO-8859-1' encoding, so I guess the encoding is simply broken. However, as an initial workaround, changing the line to e.g.
seems to work fine.
And yes, I know I could attempt to change the relevant code for good (if that minor change shows no side effects), but unfortunately I'm still learning the basics of Python (i.e. only running slightly modified MNE-python code examples) AND GitHub (i.e., created an account just for this :) and I have no idea what kind of encoding the raw fiff file should use in string tags.
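For the reading side, a tolerant decoder is one option. The sketch below is not the workaround described above and not the code in fiff/tag.py: `decode_string_tag` is a hypothetical helper that tries UTF-8 first and then falls back to Latin-1, on the assumption that older files may contain Latin-1 bytes. Byte 0xe4 is "ä" in Latin-1, which matches the byte named in the error message. Note that Latin-1 maps every byte to a character, so the last-resort branch only runs if a caller passes a different list of encodings.

```python
def decode_string_tag(raw, encodings=("utf-8", "latin-1")):
    """Decode tag bytes, trying each candidate encoding in order (hypothetical helper)."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # Not valid in this encoding; try the next candidate.
            continue
    # Last resort: keep the ASCII part and mark undecodable bytes.
    return raw.decode("ascii", errors="replace")


# A Latin-1 encoded name is not valid UTF-8, so the fallback is used.
print(decode_string_tag(b"J\xe4\xe4"))
# Properly UTF-8 encoded text is decoded by the first candidate.
print(decode_string_tag("Jää".encode("utf-8")))
```

The trade-off is that a fallback can silently misread text stored in some third encoding, so it would make sense to settle on one encoding for writing and only be lenient when reading.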
Full traceback in case of problematic file