mdf4reader.py fails to get max len on unicode channels

ratal / mdfreader

Read Measurement Data Format (MDF) versions 3.x and 4.x file formats in python

mdf4reader.py fails to get max len on unicode channels #139

Closed laurentvm closed 6 years ago

laurentvm commented 6 years ago

Line 1873 of mdf4reader.py, maxlen = max([len(str(ref)) for ref in cc_ref]), fails if ref is unicode. I had to change it to the following to export to Matlab:

maxlen = max([len(repr(ref).encode('utf-8')) for ref in cc_ref])

ratal commented 6 years ago

Hi Laurent, can you tell me which Python version you use? Unicode handling is a big change between 2.7 and 3.x, but I guess you use 2.7. According to my tests, len(str(u'ö$oi')) fails only on Python 2.7, whereas len(u'ö$oi') works properly on both Python versions. Can you try without the str()? Actually, I do not think len(repr(ref).encode('utf-8')) gives the right length.
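To make the difference between these length computations concrete, here is a minimal sketch for Python 3. The list cc_ref_example is made up for illustration and is not taken from mdf4reader.py, and the explicit ASCII encode only imitates what str() does implicitly on a unicode object in Python 2.7.

```python
# Hypothetical sample values standing in for cc_ref (not from mdfreader).
cc_ref_example = [u'ö$oi', u'Init', u'0']

# Plain len() counts characters and works on Python 2.7 and 3.x.
maxlen_plain = max(len(ref) for ref in cc_ref_example)

# repr() adds the surrounding quotes and UTF-8 counts bytes, so this
# value is larger than the number of characters in the string.
maxlen_repr = max(len(repr(ref).encode('utf-8')) for ref in cc_ref_example)


def ascii_encode_fails(text):
    """Imitate Python 2.7 str(unicode), which is an implicit ASCII encode."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        return True
    return False


print(maxlen_plain, maxlen_repr, ascii_encode_fails(u'ö$oi'))
# prints: 4 7 True
```

On Python 3 the longest string has 4 characters, while the repr-based variant reports 7 because it counts two quote characters and two bytes for 'ö'.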

laurentvm commented 6 years ago

Hi,

Yes, it was with Python 2.7. I can try with 3.x too if you need. I will try with len only.

By the way, the file I got (mf4) is full of unicode. I'm not sure, but it seems that it fails to export to Matlab/xlsx with these unicode values.

For instance, I have a numpy array with u'0' or u'Init' values, which was a pain to output from the numpy array. I will continue my investigation. Unfortunately, I cannot share the file with you, but I will look into whether I can replicate the structure to share it.

laurentvm commented 6 years ago

... Just seen you're in Belgium. So am I.

ratal commented 6 years ago

mf4 is by specification unicode only and uses XML for metadata; this was a major change from 3.x to 4.x. If there is an error with the Matlab export, maybe we should rather focus on that method to improve its robustness?

laurentvm commented 6 years ago

Just an update. Using Python 2.7, the XLSX export fails as mentioned.

Excel file output: (screenshot of the exported xlsx file)

Python output with:

    for s in signal:
        val = yop[s]['data']
        print s, val

(screenshot of the Python console output)

There are values in the signal, but in Excel/Matlab there is nothing.

laurentvm commented 6 years ago

Same with 3.5

ratal commented 6 years ago

Try with the latest dev branch; it should be fixed.

laurentvm commented 6 years ago

Hi, just tested with python 3, the excel gives much more information now. Great!

I have many signals named t. Before, your code completed the header names as t, t_1, t_2, ... Now it appends the suffix twice. (screenshot of the exported xlsx file)

I still have to check the Matlab output.

ratal commented 6 years ago

This could be normal behaviour. If you have unsorted data, like several channel groups per data group, the 't' channel could also be present several times, so the data group and channel group numbers are appended to duplicated channel names.
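The renaming described here can be sketched roughly as follows. This is only an illustration under assumptions: the function name make_unique_names, the (name, data group, channel group) tuples and the name_dg_cg suffix format are hypothetical and are not taken from the mdfreader source.

```python
from collections import Counter


def make_unique_names(channels):
    """Append data group and channel group numbers to duplicated names.

    channels: list of (name, data_group, channel_group) tuples.
    Names that occur only once are left unchanged.
    """
    counts = Counter(name for name, _, _ in channels)
    unique = []
    for name, dg, cg in channels:
        if counts[name] > 1:
            # Hypothetical suffix format, chosen for this sketch only.
            unique.append('{}_{}_{}'.format(name, dg, cg))
        else:
            unique.append(name)
    return unique


channels = [('t', 0, 0), ('speed', 0, 0), ('t', 0, 1), ('t', 1, 0)]
print(make_unique_names(channels))
# prints: ['t_0_0', 'speed', 't_0_1', 't_1_0']
```

With two numbers appended, a header such as t_0_1 may look like a suffix added twice when compared with the older t_1 style.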

ratal commented 6 years ago

No more feedback for a while. If it is still an issue, you can reopen.

laurentvm commented 6 years ago

Hi, it’s working ok. Thanks for the fix