ratal / mdfreader

Read Measurement Data Format (MDF) versions 3.x and 4.x file formats in python

incorrect endianness when using dataread and channel_list #187

Closed ludwig-nc closed 4 years ago

ludwig-nc commented 4 years ago

Python version

'3.7.4 (default, Jul 9 2019, 18:13:23) \n[Clang 10.0.1 (clang-1001.0.46.4)]'

Platform information

'Darwin-19.5.0-x86_64-i386-64bit' (the same problem exists on a Linux machine)

Numpy version

'1.18.1'

mdfreader version

'4.1' dev branch

Description

Unfortunately I am using proprietary data files and cannot send you an example MD4 file, but hopefully I have narrowed down my problem enough for you to have a look. The problem only exists when I am using the dataRead module and the channel_list argument. When reading in the whole MD4 file, or when I remove dataRead.so, everything is fine.

The problem is as follows: when I read in a single channel of the file, the data of the channel itself is fine, but the corresponding master channel is read in with the wrong endianness. I am reading in the channel with the convert_after_read=False flag:

dat['time_3_3']['data'][:10]
array([                   0, -9180983664580755456,    12720254316707840,
       -9168263410264047616,    25440508633415680, -9155543155947339776,
          38160762950123520, -9215160866838675456,    50881017266831360,
       -9202440612521967616])

When I flip the endianness of this, I get the correct 100 Hz sampled time channel after applying the conversion factor of 1e-9:

np.frombuffer(dat['time_3_3']['data'].tobytes(),dtype='>i8')[:10] * 1e-9
array([0.  , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09])
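
The values above are consistent with a plain 8-byte swap of int64 nanosecond timestamps. A minimal sketch that reproduces them, assuming only NumPy and a synthetic 100 Hz time axis (the variable names are illustrative and not part of mdfreader):

```python
import numpy as np

# Synthetic 100 Hz master channel stored as int64 nanoseconds
# (assumption: 10 ms steps, as suggested by the recovered values above).
time_ns = np.arange(10, dtype=np.int64) * 10_000_000

# Simulate a reader that swaps the bytes of every 8-byte sample
# although the data already match the host byte order.
swapped = time_ns.byteswap()

# Swapping a second time restores the original samples; the
# conversion factor of 1e-9 then yields seconds.
recovered_s = swapped.byteswap() * 1e-9

print(swapped[:3])
print(recovered_s)
```

The second and third entries of `swapped` match the corrupted values reported for 'time_3_3', which suggests that an unwanted byte swap happens somewhere in the read path.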
ratal commented 4 years ago

dataRead should handle the byte order. But first, are your data sorted or unsorted? I guess sorted because of 'time_3_3', but please confirm, since the two cases go through different parts of the code. Assuming sorted, and checking with MDFValidator for instance, is the 'time_3_3' data type 4 (LE) or 5 (BE)? It should be 5, as you have an LE machine. Normally, read_double in the dataRead module should swap the bytes simply using numpy's .byteswap(). Maybe a flag is wrongly calculated. It would help to dump the sorted_data_read() arguments in read_channels_from_bytes() in mdf4reader.py when processing 'time_3_3'. With those I could potentially narrow down the bug in dataRead, if it does not already come from mdf4reader.

ludwig-nc commented 4 years ago

Hi Aymeric,

Thanks a lot for looking into this issue. The file is a sorted MDF4 file. According to MDF Validator, the data type of the time channel is 2=SIGNED_INTEL (for comparison, the corresponding data channel is of type 0=UNSIGN_INTEL).

read_channels_from_bytes is called repeatedly with the arguments:

channel_set = {'time_3_3', 'EngRPM'} 
n_records=1000 
dtype=None 
channels_indexes=None

Then, for each of these calls, sorted_data_read is called twice (I assume the first time for 'time_3_3' and the second time for 'EngRPM') with the following arguments:

print(self[chan].bit_count(info),
      self[chan].signal_data_type(info),
      self[chan].native_data_format(info),
      n_records, self.CGrecordLength,
      self[chan].bit_offset(info),
      self[chan].pos_byte_beg(info),
      self[chan].calc_bytes(info, aligned=False),
      array_flag)

>>> 64 2 i8 1000 107 0 0 8 0

print(self[chan].bit_count(info),
      self[chan].signal_data_type(info),
      self[chan].native_data_format(info),
      n_records, self.CGrecordLength,
      self[chan].bit_offset(info),
      self[chan].pos_byte_beg(info),
      self[chan].calc_bytes(info, aligned=False),
      array_flag)

>>> 14 0 u2 1000 107 0 23 2 0
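
To make these arguments concrete: a record length of 107 bytes with an 8-byte i8 field at byte 0 and a 2-byte u2 field at byte 23 can be described with a NumPy structured dtype. This is only an illustrative sketch on synthetic data, not how mdfreader or dataRead actually reads records. Little-endian storage is assumed from the reported data types 2 and 0, and the 14-bit mask is an assumption derived from bit_count 14 with bit offset 0.

```python
import numpy as np

RECORD_LENGTH = 107  # self.CGrecordLength in the dump above
N_RECORDS = 1000     # n_records in the dump above

# One record: int64 time at byte 0, uint16 EngRPM at byte 23.
# Both are assumed little-endian (data types 2 and 0).
record_dtype = np.dtype({
    "names": ["time_3_3", "EngRPM"],
    "formats": ["<i8", "<u2"],
    "offsets": [0, 23],
    "itemsize": RECORD_LENGTH,
})

# Build a synthetic raw buffer; the sample values are made up.
synthetic = np.zeros(N_RECORDS, dtype=record_dtype)
synthetic["time_3_3"] = np.arange(N_RECORDS, dtype=np.int64) * 10_000_000
synthetic["EngRPM"] = 800
raw = synthetic.tobytes()

# Interpret the raw bytes record by record; no byte swap is involved
# because the byte order is stated explicitly in the dtype.
records = np.frombuffer(raw, dtype=record_dtype, count=N_RECORDS)
time_s = records["time_3_3"] * 1e-9
eng_rpm = records["EngRPM"] & np.uint16(0x3FFF)  # keep the low 14 bits

print(time_s[:3], eng_rpm[:3])
```

Declaring the byte order in the dtype string ('<' or '>') lets NumPy interpret the values correctly on any host, which avoids a separate swap decision.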
ratal commented 4 years ago

It is weird to have a signed int for a time signal. But anyway, I think I found the obvious bug in dataRead: on lines 125 and 126, the values should not be 0, 1 but 2, 3. You can check the dev branch for the fix; you will need to recompile dataRead.
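
For readers following along, the way a data type code relates to the byte-swap decision can be sketched as follows. This is a hypothetical helper, not the code in dataRead. The thread confirms codes 0, 2, 4 and 5, while codes 1 and 3 (big-endian integers) are filled in here from the MDF4 data type numbering.

```python
import sys

# MDF4 numeric data type codes: 0/1 unsigned int, 2/3 signed int,
# 4/5 float. Even codes are little-endian, odd codes big-endian.
# Codes 1 and 3 are not mentioned in the thread (assumption from the spec).
LITTLE_ENDIAN_TYPES = {0, 2, 4}
BIG_ENDIAN_TYPES = {1, 3, 5}


def needs_byteswap(data_type, host_byteorder=sys.byteorder):
    """Return True if samples of this type must be swapped on this host.

    Hypothetical helper for illustration only; host_byteorder is
    'little' or 'big', as reported by sys.byteorder.
    """
    if data_type in LITTLE_ENDIAN_TYPES:
        return host_byteorder == "big"
    if data_type in BIG_ENDIAN_TYPES:
        return host_byteorder == "little"
    raise ValueError(f"not a plain numeric MDF4 data type: {data_type}")


# The time channel in this issue is type 2 (signed, little-endian),
# so on a little-endian machine no swap should take place.
print(needs_byteswap(2, "little"))
```

With such a mapping, a signed little-endian channel like 'time_3_3' is left untouched on an x86 host, which is the behaviour reported after the fix.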

ludwig-nc commented 4 years ago

Many thanks, all is fine now. Are you planning on applying the same fix to the master branch?

ratal commented 4 years ago

Yes, I will issue a new tag soon, after some quality checks.