ratal / mdfreader

Read Measurement Data Format (MDF) versions 3.x and 4.x file formats in python
Other
169 stars 74 forks source link

No support for lookup table conversion #132

Closed cristi-neagu closed 6 years ago

cristi-neagu commented 6 years ago

Python version

3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)]

Platform information

Windows 7

Numpy version

1.14.0

mdfreader version

2.7.4

Description

No support for conversion tables

Due to size constraints, rather than converting the entire data vector with the lookup table, a better solution would be to provide a function that returns the conversion table as a dictionary. I don't know how values are retrieved from the MDF structure, but i can probably help with everything else.

Thank you.

ratal commented 6 years ago

I am not so sure to understand what you propose with conversion table as dictionnary. Do you want to do something like while using convertAfterRead=False argument ? (it does not convert the channels but only when using .getChannelData() method, saving memory) the conversion tables info are within the info class in 'CC' block when existing. When using convertAfterRead=False, this conversion information (a dict) is kept for each channel within key 'conversion'.

cristi-neagu commented 6 years ago

Well, i have some channels in my files that have lookup tables associated with them. When looking at them in MDA i can see a string value instead of the numerical value. If i read these channels with mdfreader, i only get the numerical data, without setting convertAfterRead=False. So i assumed that this feature doesn't work.

How do i get the conversion table for a particular channel? I can't find this 'conversion' key.

ratal commented 6 years ago

By default, reader will use convertAfterRead=True. Did you try reading with argument =False ? If False, the output mdf should contain a 'conversion' key for each channel along with other keys like 'data', 'master', 'description'

cristi-neagu commented 6 years ago

Using convertAfterRead=False does indeed produce the conversion dictionary. So that means that the conversion doesn't work properly for lookup tables (conversion type is 11, btw). To be honest, as it is right now, this is preferable. It's much easier to store an array of integers than an array of strings. However, it's not ideal.

Can the conversion dictionary be made available at all times, no matter the convertAfterRead parameter? Or can a function be provided to return the conversion dictionary for a given channel?

ratal commented 6 years ago

So once you have read the file with convertAfterRead=False, you can get the channel data using .getChannelData(channel_name, raw_data=False/True) if raw_data is True, you should get number, if False you'll have converted string output. If you are really interested in the conversion dictionnary, it is in the key 'conversion' value of the channel: mdf['channel_name']['conversion']

cristi-neagu commented 6 years ago

.getChannelData doesn't have a raw_data argument.

ratal commented 6 years ago

Ah yes, sorry, this is implemented in next release, still in development. You can try it from git.

cristi-neagu commented 6 years ago

Ok, i think i know what's going on.

My channel isn't getting converted because the conversion type is 11, and that isn't featured in mdf3._convert3() (note for the future: use a dict for case switching). As such, it just returns the values unaltered. That is actually good in this case, so i really don't mind it.

The line i do mind is 1052. If that line is commented out, i get exactly the behaviour i'm after, where the conversion is available at all times. I don't think this is a big penalty. Could we add a parameter to mdfreader.mdf(), something like keepConversion=False that, when true, would bypass line 1052?

From:

self.setChannelData(channelName, self._convert3(channelName, self.convert_tables))
self.remove_channel_conversion(channelName)

To:

self.setChannelData(channelName, self._convert3(channelName, self.convert_tables))
if not self.keepConversion:
        self.remove_channel_conversion(channelName)

I can make a pull request, if you want.

ratal commented 6 years ago

I am afraid it will add too much complexity, too much implications with wersion 4.x, etc. I would rather prefer actually implement for type 11 and use as explained general previously expected behaviour. What do you think ?

cristi-neagu commented 6 years ago

Implementing that would have an even bigger impact, i think. We're talking about vectors containing hundreds of thousands of values going from int to string. Depending on the size of the string, that's an increase of 10-20 times in size. Not ideal.

I guess the only option is to use convertAfterRead=False.

ratal commented 6 years ago

There could also be an alternative which is .convert_tables attribute that could be put False by default in mdf.py, init() method. It will avoid only table conversion types (11 and 12 for mdf 3.x). Like this, everybody can custom behaviour depending of the needs, like the following:

yop=mdfreader.mdf()
yop.convert_tables = False/True
yop.read('file_name')

I am then implementing this conversion type 11 in the dev branch --> you could give a try with your file ?

cristi-neagu commented 6 years ago

Why not make that variable a part of the arguments for init()?

def __init__(self, fileName=None, channelList=None, convertAfterRead=True,
                 filterChannelNames=False, noDataLoading=False,
                 convert_tables=True, compression=False):
    ...
    self.convert_tables = convert_tables

Can then call it in one line:

yop = mdfreader.mdf('file_name.mdf', convert_tables=False)

Same result, but more consistent.

cristi-neagu commented 6 years ago

Actually, you might want to reconsider line 142 from mdf.py. I think it should be self.convert_tables = False and the argument should be false by default. As you say in mdf4reader.py, it's computationally intensive and false by default.

ratal commented 6 years ago

Yes, that's what also has changed on dev branch, I put False, originally was True For the init, I agree, would be more consistent, even thoug the number of arguments starts to be bit long for my taste.

cristi-neagu commented 6 years ago

This seems to be working as expected. Tried convertTables both with and without convertAfterRead. Got expected behaviour in both cases.