ratal / mdfreader

Read Measurement Data Format (MDF) versions 3.x and 4.x file formats in python

Mixed topics (non-unique channel names, class structure, ...) #16

Closed sneusse closed 8 years ago

sneusse commented 9 years ago

Hi ratal,

I recently found this module and played with it for some time - I like it so far! Please note that I'm not really a python-guy, I'm trying to use this from Julia via PyCall (I'm also not really a Julia-guy either, but whatever...)

Here is my system:

Some things I noticed while testing:

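if PythonVersion < 3:
    if removeTrailing0:
        # original: removes every NUL character from the value
        Block[fieldName] = value.replace('\x00', '')
        # changed: decode as Latin-1, re-encode as UTF-8, strip trailing NULs only
        Block[fieldName] = value.decode('latin1').encode('utf-8').rstrip('\x00')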
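
For Python 3, where such a field would arrive as bytes, a rough equivalent of the changed line might look like the sketch below. The function name `clean_mdf_string` is made up for illustration and is not part of mdfreader, and the sketch assumes the field really is Latin-1 encoded.

```python
def clean_mdf_string(raw: bytes) -> str:
    """Decode a text field read from the file and drop its NUL padding.

    Assumption: the field is Latin-1 encoded and padded with trailing NULs.
    """
    # Latin-1 maps every byte to a code point, so decoding cannot fail.
    text = raw.decode('latin1')
    # Only trailing NULs are removed; NULs inside the text are left alone,
    # unlike the original replace('\x00', '') which removes all of them.
    return text.rstrip('\x00')


padded_field = b'caf\xe9\x00\x00\x00'
cleaned = clean_mdf_string(padded_field)
```

Note the behavioural difference between the two variants: `replace` deletes every NUL, while `rstrip` only trims the padding at the end.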

A lot of trial and error was involved as I consider myself a beginner in both languages, so maybe there are no issues and I'm just stupid :) I'm not really familiar with the git process, so if you need the code I'll send it to you. During the next few days I'll try to convert the lib (mdf3 only for now) to a list-based structure - maybe this will work.

Regards Sebastian

ratal commented 9 years ago

Hi sneusse, Thanks for widening my views with Julia and Jupyter, interesting. First, are you using a tagged version, PyPI or GitHub (and if so, which version)? Some answers:

  1. Non-unique channel names: yes, this is indeed a problem. It could be tackled as you proposed, by using a list instead of a dict in the mdf_skeleton class declaration (class mdf_skeleton(dict)). However, that would have a major impact on the whole rest of the code for the moment. I am planning to further improve mdf_skeleton so that it provides all the basic methods needed to manipulate its content, which should keep the impact on the other modules small. This will take a bit of time, but in the end you should be able to write your own data structure without affecting the other modules. Just a warning: if we save all the channels in one list, then to access a channel we have to find its index, which is a bit complicated. With non-unique names, how do you choose which index you want? That is why I selected the dict structure: the data are "channel name oriented". Anyway, fundamentally, having the same channel name several times in different datagroups (the specification does not allow non-unique names within the same datagroup) is problematic if we flatten the datagroups into one, by resampling for instance: you need to rename them in a dict anyway. What I have implemented so far is to simply append the datagroup number to the end of the channel name, but only for master channels. I think I implemented the correct behaviour in mdf4reader recently, around line 1270, as part of a major refactoring (dropping file multiprocessing: the code was too complex and, I think, not used; it is better to multiprocess by file for large amounts of data, or to multiprocess channel conversion). I will do the same in mdf3reader (I guess you are mostly reading mdf3 files?) to improve the handling of non-unique names.
  2. Thanks for the code change regarding character encoding; I will look at it and test it.
  3. I do not get it. What do you mean by "difficult to call the methods of the mdf object"? For info, there are methods to get data by channel name (.getChannelData()) and an attribute that shows the data structure (.masterChannelList). The dict is mostly organised as follows: mdf[channelName]['data'], ['unit'], ['description'], ['master'] (the master channel name) and ['conversion'] (if convertAfterRead is false).
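
The renaming idea from point 1 can be sketched roughly as below. This is not the code in mdf4reader or mdf3reader: the function `flatten_datagroups`, the input layout and the `_<number>` suffix format are assumptions made for illustration, and the per-channel keys follow the layout described in point 3.

```python
def flatten_datagroups(datagroups):
    """Merge per-datagroup channel dicts into one name-keyed dict.

    datagroups: {datagroup_number: {channel_name: channel_info}} (assumed layout).
    A name that is already taken gets the datagroup number appended.
    """
    flat = {}
    for group_number, channels in datagroups.items():
        for name, info in channels.items():
            # Keep the plain name for the first occurrence, suffix later ones.
            key = name if name not in flat else '{}_{}'.format(name, group_number)
            flat[key] = info
    return flat


datagroups = {
    0: {'time': {'data': [0.0, 0.1], 'unit': 's', 'master': 'time'},
        'speed': {'data': [10, 11], 'unit': 'km/h', 'master': 'time'}},
    1: {'time': {'data': [0.0, 1.0], 'unit': 's', 'master': 'time'},
        'temp': {'data': [20, 21], 'unit': 'degC', 'master': 'time'}},
}
mdf = flatten_datagroups(datagroups)
speed_unit = mdf['speed']['unit']
```

In this sketch the second `time` channel ends up under `time_1`. The `master` entries are not rewritten here, which a real implementation would also have to handle.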

Regards

Aymeric

sneusse commented 9 years ago

Good morning Aymeric,

Thanks for the fast response. I started hacking on the mdfreader-0.1.9.3 tarball.

It is true that the basic use case would be name-based channel access. But this could also be accomplished with a list structure; we would need some iteration, of course, and might yield a set of channels instead of a single channel. Maybe it would be beneficial to store instances of the recordChannel class (or something similar).
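
A minimal sketch of what I have in mind, assuming a simple stand-in for recordChannel. The `Channel` and `ChannelList` classes and their fields are hypothetical and only meant to show name-based lookup on top of a list:

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Hypothetical stand-in for a channel record."""
    name: str
    datagroup: int
    unit: str = ''
    data: list = field(default_factory=list)


class ChannelList:
    """Stores channels in a plain list, so duplicate names are allowed."""

    def __init__(self):
        self._channels = []

    def append(self, channel):
        self._channels.append(channel)

    def find(self, name):
        """Yield every channel with the given name, in insertion order."""
        return (ch for ch in self._channels if ch.name == name)


channels = ChannelList()
channels.append(Channel('time', datagroup=0, unit='s'))
channels.append(Channel('speed', datagroup=0, unit='km/h'))
channels.append(Channel('time', datagroup=1, unit='s'))

time_channels = list(channels.find('time'))
```

Here `find` returns both `time` channels, and the caller decides which datagroup it wants. The cost is a linear scan per lookup instead of a dict access.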

Regarding point 3: This is most likely an issue with the Python -> PyCall -> Julia interface. PyCall notices that your class is some kind of dictionary (as it inherits from dict) and converts it to a 'native' Julia dictionary object. These dictionary objects won't have the instance methods of the python object anymore. If the class does not inherit from dict, PyCall does not know what to do and yields a wrapper of the python object, which can be used to invoke the instance methods.
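
On the Python side, the difference comes down to inheritance versus composition. The sketch below uses a made-up class `MdfContainer` that keeps a dict internally instead of subclassing it; the class and the method name `get_channel_data` are illustrative and not part of mdfreader, and I have only described, not verified here, how PyCall treats such an object.

```python
class MdfContainer:
    """Holds channels in an internal dict instead of inheriting from dict."""

    def __init__(self):
        self._channels = {}

    # Dict-like access is forwarded to the internal dict.
    def __getitem__(self, name):
        return self._channels[name]

    def __setitem__(self, name, value):
        self._channels[name] = value

    def __contains__(self, name):
        return name in self._channels

    def __iter__(self):
        return iter(self._channels)

    def __len__(self):
        return len(self._channels)

    def get_channel_data(self, name):
        """Instance method that stays available on the wrapped object."""
        return self._channels[name]['data']


container = MdfContainer()
container['speed'] = {'data': [10, 11, 12], 'unit': 'km/h'}

is_dict = isinstance(container, dict)
speed_data = container.get_channel_data('speed')
```

Because `isinstance(container, dict)` is false, a bridge that special-cases dict subclasses has no reason to convert the object, while `container['speed']` still works as before.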

Regards Sebastian

ratal commented 9 years ago

Hi sneusse, In the latest git commit, I did a fundamental refactoring that could help you get rid of the dictionary structure. I actually created a new mdf_skeleton class that defines how data is stored in the mdf class. As mentioned, some methods are probably missing for the moment, but I guess it could be a good start for you. I have little knowledge of Julia... Tell me if you still struggle. Regards Aymeric

ratal commented 9 years ago

Hi sneusse, Thanks for the proposal regarding Latin-1 encoding; it will be implemented. In last github commit ::

ratal commented 8 years ago

Feel free to reopen this issue or post an update about your adaptation to Julia.