Mixed topics (non-unique channel names, class structure, ...)

sneusse commented 9 years ago

Hi ratal,

I recently found this module and have been playing with it for a while; I like it so far! Please note that I'm not really a Python guy: I'm trying to use this from Julia via PyCall (I'm not really a Julia guy either, but whatever...)

Here is my system:

Win7 x64
Anaconda python 2.7
built your lib using VS 2013
Julia 0.3.11
Jupyter via Anaconda

Some things I noticed while testing:

Channel names are not unique (in many of the measurement files I have here). Maybe a list would be a better data structure for storing the channels. This issue can also raise exceptions, for example when the resampling method is invoked. When two signal channels have the same name, one overrides the other when the file is read, leaving a stray master channel behind. That might be the cause here, but I could not verify it yet.
An encoding exception is thrown when I try to resample the channels because of German umlaut characters (äöü) in the channel description. This only occurs when I use the lib via Julia. I could fix it by changing the following line (this might not be a good solution, but it works for me): https://github.com/ratal/mdfreader/blob/b44231d003fb5557f96136770435c9af956c7843/mdfreader/mdfinfo3.py#L637
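To illustrate the overwrite described above, here is a minimal sketch in plain Python (not mdfreader's actual reader) of what happens when channels from several data groups are stored in one dict keyed by channel name. The channel names, data group numbers and the `"master"` field are made up for the example.

```python
# Hypothetical channels read from two data groups that both contain "Speed".
# Each tuple: (data group number, channel name, master channel name, samples)
records = [
    (0, "time_0", None, [0.0, 0.1, 0.2]),
    (0, "Speed", "time_0", [10, 11, 12]),
    (1, "time_1", None, [0.0, 0.5]),
    (1, "Speed", "time_1", [20, 21]),
]

channels = {}
for group, name, master, samples in records:
    # A dict keyed by channel name silently replaces the earlier "Speed"
    channels[name] = {"group": group, "master": master, "data": samples}

print(sorted(channels))            # ['Speed', 'time_0', 'time_1']
print(channels["Speed"]["group"])  # 1: the data group 0 "Speed" is gone

# Masters that no stored channel refers to any more are left "stray"
orphans = [m for m in ("time_0", "time_1")
           if not any(c["master"] == m for c in channels.values())]
print(orphans)                     # ['time_0']
```

Later code that expects every master channel to have signals attached (resampling, for instance) can then trip over `time_0`.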

if PythonVersion < 3:
    if removeTrailing0:
        # original
        # Block[fieldName] = value.replace('\x00', '')
        # changed: decode the Latin-1 bytes and re-encode them as UTF-8
        Block[fieldName] = value.decode('latin1').encode('utf-8').rstrip('\x00')
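The snippet above is Python 2, where `value` is a byte string. A rough Python 3 sketch of the same idea, assuming the text fields in the file are Latin-1 encoded (that is what the fix above relies on; the thread does not confirm which encoding the writing tool used), strips the trailing NUL padding from the raw bytes and decodes them once to a `str`. The function name `decode_text_field` is hypothetical.

```python
def decode_text_field(raw: bytes, encoding: str = "latin-1") -> str:
    """Decode a fixed-length, NUL-padded text field from a binary block.

    Assumes the writer used Latin-1; pass another codec if your files differ.
    """
    # Drop the trailing NUL padding first, then decode the remaining bytes
    return raw.rstrip(b"\x00").decode(encoding)


field = b"Drehzahl \xe4\xf6\xfc\x00\x00\x00"
text = decode_text_field(field)
print(text)                  # Drehzahl äöü
print(text.encode("utf-8"))  # b'Drehzahl \xc3\xa4\xc3\xb6\xc3\xbc'
```

Working with decoded `str` objects internally and encoding only at the boundary avoids mixing byte strings of different encodings, which is the usual source of this kind of exception.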

Again, only relevant when using Julia: it is really difficult to call the methods of the mdf object because it is automatically converted to a dictionary. I could fix that by not inheriting from dict but aggregating a dict inside the class instead, so every self[channelName] becomes self.channels[channelName].
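A minimal sketch of the aggregation approach: the class holds a dict instead of being one, so a bridge that converts `dict` instances (as PyCall does according to the description in this thread) should leave the object wrapped with its methods callable. The class and method names are hypothetical, not mdfreader's API.

```python
class MdfContainer(object):
    """Hypothetical wrapper that aggregates channels instead of subclassing dict."""

    def __init__(self):
        # channel name -> {"data", "unit", "description", "master"}
        self.channels = {}

    def add_channel(self, name, data, unit="", description="", master=None):
        self.channels[name] = {"data": data, "unit": unit,
                               "description": description, "master": master}

    def get_channel_data(self, name):
        # Was self[name]["data"] when the class inherited from dict
        return self.channels[name]["data"]


m = MdfContainer()
m.add_channel("Speed", [10, 11, 12], unit="rpm", master="time_0")
print(isinstance(m, dict))          # False
print(m.get_channel_data("Speed"))  # [10, 11, 12]
```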

A lot of trial and error was involved, as I consider myself a beginner in both languages, so maybe there are no issues and I'm just stupid :) I'm not really familiar with the git workflow, so if you need the code I'll send it to you. Over the next few days I'll try to convert the lib (mdf3 only for now) to a list-based structure; maybe this will work.

Regards Sebastian

ratal commented 9 years ago

Hi sneusse, thanks for widening my views with Julia and Jupyter, interesting. First, are you using a tagged version, PyPI, or GitHub (and if so, which version)? Some answers:

Non-unique channel names: yes, indeed this is a problem. As you proposed, it could be tackled by using a list instead of a dict in the mdf_skeleton class declaration (class mdf_skeleton(dict)). However, for the moment this would have a major impact on the rest of the code. I am planning to further improve mdf_skeleton by adding all the basic methods needed to manipulate its content, so that changes to it have little impact on the other modules. This will take a bit of time, but in the end you should be able to write your own data structure without affecting the other modules. Just a warning: if we store all the channels in a single list, accessing a channel requires finding its index, which is a bit complicated. And with non-unique names, how do you choose which index you want? That is why I chose a dict structure: this data is "channel name oriented". Fundamentally, having the same channel name in several data groups (the specification does not allow non-unique names within the same data group) is problematic whenever the data groups are flattened into one, for instance by resampling: you have to rename the channels in a dict anyway. What I have implemented so far simply appends the data group number to the channel name, but only for master channels. I think I recently implemented the correct behaviour in mdf4reader, around line 1270, as part of a major refactoring (dropping file multiprocessing: the code was too complex and, I think, unused; for large amounts of data it is better to multiprocess by file or to multiprocess channel conversion). I will do the same in mdf3reader (I guess you are mostly reading mdf3 files?) to improve the handling of non-unique names.
Thanks for the code change regarding character encoding; I will look at it and test it.
I do not get it. What do you mean by "difficult to call the methods of the mdf object"? For info, there are some methods to get data by channel name (.getChannelData()) and an attribute that shows the data structure (.masterChannelList). The dict is mostly organised like this:

mdf[channelName]['data']
                ['unit']
                ['description']
                ['master']       master channel name
                ['conversion']   only if convertAfterRead is False
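A sketch of the renaming idea when flattening: channels from several data groups are merged into one name-keyed dict, and any name that occurs in more than one group gets its data group number appended. The `_<n>` suffix is an assumption for illustration (the thread does not say which separator mdfreader uses), and unlike the current behaviour described above it renames every duplicate, not only master channels.

```python
from collections import Counter


def flatten_groups(groups):
    """Merge {group_number: {channel_name: data}} into one dict.

    Names appearing in more than one data group are made unique by
    appending the group number (hypothetical "_<n>" suffix).
    """
    # Within one data group names are unique, so counts > 1 mean cross-group clashes
    counts = Counter(name for chans in groups.values() for name in chans)
    flat = {}
    for group_number in sorted(groups):
        for name, data in groups[group_number].items():
            key = name if counts[name] == 1 else "%s_%d" % (name, group_number)
            flat[key] = data
    return flat


groups = {
    0: {"time": [0.0, 0.1], "Speed": [10, 11]},
    1: {"time": [0.0, 0.5], "Speed": [20, 21], "Torque": [5, 6]},
}
flat = flatten_groups(groups)
print(sorted(flat))  # ['Speed_0', 'Speed_1', 'Torque', 'time_0', 'time_1']
```

A real implementation would also have to check that a suffixed name does not collide with a channel that is already called, say, `Speed_0`.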
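To make the layout above concrete, here is a small hand-built dict with the same nesting (values invented), plus a sketch of how a master-to-channels grouping, similar in spirit to what `.masterChannelList` shows, can be derived from the `'master'` fields. This is plain Python, not how mdfreader actually builds `masterChannelList`.

```python
from collections import defaultdict

# Hand-built example following the mdf[channelName][field] layout;
# entries for the master channels themselves are omitted for brevity.
mdf = {
    "Speed": {"data": [10, 11, 12], "unit": "rpm",
              "description": "engine speed", "master": "time_0"},
    "Torque": {"data": [5, 6, 7], "unit": "Nm",
               "description": "engine torque", "master": "time_0"},
    "Temp": {"data": [80, 81], "unit": "degC",
             "description": "coolant temperature", "master": "time_1"},
}

print(mdf["Speed"]["unit"])  # rpm

# Group channel names by the master channel they refer to
master_channels = defaultdict(list)
for name, channel in mdf.items():
    master_channels[channel["master"]].append(name)

print(sorted(master_channels))            # ['time_0', 'time_1']
print(sorted(master_channels["time_0"]))  # ['Speed', 'Torque']
```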

Regards

Aymeric

sneusse commented 9 years ago

Good morning Aymeric,

Thanks for the fast response. I started hacking on the mdfreader-0.1.9.3 tarball.

It is true that the basic use case is name-based channel access. But this could also be done with a list structure: we would need some iteration, of course, and might return a set of channels instead of a single channel. It might also be beneficial to store instances of the recordChannel class (or something similar).
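A minimal sketch of the list-based alternative (Python 3.7+ for `dataclasses`): channels are kept as objects in a list, and a name lookup returns every match, optionally narrowed to one data group. The `Channel` class stands in for something like `recordChannel` and, like `ChannelList`, is hypothetical. The lookup is a linear scan, so it costs O(n) per call unless a name index is added on top.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Channel:
    name: str
    group: int  # data group number
    unit: str = ""
    data: list = field(default_factory=list)


class ChannelList:
    def __init__(self):
        self.channels: List[Channel] = []

    def add(self, channel: Channel) -> None:
        self.channels.append(channel)

    def find(self, name: str, group: Optional[int] = None) -> List[Channel]:
        """Return all channels called `name`, optionally from one data group only."""
        return [c for c in self.channels
                if c.name == name and (group is None or c.group == group)]


store = ChannelList()
store.add(Channel("Speed", 0, "rpm", [10, 11]))
store.add(Channel("Speed", 1, "rpm", [20, 21]))
print(len(store.find("Speed")))              # 2
print(store.find("Speed", group=1)[0].data)  # [20, 21]
```

Both "Speed" channels survive, and the caller decides which one it wants, which is the question Aymeric raises about choosing an index.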

Regarding point 3: this is most likely an issue with the Python -> PyCall -> Julia interface. PyCall notices that your class is some kind of dictionary (as it inherits from dict) and converts it to a "native" Julia dictionary object. These dictionary objects no longer have the instance methods of the Python object. If the class does not inherit from dict, PyCall does not know what to do and returns a wrapper of the Python object, which can be used to invoke the instance methods.

Regards Sebastian

ratal commented 9 years ago

Hi sneusse, in the latest git commit I did a fundamental refactoring that could help you get rid of the dictionary structure. I created a new mdf_skeleton class that defines how data is stored in the mdf class. As mentioned, some methods are probably still missing, but I guess it could be a good start for you. I have little knowledge of Julia... Tell me if you still struggle. Regards Aymeric

ratal commented 9 years ago

Hi sneusse, thanks for the proposal regarding Latin-1 encoding; it will be implemented. In the latest GitHub commit:

I further implemented the mdf_skeleton class. It contains the mdf data structure and its management. All the other modules depend on it and have been modified to use its methods rather than the data structure directly, so you should theoretically be able to adapt to Julia by changing only this class. However, since there are now so many basic data manipulation methods, replacing almost all dict methods, perhaps direct use from Julia is already possible?
In mdf_skeleton, I solved the non-unique channel name problem by appending the data group number to the name. Regards Aymeric
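A sketch of why routing every access through the skeleton's methods helps: callers use only a small set of methods, so the storage behind them (here a name-keyed dict versus a list) can be swapped without touching the callers. Class and method names are hypothetical and do not reproduce mdf_skeleton's actual interface.

```python
class DictStorage(object):
    def __init__(self):
        self._channels = {}  # name -> data

    def add_channel(self, name, data):
        self._channels[name] = data

    def get_channel_data(self, name):
        return self._channels[name]

    def channel_names(self):
        return list(self._channels)


class ListStorage(object):
    def __init__(self):
        self._channels = []  # list of (name, data) pairs

    def add_channel(self, name, data):
        self._channels.append((name, data))

    def get_channel_data(self, name):
        # Linear search; returns the first channel with that name
        for channel_name, data in self._channels:
            if channel_name == name:
                return data
        raise KeyError(name)

    def channel_names(self):
        return [name for name, _ in self._channels]


def mean_of(storage, name):
    """Caller code: relies only on the methods, not on the storage type."""
    data = storage.get_channel_data(name)
    return sum(data) / float(len(data))


for storage in (DictStorage(), ListStorage()):
    storage.add_channel("Speed", [10, 20, 30])
    print(type(storage).__name__, mean_of(storage, "Speed"))  # 20.0 for both
```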

ratal commented 8 years ago

Feel free to reopen this issue or post an update about your adaptation to Julia.

Mixed topics (non-unique channel names, class structure, ...) #16