ratal / mdfreader

Read Measurement Data Format (MDF) versions 3.x and 4.x file formats in python

Distinguish Channel Data with Source Path #152

Closed ninpeng8 closed 5 years ago

ninpeng8 commented 5 years ago

Python version

3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]

Platform information

Windows-7-6.1.7601-SP1

Numpy version

1.15.0

mdfreader version

2.7.8

Description

(I am working with mf4 files).

Is there a way I can use mdfreader to access the channel source path? For example, if I had channels with the same name that were logged on different busses, I would want to distinguish them via the source.

For example if I am pulling 3 different readings of the same signal coming from 3 busses like 'SourcePath1', 'SourcePath2', and 'SourcePath3', it would be nice to be able to do something like mdf.getChannelData('SignalName/SourcePath2'), if that makes sense.

Thanks!

ninpeng8 commented 5 years ago

Alternatively, can I get the group number, data group number, and channel number for a named channel? That way I can use the Info portion of mdfreader to find out the path.

EDIT: I was able to find the source path information using mdfinfo, but I would need to change the mdf class channel names itself or grab the channel data based on dg, cg, and cn vs. only channel name to distinguish appropriately.

danielhrisca commented 5 years ago

Is there a distinction between the display names for those channels?

ninpeng8 commented 5 years ago

No

danielhrisca commented 5 years ago

Would it be ok if you could give the source name as a second argument to the get method?

ninpeng8 commented 5 years ago

Yes, that would be fine!

ninpeng8 commented 5 years ago

Hi @danielhrisca I just wanted to check in! Were you able to come up with anything? Thanks!

danielhrisca commented 5 years ago

Hello @ninpeng8,

your feature request got my attention, and it's a valid use case, but I'm afraid you'll have to ask Aymeric for an implementation in the mdfreader package.

Sorry if I misled you into believing that I would implement this in mdfreader.

ninpeng8 commented 5 years ago

Oh I see! Don't worry about it. Scanning through the other issues it looks like @ratal is working on something else at the moment, so hopefully he'll get to this afterward!

Thanks for starting the discussion!

danielhrisca commented 5 years ago

You're welcome! It's good to get user input; it's been really useful for me, so thank you as well.

ratal commented 5 years ago

Hi, Ok, I will implement this, should not be too complicated.

ratal commented 5 years ago

I added a simple method, getChannelName4 (in the dev branch, if you want to try it), that looks in the mdf dict for a channel name together with its corresponding source path or name, or its channel group name, source path, or name. It returns the unique channel name in the mdf, which can then be reused by getChannelData4().

The main change, though, was to add this information to each channel dict: you can see a new entry 'id' in each channel containing the tuple ((data group number, channel group number, channel number), (channel name, channel source, channel path), (group name, group source, group path)). So you are free either to use the new method or to use the data group, channel group, and channel numbers directly with the info class.
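To illustrate the 'id' layout described above, here is a minimal sketch of a lookup by channel name and source. It does not call mdfreader at all: the `channels` dict, its keys and values, and the `find_keys` helper are hypothetical stand-ins that only mimic the tuple structure given in this comment.

```python
# Hypothetical stand-in for an mdf object: a plain dict keyed by unique channel name.
# Each 'id' follows the layout described in the comment above:
# ((data group, channel group, channel number),
#  (channel name, channel source, channel path),
#  (group name, group source, group path))
channels = {
    "SIGNAL": {"id": ((1, 0, 1), ("SIGNAL", "source1", "path1"), ("group1", "gsrc1", "gpath1"))},
    "SIGNAL_2": {"id": ((2, 0, 1), ("SIGNAL", "source2", "path2"), ("group2", "gsrc2", "gpath2"))},
    "SIGNAL_3": {"id": ((3, 0, 1), ("SIGNAL", "source3", "path3"), ("group3", "gsrc3", "gpath3"))},
    "time_1": {},  # entry without an 'id', to show that it is skipped
}


def find_keys(channels, name, source):
    """Return every unique key whose 'id' matches the channel name and source (or path)."""
    matches = []
    for key, entry in channels.items():
        ident = entry.get("id")
        if ident is None:
            continue
        _numbers, (ch_name, ch_source, ch_path), _group = ident
        if ch_name == name and source in (ch_source, ch_path):
            matches.append(key)
    return matches


print(find_keys(channels, "SIGNAL", "source2"))
```

Returning a list rather than a single key is a design choice made for this sketch, so that a caller can see when a name and source pair is ambiguous or absent.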

ninpeng8 commented 5 years ago

I am losing signals when I create the class (ex. I have 1300 in the mdf and only get 1100 back in the mdf class). I'm guessing the repeating signals are being taken out, and then the source information is indicated in that 'id:' block after the conversion. Is there a way to prevent this loss?

Thanks!

ninpeng8 commented 5 years ago

Looking deeper into the issue, I am not losing signals; I just cannot access the other same-name channels that use different sources, since the 'id' entry sits inside the channel dict. If the channel name is the key into the channel dict, how do I look for all channels with that name to find their 'id' entries?

Furthermore, for some channels, the source is not recorded in the 'id' block (it says 'Message' instead).

ratal commented 5 years ago

Hi, there is a statically defined variable in read4 and read3 that tunes how much metadata is read, since reading it takes a lot of computation time when there are many channels. SI blocks (containing the source path and name) and others are considered secondary and are therefore not read by default. However, if you parse the file with the mdfinfo class standalone, you can get all metadata, including source info (SI blocks).

You could try reading after changing line 1152 in mdf4reader.read4: minimal = 2 -> 0

In the meantime, I made this parameter an input argument in the dev branch, so you could try:

truc=mdfreader.mdf('file.mf4', metadata=0)
truc.getChannelName4('channel name', 'source path')

ninpeng8 commented 5 years ago

EDIT: Please read the latest comment. I'll keep this comment here for reference however.

Thank you for your patience in waiting for my response. I looked into using the solution you presented, and it seems like I am actually losing signals, contrary to what I thought.

I also tried setting minimal to 0 and appending the source to the end of the channel name in the mdf4reader and mdf4info scripts, which allows me to access the signal data based on name and source. However, the number of signals is still the same (I am losing signals).

When I use the truc.getChannelName4 method, it will return the same data information regardless of the source, so I think the channels with repetitive names but different sources are still being lost.

Hopefully that makes sense! Thank you.

ninpeng8 commented 5 years ago

OK! Under further investigation, it looks like all of the channels that were missing had 'nan' values, so I'm assuming your code does not append that in the mdf class. (Is there a way we can keep those channels even though the data block is apparently empty, simply as a placeholder?)

Therefore, the only issue with the fix is that the truc.getChannelData4 method calls the same channel data, regardless of source (I even put a nonreal signal name as the second argument and it output the same array).

Sorry for the confusion! Thanks.

EDIT: under more investigation, it looks like, to make sure that each channel has a unique name, a string related to the submaster for that channel is being appended to the end of every signal that isn't the first. For example, if my signal names and sources are...

'SIGNAL\source1' with submaster 'time_1'
'SIGNAL\source2' with submaster 'time_2'
'SIGNAL\source3' with submaster 'time_3'

...the names present in the mdf class are 'SIGNAL', 'SIGNAL_2', and 'SIGNAL_3'. Hopefully this helps further the fix.

ratal commented 5 years ago

My explanation was maybe not clear. The object returned by truc=mdfreader.mdf('file.mf4', metadata=0) is a dict, and a dict can hold only one value per key. If the key is the signal name but the same name appears several times in the mdf file, you will overwrite the value for that key several times. So mdfreader makes the signal name unique by appending the data group number (for sorted data), so that it becomes a unique key.

But when you want to retrieve a signal by its name, it is difficult to find the correct key if a number has been appended. So getChannelName4 looks at all keys in the mdf object that have the channel name and source path you are looking for and, if one is found, returns the corresponding key.

If you want to get an idea of the mdf content, you can type print(truc), or truc.masterChannelList to list all channels grouped by master channel. To get more details on a signal, you can type truc['time_1'] --> in the 'id' key, you will be able to view names, sources, and paths. To get the data of a signal with getChannelData, you have to use the unique key of the signal/source/path you want, not the signal name.
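The overwriting problem and the suffix workaround explained above can be sketched with plain Python dicts. This is only an illustration under assumptions: `add_channel` is a hypothetical helper, the data values are made up, and the exact suffix rule mdfreader applies may differ from the `name_<data group>` form used here.

```python
def add_channel(store, name, data_group, values):
    """Store values under name, or under name_<data_group> if name is already taken."""
    key = name if name not in store else f"{name}_{data_group}"
    store[key] = values
    return key


# Made-up data: the same signal name recorded in three data groups.
recordings = [(1, [1.0]), (2, [2.0]), (3, [3.0])]

# Naive approach: every assignment to the same key replaces the previous value.
naive = {}
for data_group, values in recordings:
    naive["SIGNAL"] = values

# Suffix approach: later duplicates get a distinct key, so nothing is lost.
unique = {}
keys = [add_channel(unique, "SIGNAL", dg, values) for dg, values in recordings]

print(len(naive), keys)
```

With the naive dict only the last recording survives, while the suffixed keys keep all three, which matches the 'SIGNAL', 'SIGNAL_2', 'SIGNAL_3' names reported earlier in the thread.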

ninpeng8 commented 5 years ago

Ah that makes sense. Thank you for explaining.

I was getting an error with getChannelName4 so I assumed there was a typo. The error is at this line in getChannelName4: output.append(channel_name, (ndg, ncg, ncn))

which yields: "TypeError: append() takes exactly one argument (2 given)"

If I input a valid source path, the script is trying to append both channel_name and (ndg, ncg, ncn).

However, now that the source path name is in the truc[channel_name]['id'] output, I can work with that to get the information we need.

ratal commented 5 years ago

Maybe a bug. Strange, I did test and did not get this error. Can you try output.append((channel_name, (ndg, ncg, ncn))), so with an additional () to append a tuple to the output list?
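For reference, the difference between the two calls can be reproduced without mdfreader. The variable names follow the line quoted in this thread, but the values are placeholders chosen for illustration.

```python
output = []
channel_name = "SIGNAL_2"   # placeholder value
ndg, ncg, ncn = 2, 0, 1     # placeholder data group, channel group, channel numbers

# list.append accepts exactly one argument, so passing two raises TypeError.
try:
    output.append(channel_name, (ndg, ncg, ncn))
except TypeError as exc:
    error_message = str(exc)

# Wrapping both values in one tuple appends a single item.
output.append((channel_name, (ndg, ncg, ncn)))

print(error_message)
print(output)
```

The exact wording of the error message varies slightly between Python versions, but the cause is the same: two positional arguments were passed where one is expected.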

ratal commented 5 years ago

This seems to have been a satisfactory solution since then.