ratal / mdfreader

Read Measurement Data Format (MDF) versions 3.x and 4.x file formats in python

Trouble converting INCA .dat file to Panda data frame #131

Closed ajaysgowda closed 3 years ago

ajaysgowda commented 6 years ago

Python version

Python 3.6

OS

Windows

Numpy version

1.13.1

mdfreader version

2.7.4

Description

Hello, I'm trying to convert my INCA .dat file to a pandas data frame. The line data = mdfreader.mdf(fileLocation) works fine. But when I use dataInPandas = data.convertToPandas (without parentheses), I get a method object that I cannot open. If I use dataInPandas = data.convertToPandas(), I get None. I want to be able to see the data as a data frame variable. Am I doing something wrong? Please help. Thanks, Ajay

ratal commented 6 years ago

Hi, dataInPandas = data.convertToPandas() is not correct. The convertToPandas method converts the data held by the object into pandas data frames; it does not return a pandas object, which is why you get None.

ajaysgowda commented 6 years ago

Thanks for the quick response. I'm new to this, so please bear with me. I have a .dat file from ETAS INCA and I would like to convert it to a usable format in Python for data processing. How do you suggest I go about it? Thanks again.

ratal commented 6 years ago

Hi, pandas is a good option for processing data in Python. It depends on what kind of processing you want to do, but the most used and common file format is hdf5. The mdf class as is already gives you direct access to your data; you can follow the readme file for examples.

ajaysgowda commented 6 years ago

Thanks. I am trying to convert the mdfreader.mdf type to a data frame type so that I can access the data. How do I do that?

cristi-neagu commented 6 years ago

You can already access the data using mdf.getChannelData(channelName). That returns a numpy vector.

ajaysgowda commented 6 years ago

My file is large and I'm having issues figuring out the raster at which it is recorded. I guess I need to do more digging. Thanks.

cristi-neagu commented 6 years ago

To figure out the exact raster, you need to do a bit of juggling. Use mdf.getChannelData(mdf.getChannelMaster(channelName)) to get the master time vector for a given channel, then take the difference between the second and first values and round it to the nearest hundredth; that should give you the measure rate. But I don't think that's what you want to do.
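
A minimal sketch of that raster estimate, taking the difference between the first two time stamps and rounding it. It is not part of mdfreader: `estimate_raster` is a hypothetical helper, and the list of made-up time stamps stands in for the vector returned by mdf.getChannelData(mdf.getChannelMaster(channelName)).

```python
def estimate_raster(time_vector, decimals=2):
    """Estimate the sampling period (raster) in seconds from a master time vector.

    Uses the difference between the first two time stamps, rounded to
    `decimals` places, so it assumes the channel is sampled at a constant rate.
    """
    if len(time_vector) < 2:
        raise ValueError("need at least two time stamps to estimate a raster")
    return round(float(time_vector[1]) - float(time_vector[0]), decimals)


# Made-up time stamps standing in for the master channel data of one channel.
time_vector = [0.0, 0.0101, 0.0199, 0.0300]
print(estimate_raster(time_vector))  # 0.01
```

For rasters faster than 10 ms, a larger `decimals` value would be needed, otherwise the result rounds to 0.0.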

If you must sort the channels by measure rate, you can simply use the name of the master channel as rasters. Alternatively, you could resample the file to a known measure rate, and everything will be aligned. I think this latter option is probably what you need.

If your measure files are large, you can load only certain channels (assuming you don't need all of them). I think the option is called channelList, and you can use it when loading the file. You can also use noDataLoading=True so only the channel names are loaded.
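
Sorting channels by measure rate, as suggested above, amounts to grouping them by master channel. Here is a minimal sketch, assuming you have already built a mapping from each channel name to its master channel name (for example by calling mdf.getChannelMaster(name) for every name in the file). `group_by_master` and the sample channel names are hypothetical.

```python
from collections import defaultdict


def group_by_master(channel_to_master):
    """Group channel names by the name of their master (time) channel."""
    groups = defaultdict(list)
    for channel, master in channel_to_master.items():
        if channel == master:
            continue  # the master channel itself is the key, not a member
        groups[master].append(channel)
    return dict(groups)


# Made-up mapping: two channels on one raster, one channel on another.
channel_to_master = {
    "time": "time",
    "engine_speed": "time",
    "throttle": "time",
    "time_1": "time_1",
    "coolant_temp": "time_1",
}
for master, channels in group_by_master(channel_to_master).items():
    print(master, channels)
```

Each key of the result identifies one raster, so channels under the same key share a time vector and can be compared sample by sample.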

ajaysgowda commented 6 years ago

this works perfectly thanks.

How do I name the master channels after the raster? Is there a parameter I can pass when I call mdfreader.mdf()?

I need one last issue resolved to be able to move completely from MATLAB to Python. The data file I have has multiple channels with the same name coming from different sources. Usually the channel name has \'sourceName' appended to the end of it. It did in MATLAB at least. Is there a way I can make sure that part of the name is preserved when reading MDF data with this library?

I appreciate your help. Thanks

cristi-neagu commented 6 years ago

I think you can rename the master channels. I'm not sure, because I never needed to do it. It might break things, though, unless you make sure to update all slave channels with the new name.

When detecting multiple channels with the same name, the module appends _1, _2 to the name, I think. To find out the raster, you can get the master channel name. But what do you mean by sourceName? Are you using multiple devices to record data?

ajaysgowda commented 6 years ago

i think i can work around the whole raster stuff. Yes. I'm recording from multiple devices at the same time.

cristi-neagu commented 6 years ago

I'm not sure how Inca deals with that. If Inca appends the source name to the channel name, it should still be there. If not, mdf will assign different terminations to duplicate channels. I think the best thing to do is just to try it and see what happens.

import mdfreader as mdf
datFile = mdf.mdf('file.dat')
for channel in datFile:
    print(channel)

ajaysgowda commented 6 years ago

This is what I get. It looks like it recognises the different sources, but it just adds a number to the end of the channel name.

time
'channelName'
time_1
'channelName'_1
time_2
'channelName'_2

cristi-neagu commented 6 years ago

I think this might be something for Aymeric to look into. See if multi-device support is possible.

ajaysgowda commented 6 years ago

Thanks. From what I can see in MATLAB, the source device appears in the long name but not in the description. Not sure if that helps, but I thought it would be worth mentioning.

ratal commented 6 years ago

Hi, to rename a channel, there should be a method called rename_channel() available. Indeed, when there are duplicate channel names in the same file, the module appends the data group number to the name to make it unique. Without this, and because the file content (all channels) is flattened (mdf file content is structured more like a tree), different data would be written to the same 'container' or dict key (the channel name), overwriting each other and losing data, which nobody wants. A lot of thought went into the best mdf object definition for Python, and this flattened layout seemed the simplest, the easiest for accessing and analysing data, and the closest to bare Python, but it brings some complications like this one. The number appending happens in the info class of mdfinfo3 and mdfinfo4. For instance in mdfinfo4, in readCNBlock, around line 1578 or 1581, so you could change the behaviour relatively easily and append the source instead; for mdfinfo3 it is around lines 225 and 228. However, I would not advise it because:

ratal commented 6 years ago

Correction: rename_channel as currently implemented will indeed break things if you change master names (the master key in the channel dict will no longer match, breaking the link). I will improve it. By the way, why is it an issue to have a number instead of the source name? I guess it is because you rely on the name to understand which source each duplicated channelName corresponds to?

danielhrisca commented 6 years ago

It can get confusing having channel_1 and master_2

ratal commented 6 years ago

Yes, I can understand, but I think I do not have better idea/solution for the moment...

danielhrisca commented 6 years ago

You would have to drop the dict like behavior

ratal commented 6 years ago

Uhmm, the implications for the code would be very big at this point. Plus I am not sure it would make the file content easier to use or understand. I know you opted for objects instead, which brings other advantages, but I am still personally in favour of simple basic Python objects. Maybe you can give your opinion on that, comparatively?

ajaysgowda commented 6 years ago

Some channel names from various sources are the same.

When I have identical channel names that get numbered, e.g. channel_1, channel_2, ..., I cannot tell which channel name corresponds to which source. From MDFimport in MATLAB I can see that the longSignalName has the source info appended to the channel name, but the signalDescription doesn't have that info. Not sure if that helps.

cristi-neagu commented 6 years ago

Maybe there doesn't need to be a big change. I imagine the source data is available somewhere in the MDF file. When duplicate channels are found, can we try to look for a source name and append that instead? If no source can be found, we fall back to the current behaviour.
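
The proposal can be sketched as follows. This is an illustration of the naming rule only, not the code that went into mdfreader: `flatten_channels`, the tuple layout of `data_groups`, and the sample data are all hypothetical.

```python
def flatten_channels(data_groups):
    """Flatten per-group channels into one dict with unique keys.

    data_groups: list of (source_name_or_None, {channel_name: data}) tuples,
    one per data group. On a name clash the source name is appended when it
    is known; otherwise the data group number is appended instead.
    """
    flat = {}
    for group_number, (source, channels) in enumerate(data_groups):
        for name, data in channels.items():
            key = name
            if key in flat and source:
                key = f"{name}_{source}"
            if key in flat:
                # No source available, or the source-based name clashes too.
                key = f"{name}_{group_number}"
            flat[key] = data
    return flat


# Made-up data groups: the second one has a known source, the third does not.
data_groups = [
    (None, {"time": [0.0, 0.1], "speed": [10, 11]}),
    ("XCP", {"time": [0.0, 0.2], "speed": [20, 21]}),
    (None, {"time": [0.0, 0.5], "speed": [30, 31]}),
]
print(list(flatten_channels(data_groups)))
# ['time', 'speed', 'time_XCP', 'speed_XCP', 'time_2', 'speed_2']
```

The sketch does not guard against a channel whose original name already equals a generated one such as speed_2; a real implementation would need one more uniqueness check.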

danielhrisca commented 6 years ago

I just wanted to say that it's impossible to manage name conflicts and still have dict behavior. I'm sure a code change in this direction would mean a lot of rework.

Having objects (I think you mean dedicated class for each block type) makes the internal structure similar to the one of the file on disk. It's sometimes easier to debug errors.

ratal commented 6 years ago

Ok. Good idea Cristi, seems good compromise, I will try that. I guess priority for mdf 3.x ?

ratal commented 6 years ago

I just introduced it, might be a bit buggy but you can try it on both mdf 3.x and 4.x

ajaysgowda commented 6 years ago

unfortunately it didn't seem to work. i still get the same results as before. :(. Thanks for trying to accommodate my request

ratal commented 6 years ago

Did you install the latest commit from the master branch on GitHub? Can you open your file with MDFValidator and check whether there are extension blocks for the channels you are interested in? If there are none, the current implementation falls back to the previous behaviour. By the way, you could comment out the 4 lines starting at line 215; they strip the device name from the end of the signal name, which is probably what you are looking for in the end. The next 2 lines, which split on '.', could also be commented out.

ajaysgowda commented 6 years ago

I did just that. It works great now. Commenting out those lines works perfectly. Thanks!! Just out of curiosity, what is the reason for the split at line 215?

ratal commented 6 years ago

I thought these additional device names were annoying information and generally contained characters not allowed in other file formats, so I removed them. But they do make the names unique. Maybe I should make this optional or document it better.

Fourka11 commented 4 years ago

Hello,

I'm facing exactly the same problem. How did you solve it? Which modules did you use and how did you import them? I'm very new to this. This is my very first attempt at programming, so please bear with me. Thanks, Fourka

ratal commented 4 years ago

Can you describe your issue in more detail? Maybe we can help you.

Fourka11 commented 4 years ago

Hi Ratal,

I have a .dat file from INCA and I would like to convert it to a usable format in Python to create the plots I need. I imagine something like a GUI. When the program is executed, the user should be asked which files should be read. Thanks, Fourka

ratal commented 3 years ago

What would be a usable format in Python? You could create a plot by parsing your .dat file in an interactive Python session (IPython, Jupyter, ...):

import mdfreader

yop = mdfreader.Mdf('yourfile.dat')
yop  # in an interactive session, displays the content of the yop object: its channels, data, description, units, etc.
yop.plot('channelname')

You could also convert your object into pandas data frames with

yop.convert_to_pandas()

mdfconverter is also part of this module; it gives you a GUI to convert your .dat into other file formats like hdf5, Matlab, netcdf. mdfreader could also be included in the Veusz advanced GUI, but that is not easy to set up.

If this is still too complicated with those command lines, you could use the asammdf module, which has a good and easy GUI.

Fourka11 commented 3 years ago

Hi Ratal, thanks for the quick response! yop.plot('channelname') works perfectly. I want to display multiple channels from different .dat files in one plot. Is it possible to add or change channels in an existing plot? Thanks, Fourka

ratal commented 3 years ago

.plot() can take a nested list of channels as argument, so you can build your plots by grouping channels, creating multiplots. But this applies to one file only. If you want to compare data between files, you need to write your own script according to your needs, using the .getChannelData(), .getChannelUnit() and .getChannelMaster() methods. For inspiration on how to use matplotlib, look at the .plot code in mdfreader.py.

Fourka11 commented 3 years ago

Thanks for your advice. What does getChannelMaster() deliver? I can't find it in the documentation on mdfreader.py. Thanks Fourka

ratal commented 3 years ago

Sorry, I forgot the '_'. .get_channel_master('channelName') gives you the name of the master channel of the channel passed as argument. A master channel is most commonly time, but could also be angle, distance, etc. (since mdf 4.x). With this channel name you can create your X axis.

Fourka11 commented 3 years ago

I use get_channel_data to get the channel as a numpy array.

datFile = mdf.Mdf('FileName')
datFile.get_channel_data('ChannelName')

These two lines work very well. But when it comes to print(ChannelName) I get: NameError: name 'ChannelName' is not defined. To make the plots, I first want to see the numpy array. Could you please help me once again? Thanks, Fourka

ratal commented 3 years ago

You should do

print(datFile['ChannelName'])

and, to plot the channel against its master channel:

import matplotlib.pyplot as plt

plt.plot(datFile.get_channel_data(datFile.get_channel_master('ChannelName')), datFile.get_channel_data('ChannelName'))
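
For the earlier question about overlaying the same channel from several files, a script along these lines might be a starting point. It is a sketch under assumptions: `collect_series` and `plot_series` are hypothetical helpers, the opened files are assumed to behave like dicts keyed by channel name, and get_channel_unit is assumed to be the underscore spelling of the getChannelUnit() method mentioned above.

```python
def collect_series(files, channel_name):
    """Gather (label, x, y, unit) for one channel from several opened files.

    files: dict mapping a label to an opened file object offering
    get_channel_master(), get_channel_data() and get_channel_unit().
    Files that do not contain the channel are skipped.
    """
    series = []
    for label, mdf_file in files.items():
        if channel_name not in mdf_file:
            continue
        master = mdf_file.get_channel_master(channel_name)
        x = mdf_file.get_channel_data(master)
        y = mdf_file.get_channel_data(channel_name)
        series.append((label, x, y, mdf_file.get_channel_unit(channel_name)))
    return series


def plot_series(series, channel_name):
    """Overlay all collected series on a single matplotlib axis."""
    # Imported here so that collect_series can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, x, y, unit in series:
        ax.plot(x, y, label=f"{label} [{unit}]")
    ax.set_xlabel("master channel")
    ax.set_ylabel(channel_name)
    ax.legend()
    plt.show()


# Possible usage with mdfreader installed (file names are placeholders):
# files = {"run 1": mdfreader.Mdf("run1.dat"), "run 2": mdfreader.Mdf("run2.dat")}
# plot_series(collect_series(files, "ChannelName"), "ChannelName")
```

Each file keeps its own master channel as X axis, so recordings with different rasters can share one plot without resampling.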

Fourka11 commented 3 years ago

THANKS A LOT it works perfect.

shangrilaer commented 2 years ago

Hello all, my problem is that the channel names look like "nmot_w\XCP", "rl_w\XCP", "latitude\GPS". Is there a way to remove these suffixes? With asammdf, we can use "use_display_name" to remove "\XCP", or rename the data frame columns:

df_columns = df.columns.tolist()
df_columns_new = [c.replace("\\XCP", "") for c in df_columns]

Is there an API way to do so?

ratal commented 2 years ago

You could use the filter_channel_names parameter, but it was rather meant for stripping names based on '.', so you would have to customise it for your need (example code is in read_cn_block / mdfinfo4). Normally exports should remove this character if it is not allowed by the target format. You could also call .rename_channel(channel_name, new_name) in a loop over the channels.
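
The renaming approach could be prepared like this. It is a minimal sketch: `build_rename_map` is a hypothetical helper that only computes old-to-new names, and the channel names are the examples from the question plus two made-up clashing ones.

```python
from collections import Counter


def build_rename_map(channel_names, separator="\\"):
    """Map names carrying a source suffix (after `separator`) to the bare name.

    A channel is left out of the map when dropping the suffix would make its
    name clash with another channel, so no data gets overwritten.
    """
    names = list(channel_names)
    bare = [name.split(separator)[0] for name in names]
    counts = Counter(bare)
    return {
        old: new
        for old, new in zip(names, bare)
        if old != new and counts[new] == 1
    }


channel_names = ["nmot_w\\XCP", "rl_w\\XCP", "latitude\\GPS", "speed\\XCP", "speed\\GPS"]
rename_map = build_rename_map(channel_names)
print(rename_map)

# With an opened file, the map could then be applied one channel at a time:
# for old, new in rename_map.items():
#     mdf_file.rename_channel(old, new)  # mdf_file is a hypothetical opened file
```

As noted earlier in the thread, renaming master channels could break the link to their slave channels, so it is probably safer to leave master channel names out of the map.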