Error importing MDF4 files with channel names that start with '#' (But only when using a channelList)

mattdreisbach commented 7 years ago

When I import my mdf4 data using yop = mdfreader.mdf(filename), everything works great, but it takes a long time to process all the data. When I use:

channelList = ['example','#latitude','of','#data']
yop = mdfreader.mdf()
yop.read(fileName=filename, channelList=channelList)

the program fails on the channels that start with '#'. Here is the traceback:

Traceback (most recent call last):
  File "C:\Users\420278\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\records.py", line 444, in __getattribute__
    res = fielddict[attr][:2]
KeyError: 'latitude'

Here is an additional traceback of the exception that occurred during the handling of the one above:

  File "C:\Users\420278\AppData\Local\Programs\Python\Python36\lib\site-packages\mdfreader\mdfreader.py", line 362, in read
    self.read4(self.fileName, info, multiProc, channelList, convertAfterRead, filterChannelNames=False)
  File "C:\Users\420278\AppData\Local\Programs\Python\Python36\lib\site-packages\mdfreader\mdf4reader.py", line 1393, in read4
    temp = buf[recordID]['data'].__getattribute__(recordName)  # extract channel vector
  File "C:\Users\420278\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\records.py", line 446, in __getattribute__
    raise AttributeError("recarray has no attribute %s" % attr)
AttributeError: recarray has no attribute latitude

Once again, this problem does not happen when importing without a channelList; in that case yop.keys() prints the proper names. Calling:

    info=mdfreader.mdfinfo()
    print(info.listChannels(filename))

produces the expected output with properly formatted channel names.

The problem is somehow related to your function _gen_valid_identifier in mdf.py. I was able to implement a workaround by making the sanitizer accept '#' as well, resulting in the code below (around line 638 of mdf.py):

from itertools import chain  # needed for chain() below

def _gen_valid_identifier(seq):
    # get an iterator
    itr = iter(seq)
    # pull characters until we get a legal one for first in identifier
    for ch in itr:
        if ch == '_' or ch == '#' or ch.isalpha():
            yield ch
            break
        elif ch.isdigit():
            # leading digits are re-queued at the end of the sequence
            itr = chain(itr, ch)

    # pull remaining characters and yield legal ones for identifier
    for ch in itr:
        if ch == '_' or ch == '#' or ch.isalpha() or ch.isdigit():
            yield ch
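
To make the name mismatch concrete, here is a minimal sketch of a sanitizer of this kind and what it does to the channel names from this report. It is not the mdfreader implementation: the function name gen_valid_identifier and its extra parameter are hypothetical, added only so both behaviours can be compared side by side. Note that in Python an expression such as ch == '_' or '#' is always truthy, because the non-empty string '#' counts as true, so the sketch uses a membership test instead.

```python
from itertools import chain


def gen_valid_identifier(seq, extra=""):
    """Yield the characters of seq that are accepted in an identifier.

    extra is a hypothetical parameter listing additional accepted characters.
    """
    itr = iter(seq)
    # Skip characters until one is legal as the first character.
    for ch in itr:
        if ch == "_" or ch in extra or ch.isalpha():
            yield ch
            break
        elif ch.isdigit():
            # Leading digits are re-queued at the end of the sequence.
            itr = chain(itr, ch)
    # Keep only the legal characters among the remaining ones.
    for ch in itr:
        if ch == "_" or ch in extra or ch.isalpha() or ch.isdigit():
            yield ch


requested = ["example", "#latitude", "of", "#data"]

# Names that the strict sanitizer changes no longer match the requested names.
renamed = {}
for name in requested:
    clean = "".join(gen_valid_identifier(name))
    if clean != name:
        renamed[name] = clean

print(renamed)  # {'#latitude': 'latitude', '#data': 'data'}
print("".join(gen_valid_identifier("#latitude", extra="#")))  # #latitude
```

With the strict rules, '#latitude' becomes 'latitude', which is the name the traceback above reports as missing.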

ratal commented 7 years ago

'#' is an allowed identifier character (used by recarray) in Python 3.x but not in Python 2.7. If your channel names contain Chinese, Japanese, or similar characters, I think your proposed modification might also fail in Python 3.x. In any case, _gen_valid_identifier should not be changed, as it just represents the worst-case limitations of the various Python versions; the error comes from elsewhere. I could reproduce your issue by changing channel names in my files, and I will investigate.
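
As a side check on which names Python 3 itself treats as syntactic identifiers, the standard str.isidentifier() method can be queried directly. The sketch below is illustrative only: the sample names are made up, and SimpleNamespace merely stands in for an object with named attributes; it says nothing about how mdfreader or numpy store channels. It also shows that getattr and setattr accept arbitrary strings, so lookup by string name is not restricted to syntactic identifiers.

```python
from types import SimpleNamespace

# Made-up sample names, including a non-ASCII one.
names = ["latitude", "_private", "#latitude", "1st_channel", "纬度"]

# Ask Python 3 whether each name is a syntactically valid identifier.
checks = {name: name.isidentifier() for name in names}
for name, ok in checks.items():
    print(repr(name), ok)

# getattr/setattr work with any string, identifier or not.
record = SimpleNamespace()
setattr(record, "#latitude", 48.85)
print(getattr(record, "#latitude"))
```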

mattdreisbach commented 7 years ago

Thank you! Your software has been an enormous help to me, and I appreciate your efforts to help solve my issue.

ratal commented 7 years ago

Please check the latest commit; it should solve this bug.

mattdreisbach commented 7 years ago

I have installed your latest commit and I am having some large issues with it. Problem 1: When I import an mdf file using:

channelList = ['example','#latitude','of','#data']
yop = mdfreader.mdf()
yop.read(fileName=filename, channelList=channelList)

If I try to access the data using yop.getChannelData(channel), all channels that start with '#' return a value of 0.0 for every entry. (Channels that do not start with '#' do not have this issue.)
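
A quick way to spot channels affected by this symptom is to scan the returned samples for channels that contain nothing but zeros. The helper below is a hypothetical sketch, not part of mdfreader; the sample values are made up and stand in for what yop.getChannelData(name) would return for each name in channelList.

```python
def all_zero_channels(samples_by_channel):
    """Return the names of non-empty channels whose samples are all zero."""
    suspicious = []
    for name, samples in samples_by_channel.items():
        values = list(samples)
        if values and all(v == 0 for v in values):
            suspicious.append(name)
    return suspicious


# Made-up data; in practice this mapping could be built with something like
# {name: yop.getChannelData(name) for name in channelList}.
samples = {
    "example": [0.1, 0.2, 0.3],
    "#latitude": [0.0, 0.0, 0.0],
    "#data": [0.0, 0.0, 0.0],
    "of": [0.0, 1.0, 0.0],
}

print(all_zero_channels(samples))  # ['#latitude', '#data']
```

A channel that is legitimately constant at zero would also be flagged, so the result is only a hint about where to look.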

Problem 2: When I import an mdf file using yop = mdfreader.mdf(filename) and the file includes channels that start with '#', the program errors out. Below is a full traceback of that error for a file that includes the channel "#time_Rate1" as well as other channels that start with '#':

Unexpected error: (<class 'ValueError'>, ValueError('no field of name time_Rate1',), <traceback object at 0x0000000005B5F948>)
dataRead crashed, back to python data reading
Traceback (most recent call last):
  File "C:\Users\420278\AppData\Local\Programs\Python\Python36\lib\site-packages\mdfreader-0.2.4-py3.6-win-amd64.egg\mdfreader\mdf4reader.py", line 1199, in readBitarray
    self[chan].posByteEnd)
ValueError: no field of name time_Rate1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "panel_4_curvature_offset_summary_w_5_plot_w_mfx_support.py", line 230, in <module>
    yop=mdfreader.mdf(filename)#, channelList=columns)
  File "C:\Users\420278\AppData\Local\Programs\Python\Python36\lib\site-packages\mdfreader-0.2.4-py3.6-win-amd64.egg\mdfreader\mdf.py", line 114, in __init__
    self.read(fileName, channelList=channelList, convertAfterRead=convertAfterRead, filterChannelNames=filterChannelNames)
  File "C:\Users\420278\AppData\Local\Programs\Python\Python36\lib\site-packages\mdfreader-0.2.4-py3.6-win-amd64.egg\mdfreader\mdfreader.py", line 362, in read
    self.read4(self.fileName, info, multiProc, channelList, convertAfterRead, filterChannelNames=False)
  File "C:\Users\420278\AppData\Local\Programs\Python\Python36\lib\site-packages\mdfreader-0.2.4-py3.6-win-amd64.egg\mdfreader\mdf4reader.py", line 1385, in read4
    buf.read(channelSet)  # reads raw data from data block with DATA and DATABlock classes
  File "C:\Users\420278\AppData\Local\Programs\Python\Python36\lib\site-packages\mdfreader-0.2.4-py3.6-win-amd64.egg\mdfreader\mdf4reader.py", line 347, in read
    self[recordID]['data'] = self.load(record, zip=None, nameList=channelSet, sortedFlag=True)
  File "C:\Users\420278\AppData\Local\Programs\Python\Python36\lib\site-packages\mdfreader-0.2.4-py3.6-win-amd64.egg\mdfreader\mdf4reader.py", line 442, in load
    temps['data'] = record.readSortedRecord(self.fid, self.pointerTodata, channelSet=nameList)
  File "C:\Users\420278\AppData\Local\Programs\Python\Python36\lib\site-packages\mdfreader-0.2.4-py3.6-win-amd64.egg\mdfreader\mdf4reader.py", line 1115, in readSortedRecord
    return self.readBitarray(fid.read(self.CGrecordLength * self.numberOfRecords), channelSet)
  File "C:\Users\420278\AppData\Local\Programs\Python\Python36\lib\site-packages\mdfreader-0.2.4-py3.6-win-amd64.egg\mdfreader\mdf4reader.py", line 1255, in readBitarray
    buf[self[chan].recAttributeName] = asarray(temp)
ValueError: no field of name time_Rate1

Any help would be appreciated.

ratal commented 7 years ago

I think I fixed it in the latest commit. Could you please check?

mattdreisbach commented 7 years ago

It seems to be working! Thank you for all your efforts. Let me buy you a beer; what's your PayPal? Also, when will the updated version be available through pip?

ratal commented 7 years ago

Thanks for the proposal, my first donation :) My PayPal account is linked to my Gmail account: aymeric.rateau@gmail.com. Regarding pip, I would first like to fix the last regressions I introduced by tackling the Python identifier character limitation applied to recarray (this issue). Several issues are still ongoing; once they are solved I will push a new pip version.

ratal / mdfreader, issue #58