ratal / mdfreader

Read Measurement Data Format (MDF) versions 3.x and 4.x file formats in python

Channel has non regularly increasing master channel time. #148

Closed cristi-neagu closed 4 years ago

cristi-neagu commented 6 years ago

Hello,

I sometimes get this error with certain files. This has nothing to do with MDFReader and it is caused by the recording software messing up the file somehow. Even so, it would be very useful if we had two utilities to deal with this:

  1. A function to detect when this happens, that would return a list of tuples (or a list of dictionaries) with the name of the master channel and the index where the discontinuity occurs.

  2. A utility for slicing a recording, taking the start and end of the slice as inputs, and either modifying the current file or, even better, returning a new object containing only the desired data.

I can probably have a go at the first function and make a pull request, but the second one requires a more in depth knowledge of the object structure.

Do you think this is feasible?

cristi-neagu commented 6 years ago

Also, as a side note, when irregular time channels exist in a file, a resampling will fail, creating an empty master channel.

ratal commented 6 years ago

Hi,

  1. I guess you could identify it with a diff and min/max, but you would have to supply a threshold, which should be configurable. I guess your error comes from interp() in the resample method when the channel is float. To work around it, you could bypass interp entirely and use searchsorted instead. However, your message appears when not all(diff(timevect) > 0), which is worse than unevenly spaced time samples (the error message is not easy to understand): some samples go back in time. I guess this needs a different treatment (some kind of cut, as you proposed, for instance), or there is an issue with mdfreader. It is worth digging into those faulty files: do other tools confirm this error?
  2. There is a .cut method in mdfreader.py that cuts your complete object. If you want sliced data returned instead, we could move the slicing part of .cut into a new method and reuse it in .cut.

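The diff-and-threshold check from point 1 could be sketched roughly as follows. This is only an illustration using NumPy: the function name check_master and the rel_tol parameter are hypothetical and not part of mdfreader.

```python
import numpy as np

def check_master(time, rel_tol=0.05):
    """Classify a master channel from its sample-to-sample differences.

    rel_tol is a hypothetical, user-tunable threshold: the allowed spread
    of the time steps (max - min) relative to the median step.
    """
    steps = np.diff(np.asarray(time, dtype=float))
    if steps.size == 0:
        return "too short"
    if not np.all(steps > 0):
        # At least one sample repeats or goes back in time.
        return "not increasing"
    spread = (steps.max() - steps.min()) / np.median(steps)
    return "regular" if spread <= rel_tol else "uneven"

regular = check_master([0.0, 0.1, 0.2, 0.3])
uneven = check_master([0.0, 0.1, 0.5, 0.6])
broken = check_master([0.0, 1.0, 2.0, 0.0])
```

The "not increasing" case is the one this issue is about; the "uneven" case only needs the threshold.
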
cristi-neagu commented 6 years ago

  1. This isn't an issue with mdfreader. Also, in my particular case, the failure mode is that the master channel drops to 0 at the end of the file. So something like np.where(np.diff(masterChannel) < 0) would give the index where the discontinuity occurs.
  2. I'll have a look into that.

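A minimal sketch of the detection function requested at the top of the thread, built on the np.where(np.diff(...) < 0) idea. The name find_discontinuities and the plain dict of master arrays are assumptions made for illustration; how the master channels are pulled out of an mdfreader object is left aside.

```python
import numpy as np

def find_discontinuities(masters):
    """Return (name, index) tuples where a master channel decreases.

    masters: dict mapping a master channel name to a 1-D array of time
    samples (a stand-in for the real file object).
    The reported index is that of the first sample after the drop,
    hence the +1 on the np.diff result.
    """
    found = []
    for name, time in masters.items():
        drops = np.where(np.diff(np.asarray(time)) < 0)[0] + 1
        found.extend((name, int(i)) for i in drops)
    return found

masters = {
    "t_ok": np.array([0.0, 1.0, 2.0, 3.0]),
    "t_bad": np.array([0, 1, 2, 3, 4, 5, 6, 0, 0]),
}
result = find_discontinuities(masters)
```
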
cristi-neagu commented 6 years ago

There is an issue with using the cut function as it is right now. If we want to cut a master channel containing [0, 1, 2, 3, 4, 5, 6, 0, 0] at time 6, searchsorted returns an index of 8, which will leave the file unchanged. I think this is because it looks for the condition a(n) < x < a(n+1), which fails when the array isn't sorted. Some other method needs to be implemented for this particular case, maybe by being able to optionally specify an index instead of a time.
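np.searchsorted is documented as requiring a sorted array, so one possible workaround is to search only the increasing part of the master channel. The sketch below is not how mdfreader's cut works; cut_index is a hypothetical helper that returns the index to cut at.

```python
import numpy as np

def cut_index(time, end_time):
    """Index at which to cut, searching only the increasing part of time."""
    time = np.asarray(time)
    drops = np.where(np.diff(time) < 0)[0]
    # Everything from the first drop onwards is excluded from the search.
    valid_len = drops[0] + 1 if drops.size else time.size
    return int(np.searchsorted(time[:valid_len], end_time, side="right"))

time = np.array([0, 1, 2, 3, 4, 5, 6, 0, 0])
stop = cut_index(time, 6)
kept = time[:stop]
```

With the array from the comment above, the trailing zeros fall outside the searched range, so the cut lands right after the sample at time 6.
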

ratal commented 6 years ago

Maybe using masked arrays would be easiest: data = data.view(MaskedArray); data.mask = np.where(np.diff(masterChannel) < 0)

This approach is used in apply_invalid_bit, which applies to mdf4. Have you tried it?

cristi-neagu commented 6 years ago

Would that allow the file to be resampled?

ratal commented 6 years ago

Interp should not work, but it would be possible to use the compressed() method to clean up the data before interp, identifying the mask with masked_where(np.diff(masterChannel) < 0).
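A rough sketch of the mask, compressed(), then interp sequence using plain NumPy arrays. Note one deliberate swap: instead of the bare np.diff(...) < 0 condition, the mask here flags every sample that is not later than the running maximum, because the diff condition alone would flag only the first of several trailing zeros and has one element fewer than the data. The sample values are made up for illustration.

```python
import numpy as np
import numpy.ma as ma

time = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 0.0, 0.0])
data = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0])

# Flag samples whose time is not later than the largest time seen before.
running_max = np.maximum.accumulate(time)
bad = np.zeros(time.shape, dtype=bool)
bad[1:] = time[1:] <= running_max[:-1]

# Same mask on master and data so they stay aligned.
masked_time = ma.masked_where(bad, time)
masked_data = ma.masked_where(bad, data)

# compressed() drops the masked samples and returns plain ndarrays.
clean_time = masked_time.compressed()
clean_data = masked_data.compressed()

# np.interp needs an increasing x; the cleaned master satisfies that.
new_time = np.arange(0.0, 4.5, 0.5)
resampled = np.interp(new_time, clean_time, clean_data)
```
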

ratal commented 6 years ago

Hi, I made a prototype in the dev branch, adding a new function, _clean_uneven_master_data. You can try it. However, if it works, it will mainly target your specific case with zero time samples. A more generic mask should be considered.

cristi-neagu commented 6 years ago

I will try it, thank you. But it would be interesting to know if anyone has encountered any other failure modes.

cristi-neagu commented 6 years ago

I haven't got around to testing this yet, but it occurred to me that we've been going about this the wrong way. Instead of cutting data, the correct solution (for this case, at least) is to rebuild the time vector. The base assumption has to be that the time is constantly increasing. I'm not sure how true that is for everyone else, but for my data that is true in 99% of cases. Find the time step, fill the zeros.
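"Find the time step, fill the zeros" might look something like the sketch below. It assumes a constant sampling step, estimated here as the median of the positive differences; rebuild_time is a hypothetical name, and the approach only suits recordings where that assumption holds.

```python
import numpy as np

def rebuild_time(time):
    """Replace non-increasing samples by extrapolating with a constant step."""
    time = np.array(time, dtype=float)  # work on a copy
    steps = np.diff(time)
    positive = steps[steps > 0]
    if positive.size == 0:
        raise ValueError("no increasing samples to estimate the step from")
    step = np.median(positive)
    for i in range(1, time.size):
        # A sample that does not advance is rebuilt from its predecessor.
        if time[i] <= time[i - 1]:
            time[i] = time[i - 1] + step
    return time

rebuilt = rebuild_time([0.0, 0.1, 0.2, 0.3, 0.0, 0.0])
```

Valid samples are left untouched; only the samples that fail to advance are overwritten.
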

ratal commented 6 years ago

That could work for you. However, it is not good to generalise, since this behaviour could be specific to your recorder. Did you check whether your channel has invalid bits (it should)? If so, you could use apply_invalid_bit() to turn the array into a masked array, and the rest should work transparently.

ratal commented 4 years ago

In the dev branch I introduced a new method, resample_group(), to be more compliant with mdf4 and its various possible master types; it did not make sense to brutally resample all data without considering its type. So I split the resampling of a single group out of the general resample(); it takes an argument new_master_data. You could use it to resample with your own fixed time signal.