ratal / mdfreader

Read Measurement Data Format (MDF) versions 3.x and 4.x file formats in python
Other
169 stars 74 forks source link

Issues with new interpolate function in mdf.resample() #106

Closed cristi-neagu closed 6 years ago

cristi-neagu commented 6 years ago

Looks like the arguments to interpolate are mixed up. Currently is:

def interpolate(x, y, new_x):
    if y.dtype.kind == 'f':
        return interp(x, y, new_x)
    else:
        idx = searchsorted(x, new_x, side='right')
        idx -= 1
        idx = clip(idx, 0, idx[-1])
        return x[idx]

Should be:

def interpolate(new_x, x, y):
    if y.dtype.kind == 'f':
        return interp(new_x, x, y)
    else:
        idx = searchsorted(x, new_x, side='right')
        idx -= 1
        idx = clip(idx, 0, idx[-1])
        return y[idx]

Because the function checks for the type of y, which is passed to the function as the current time channel and will always be a float, all channels are interpolated as floats, because the arguments to interp are also mixed up. But as it stands, no integer channels get the new searchsorted treatment.

ratal commented 6 years ago

Oups, a bit stupid and released. I should I have checked more deeply. Corrected in last commit but I will try to release a new version soon to have minimum impact.

jayveesea commented 6 years ago

also having re-sample problems in v2.7.

I'm also curious about one of the recent changes... "Resample method: interpolation only for floating channels, not for integers." ...will the integers still get resampled?

cristi-neagu commented 6 years ago

will the integers still get resampled?

As it stands, no, they won't. In general i found that integer signals should not be interpolated, because they contain bit encoded data in most cases. As such, the value between 43 and 59, for example, should not be 51. Channels that make sense when being interpolated are usually floats.

In order to introduce flexibility, some sort of channel filter list would have to be introduced to tell the interpolate function to treat that channel differently. That would involve a substantial change. Another thing you could do before resampling a file is this:

intChannelsForInterp = ['channel1', 'channel2'....]
for channel in intChannelsForInterp:
    mdf.setChannelData(channel, mdf.getChannelData(channel).astype(float))

That will ensure those integer channels will be interpolated as floats.

Hope this helps.

jayveesea commented 6 years ago

Thanks for the feedback. I completely agree that the integers should not be interpolated. But for re-sampling, in the past, I have held the previous value until a change is detected. For instance, in the simple case: t={0, 0.5, 1}, sig={0, 2, 2}

then: t_new={0, 0.25, 0.5, 0.75, 1}, sig={0, 0, 2, 2, 2}

...that should work reasonably well for time rasters that are multiples of each other, but in the case where they are not I think it's still a reasonable approach. For instance, in this new case: t={0, 0.25, 0.5, 0.75, 1}, sig={0, 2, 2, 2, 0}

then: t_new={0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}, sig={0, 0, 0, 2, 2, 2, 2, 2, 0, 0, 0}

I've also seen the "nearest neighbor" method used or perhaps NaNs could be used in the uncertain areas. The problem I'll have if the integers are not re-sampled is post processing data using the signals that are integers (logical indexing for example).

This is just my preference (and two cents), in the end it may be best for me to do my own interpolation of the integers.

cristi-neagu commented 6 years ago

The integer interpolation function used returns the last value, as in your first example.

ratal commented 6 years ago

Hi coppolajj, Explanation was not so clear in the commit but as said Criti, the searchsorted() will act as 'nearest neighbor', only for non floating vectors (could be text vectors as well, especially with mdf4)

jayveesea commented 6 years ago

ahhh, this is perfect, thanks for clearing up.

ratal commented 6 years ago

ok then I close this issue

cristi-neagu commented 6 years ago

It looks like this never made it into the release. Current mdfreader.py still treats all channels as floats, because it looks at the time channel for dtype, instead of the channel data.

ratal commented 6 years ago

I missed something with git, should be correct now ?

cristi-neagu commented 6 years ago

Line 564 is wrong. Should read return y[idx].

cristi-neagu commented 6 years ago

It seems that a fix is in place for all of this. Closing.