Error in marking contacts as invalid

ivoflipse commented 10 years ago

I was just checking my database, and it seems there are only 45 contacts marked as invalid or filtered. That seems like awfully few for so many examples, so something must be wrong.

Also, z_touch in touches_edge seems to use min_z, whereas I'd think it's supposed to use max_z (min_z is problematic).

ivoflipse commented 10 years ago

My totally empirical trial shows that 10% is a pretty good cut-off, and you could cut people some slack by raising it to ~15%:

[Figures: Beginning, End]

FYI, this only applies to dogs; humans have a much steeper increase in force.
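A minimal sketch of such a cut-off, assuming the percentage is taken relative to the contact's peak total force and compared against the force in its first and last frames. The thread doesn't spell out the exact definition, so treat that as an assumption; `is_incomplete` and `threshold` are hypothetical names.

```python
import numpy as np


def is_incomplete(force_over_time, threshold=0.10):
    """Hypothetical check: flag a contact whose force in the first or last
    frame is still above `threshold` times its peak force.

    force_over_time: 1-D sequence with the total force per frame.
    """
    force = np.asarray(force_over_time, dtype=float)
    peak = force.max()
    if peak <= 0:
        # No load at all, so there is nothing sensible to judge
        return True
    start_ratio = force[0] / peak
    end_ratio = force[-1] / peak
    return bool(start_ratio > threshold or end_ratio > threshold)


cut_off = [1, 20, 80, 100, 70, 40]   # still 40% of peak in the last frame
complete = [1, 20, 80, 100, 60, 5]   # drops to 5% of peak
print(is_incomplete(cut_off))        # True
print(is_incomplete(complete))       # False
print(is_incomplete(cut_off, 0.45))  # False with a more lenient cut-off
```

Raising `threshold` to 0.15 would be the "cut people some slack" variant.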

ivoflipse commented 10 years ago

Also, strangely, when calculating min_x etc., I subtract the padding everywhere:

if self.padding:
    min_x -= self.padding
    max_x -= self.padding
    min_y -= self.padding
    max_y -= self.padding

I'd have assumed it gets subtracted from the min and added to the max, in order to 'pad' it, so it would be good to check what's going on here.

ivoflipse commented 10 years ago

Oh right, now I remember. The padding offsets the entire array (because a line is inserted) and we correct for that. So the padding shouldn't be an issue. Carry on, private!
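To see why subtracting the padding from both min and max is right: padding shifts every index by the same amount, so mapping back means subtracting it on both ends. A small NumPy sketch, assuming the padding is added symmetrically with `np.pad` (the array size and contact position are made up):

```python
import numpy as np

padding = 1
data = np.zeros((8, 5))
data[2:4, 1:3] = 1.0  # hypothetical contact at rows 2-3, columns 1-2

padded = np.pad(data, padding)  # one extra line of zeros on every side

# Bounding box found in the padded array (max is exclusive, like a slice end)
rows, cols = np.nonzero(padded)
min_x, max_x = rows.min(), rows.max() + 1
min_y, max_y = cols.min(), cols.max() + 1

# The padding shifted everything by `padding`, so undo it on both ends
min_x -= padding
max_x -= padding
min_y -= padding
max_y -= padding

print(min_x, max_x, min_y, max_y)  # 2 4 1 3
assert (data[min_x:max_x, min_y:max_y] == 1.0).all()
```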

ivoflipse commented 10 years ago

I also suspect that the edge checking isn't up to snuff, so I ran a test and gathered up all the min_x, max_x, min_y and max_y and made histograms out of them:

There seem to be quite a few contacts at the edges, though none at the first or last values (0-63 or 0-256), so there seems to be some padding effect going on.

Based on these histograms, I would expect about 33 + 17 + 20 + 12 = 82 contacts that touch the edge of the plate (barring any duplicates). It turns out I found 71, because there were indeed some duplicates.
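Summing the per-edge counts double-counts any contact that touches two edges (e.g. one sitting in a corner); taking the union of contact IDs gives the unique count. A toy sketch with made-up IDs (the 82 vs 71 figures come from the real data, not from this):

```python
# Hypothetical contact IDs touching each edge of the plate
touches_min_x = {1, 2, 3}
touches_max_x = {4, 5}
touches_min_y = {1, 6}  # contact 1 sits in a corner
touches_max_y = {5, 7}  # so does contact 5

edge_sets = [touches_min_x, touches_max_x, touches_min_y, touches_max_y]

naive_total = sum(len(s) for s in edge_sets)
unique_contacts = set().union(*edge_sets)

print(naive_total)           # 9
print(len(unique_contacts))  # 7
```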

However, it's odd that it never draws anything at the first line or column of the plate, so I suspect something is still not right.

ivoflipse commented 10 years ago

Ah, the bug was not in this code: I was simply already filtering out these contacts, so they were never included. Still, I think the detection could be better, because I'm not sure all these cases were included.

ivoflipse commented 10 years ago

This obviously changes the above figures:

[Figures: Start, End]

The high peak at the end is not surprising, since the plate can stop measuring at any time, but finding a sensible cut-off is still difficult. Likewise, at the start there seem to be a lot more contacts with slightly higher starting values.

ivoflipse commented 10 years ago

The distribution now also makes a LOT more sense. This is when looking at contacts that touch the edge of the measurement:

It's slightly more spread out when the premature ending is also taken into account:

ivoflipse commented 10 years ago

Now I get why the first row and column were empty:

Because all those contacts had already been ditched.

[(0, 109), (1, 36)]
[(62, 20), (63, 70)]
[(0, 153), (1, 20)]
[(255, 17), (256, 101)]
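The pairs above read as (coordinate value, number of contacts). One way such tallies could be produced is with `collections.Counter`, looking only at the two values closest to each edge; the bounding boxes below are invented, and `edge_tally` is a hypothetical helper:

```python
from collections import Counter

# Hypothetical (min_x, max_x, min_y, max_y) per contact on a 256 x 63 plate
boxes = [
    (0, 40, 10, 30),
    (1, 45, 0, 20),
    (120, 160, 20, 63),
    (215, 256, 5, 62),
]

min_x = Counter(b[0] for b in boxes)
max_x = Counter(b[1] for b in boxes)
min_y = Counter(b[2] for b in boxes)
max_y = Counter(b[3] for b in boxes)


def edge_tally(counter, values):
    """Return (value, count) pairs, including values that never occur."""
    return [(v, counter[v]) for v in values]


print(edge_tally(min_x, (0, 1)))      # [(0, 1), (1, 1)]
print(edge_tally(max_x, (255, 256)))  # [(255, 0), (256, 1)]
print(edge_tally(min_y, (0, 1)))      # [(0, 1), (1, 0)]
print(edge_tally(max_y, (62, 63)))    # [(62, 1), (63, 1)]
```

Including zero counts is what makes a tally like [(248, 354), (249, 0)] informative.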

So now I guess my edge check is a tad too sensitive, since it also includes contacts that are just one value off the edge.

It's also funny how the distribution peaks at the edge: the tail of the distribution gets cut off there, whereas otherwise it's pretty normally distributed.

ivoflipse commented 10 years ago

I've also enumerated the last frames for each contact:

Again, an interesting effect due to the measurement being cut off. Also, while it looks like the last value is 249, it's actually 248: [(248, 354), (249, 0)]

This almost makes me suspect that I'm not using the data in the last frame... But the last value I can index is measurement["data"][255,62,248], so how on earth can I get 256 and 63 for max_x and max_y?

I suspect it's due to Python indexing: I slice up to max_x and max_y, so those are exclusive bounds, whereas max_z isn't determined from a slice but from a list of frames, so the last frame is actually the last accessible index. Since I use the data that's being stored, this difference in approaches remains in my script as well. So for the z check, max_z should be compared to the shape - 1.
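The two conventions are easy to mix up. A small NumPy illustration (a tiny stand-in array, not the real 256 x 63 x 249 measurement): exclusive slice ends can equal the shape, an inclusive frame index can't, and using an inclusive index directly as a slice end silently drops the last frame.

```python
import numpy as np

nx, ny, nt = 4, 3, 5
data = np.zeros((nx, ny, nt))
data[2:, 1:, 3:] = 1.0  # contact in the last rows, columns and frames

xs, ys, zs = np.nonzero(data)
max_x = int(xs.max()) + 1  # exclusive: can equal nx
max_y = int(ys.max()) + 1  # exclusive: can equal ny
max_z = int(zs.max())      # inclusive: at most nt - 1

# The last accessible element sits one below the exclusive bounds
print(data[max_x - 1, max_y - 1, max_z])  # 1.0

# Using the inclusive max_z as a slice end loses the last frame
print(data[:, :, :max_z].sum())      # 4.0
print(data[:, :, :max_z + 1].sum())  # 8.0
```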

For anyone who thinks this sounds stupid: remember that my contacts are NOT just a plain NumPy array, because that would create problems whenever bounding boxes overlap. Instead, each contact is a dictionary of frames that hold polygons containing all the sensors for that contact.
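A minimal sketch of that kind of structure, simplified so that each frame maps to the (x, y) sensors inside the contact's polygon instead of an actual polygon object; `contact` and `bounding_box` are hypothetical names. Since each contact only stores its own sensors, overlapping bounding boxes don't interfere, and it reproduces the 256/63 vs 248 asymmetry:

```python
# Hypothetical contact: frame index -> (x, y) sensors inside its polygon
contact = {
    245: [(250, 60), (251, 60)],
    246: [(250, 60), (251, 61), (255, 62)],
    248: [(254, 62)],
}


def bounding_box(contact):
    """Return (min_x, max_x, min_y, max_y, min_z, max_z).

    x/y maxima are exclusive (usable as slice ends); the z range comes
    from the frame keys, so max_z is the last frame index itself.
    """
    xs = [x for sensors in contact.values() for x, _ in sensors]
    ys = [y for sensors in contact.values() for _, y in sensors]
    frames = sorted(contact)
    return min(xs), max(xs) + 1, min(ys), max(ys) + 1, frames[0], frames[-1]


print(bounding_box(contact))  # (250, 256, 60, 63, 245, 248)
```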

Perhaps it would be even better to just let the incomplete step detection deal with this, instead of fiddling with this awkward definition of the end of a measurement.

ivoflipse commented 10 years ago

That gives the following distribution of incomplete/invalid contacts:

ivoflipse commented 10 years ago

Anyway, here's my new version of touches_edge:

def touches_edge(min_x, max_x, min_y, max_y, max_z, shape):
    """
    Checks whether the x, y or z extent of a contact hits the outer dimensions of the plate.
    Could be refined to require multiple sensors to touch the edge in order to invalidate a contact.

    shape is the shape of the measurement data, e.g. measurement["data"].shape.
    max_x and max_y are exclusive (slice) bounds; max_z is the last frame index itself.
    """
    nx, ny, nt = shape  # e.g. 256, 63, 249
    x_touch = (min_x == 0) or (max_x == nx)
    y_touch = (min_y == 0) or (max_y == ny)
    z_touch = (max_z == nt - 1)  # frames run from 0 to nt - 1
    return x_touch or y_touch or z_touch

Though like I said, perhaps we should ditch z_touch and deal with that separately.

ivoflipse commented 10 years ago

I just made this change in my development version, so we'll see how it plays out on real data.

ivoflipse commented 10 years ago

I still think it's strange that z_touch has to use nt-1. I wonder whether it's because of errors in my functions that detect or convert the contacts, where the last frame simply wasn't used when I created the dataset, so no contact could satisfy this condition.

Either way, very fishy!

ivoflipse commented 10 years ago

I've now split everything up according to whether a contact

touches the edge (physically)
isn't finished (still on the plate at the end of the measurement)
is incomplete (force above threshold)

The first two mark a contact as invalid or filtered.
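A sketch of how the three checks could be combined, assuming (as stated above) that only the edge and end-of-measurement checks invalidate a contact while the force-based check is kept as a separate flag. `ContactStatus` and `classify` are hypothetical names, not Pawlabeling's actual code.

```python
from dataclasses import dataclass


@dataclass
class ContactStatus:
    touches_edge: bool  # physically reaches the border of the plate
    unfinished: bool    # still on the plate in the last frame
    incomplete: bool    # force above threshold at the start or end

    @property
    def invalid(self) -> bool:
        # Only the first two checks invalidate (filter) a contact
        return self.touches_edge or self.unfinished


def classify(box, shape, incomplete):
    """box = (min_x, max_x, min_y, max_y, max_z); x/y maxima exclusive."""
    min_x, max_x, min_y, max_y, max_z = box
    nx, ny, nt = shape
    edge = min_x == 0 or max_x == nx or min_y == 0 or max_y == ny
    unfinished = max_z >= nt - 1
    return ContactStatus(edge, unfinished, incomplete)


shape = (256, 63, 249)
print(classify((10, 40, 5, 30, 120), shape, incomplete=True).invalid)   # False
print(classify((10, 40, 5, 30, 248), shape, incomplete=False).invalid)  # True
print(classify((0, 40, 5, 30, 120), shape, incomplete=False).invalid)   # True
```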

The last-frame check seems to have a different issue: currently the last frame for all contacts seems to be 246 instead of 249. I'll have to delve deeper into the code to figure out where this started happening.

ivoflipse commented 10 years ago

Turns out I was yet again making an error that caused the last frame not to be taken into account.

I'm adding a test now to catch it; then I can verify whether my check really shouldn't be max_z == nt.
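A minimal sketch of such a regression test, assuming a hypothetical helper `frames_with_load` that returns the frame indices where any sensor registers pressure. Load is placed only in the final frame, so an off-by-one that skips it fails straight away.

```python
import numpy as np


def frames_with_load(data):
    """Hypothetical helper: indices of frames where any sensor is nonzero."""
    per_frame = data.any(axis=(0, 1))  # one bool per frame
    return [int(z) for z in np.nonzero(per_frame)[0]]


def test_last_frame_is_used():
    nx, ny, nt = 4, 3, 6
    data = np.zeros((nx, ny, nt))
    data[1, 1, nt - 1] = 5.0  # load only in the final frame
    frames = frames_with_load(data)
    assert frames == [nt - 1]
    assert max(frames) == nt - 1  # never nt: there is no frame nt


test_last_frame_is_used()
print("ok")
```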

ivoflipse commented 10 years ago

Now that I'm sure the loading is correct: you still have to use max_z >= nt-1, probably because frames 0 up to 248 give a length of 249, but there's no frame 249.

Anyway, now that it has been split up, I'm closing this. If somebody has any suggestions for more tests, either create a new issue or add a comment to this one.

ivoflipse / Pawlabeling

Error in marking contacts as invalid #130