That hard-coded `_cellDivisor` variable near the top is the X/Y split, so 2 is the same as the quads the existing motion capture code uses. Interestingly, changing that to 4, which gives us 16 cells, actually runs very slightly faster, at around 9.3ms per pass. Increasing to 8 for 64 cells averages 9.6ms.
With 64 cells on a 640x480 image, each cell is just 80x60 pixels, which should be fine-grained enough to reject tiny localized movement and perhaps implement proximity-based triggering.
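For reference, here is a minimal sketch of how a divisor-based split could pre-compute the cell grid. The `BuildCellGrid` name and signature are my own illustration, not the actual project code:

```csharp
using System;
using System.Drawing;

// Illustrative only: split a frame into divisor x divisor cells,
// pre-computing each cell's Rectangle once up front.
static Rectangle[] BuildCellGrid(int width, int height, int divisor)
{
    int cellWidth = width / divisor;
    int cellHeight = height / divisor;
    var cells = new Rectangle[divisor * divisor];
    for (int y = 0; y < divisor; y++)
        for (int x = 0; x < divisor; x++)
            cells[y * divisor + x] =
                new Rectangle(x * cellWidth, y * cellHeight, cellWidth, cellHeight);
    return cells;
}

// Divisor 8 on a 640x480 frame yields 64 cells of 80x60 pixels each.
var grid = BuildCellGrid(640, 480, 8);
Console.WriteLine($"{grid.Length} cells, first is {grid[0].Width}x{grid[0].Height}");
```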
After dinner I'll test whether `Parallel.ForEach` buys us anything useful.
Since `Parallel` loops shouldn't modify shared data, I created a simple private struct that contains the diff int as well as the rectangle. Interestingly, in the 64-cell grid, that very slightly out-performs the code shown earlier at 9.3ms per pass.
With 64 cells, `Parallel.ForEach` averages about 9ms (the range I see is 8.7 to 9.1ms), so it does perform very slightly better. That's with no motion, so it also includes a quick loop to add up the diffs per cell. If a single cell meets the threshold, it will short-circuit the entire parallel loop (setting `Stop` on the loop state will also preempt any threads that haven't started yet), so response to motion should be even faster. We're actually teetering on the edge where thread context switching costs more than parallel processing saves: I switched back to quads (divisor 2) and the average timings were only negligibly better, so this has pretty much pushed the processing as far as it can go.
The interesting parts are the struct and the loop changes:
```csharp
private struct DiffRect
{
    internal int diff;
    internal Rectangle rect;
}

private int Analyse()
{
    _workingData = this.WorkingData.ToArray();
    var result = Parallel.ForEach(_cellRect, (src, loopState) => CheckDiff(src, loopState));
    if (!result.IsCompleted && !result.LowestBreakIteration.HasValue)
    {
        return int.MaxValue; // loop was stopped, so return a large diff
    }
    else
    {
        int diff = 0;
        foreach (var cell in _cellRect)
            diff += cell.diff;
        return diff;
    }
}

private void CheckDiff(DiffRect cell, ParallelLoopState loopState)
{
    cell.diff = 0;
    var rect = cell.rect;
    // looping and diff math omitted

    // If the threshold has been exceeded, exit this method immediately
    // for performance reasons.
    if (cell.diff > MotionConfig.Threshold)
    {
        loopState.Stop();
        return;
    }
}
```
Since I had it on the clipboard (my wife asked what the heck I'm doing)...
This is interesting. Setting the divisor to 10 or 16 (100 or 256 cells) made performance slightly worse.
But I didn't expect this: cranking it up to 1024 cells (divisor 32) still shows small perf gains... I'm seeing around 7.7ms per pass, and it's very consistent. Those would be 20x15-pixel cells at 640x480. There must be some tradeoff from the smaller cell size. There isn't a good integer divisor larger than that which works evenly with 640 and 480 (you'd start losing pixels to roundoff), so that's about as good as it gets with this approach.
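The divisor-32 arithmetic above can be sanity-checked directly (illustrative values only, matching the numbers in the text):

```csharp
using System;

// Divisor 32 splits a 640x480 frame evenly: 32 x 32 = 1024 cells,
// each 20x15 pixels, with no pixels lost to roundoff.
int width = 640, height = 480, divisor = 32;
bool evenSplit = (width % divisor == 0) && (height % divisor == 0);
int cellWidth = width / divisor;
int cellHeight = height / divisor;
int cellCount = divisor * divisor;
Console.WriteLine($"{cellCount} cells of {cellWidth}x{cellHeight}, even split: {evenSplit}");
```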
Under 8ms is pretty good. I also realized the passes are always 707 or 708 because now it's just limited by the configured frame rate. 30 seconds of motion detection at that speed would be enough to actually process nearly 3900 frames, which is almost 130 FPS -- far more than the Pi can do. Neat.
So my eventual goal was to attempt motion detection using only values in close proximity. I had a bug in my implementation: I was updating a struct field, and the motion detection events were actually coming from calling `Stop` on the parallel processing. I changed it so that it wouldn't exit early (so that I could analyze each cell diff), and it stopped working. Mutating struct field state did nothing (the structs were copies); all cell diffs were zero after the parallel loop.
So I switched to a pair of simple independent arrays (which are thread-safe if threads are reading/writing specific, non-overlapping indices) and ... it got faster. 😀 So we're down in the 6.7ms range now at 1024 cells.
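A minimal repro of the struct-copy pitfall, with the array-based fix (illustrative only, not the project code):

```csharp
using System;
using System.Threading.Tasks;

// Parallel.ForEach hands each iteration a *copy* of the struct, so
// mutating a field inside the loop never touches the source array.
var cells = new DiffCell[4];
Parallel.ForEach(cells, cell => cell.diff = 42); // mutates local copies only
Console.WriteLine(cells[0].diff); // still 0

// A plain array works: each iteration writes only its own index, which
// is thread-safe without locks, and the results survive the loop.
var diffs = new int[4];
Parallel.For(0, diffs.Length, i => diffs[i] = 42);
Console.WriteLine(diffs[0]); // 42

struct DiffCell { public int diff; }
```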
I'll update the PR with the changes.
Hey Ian, some interesting news... back in #161 the motion detection processing improved a lot -- my test setup was getting 25ms per pass, about 255 frames over a 30 second run. I've managed to more than double that ... about 10.5ms per pass, and more than 700 frames over a 30 second run.
I'm actually setting this up as a prelude to attempt fine-grained-cell proximity motion detection (e.g. "ignore my small dog" mode), but these changes are sufficiently substantial to warrant a PR of their own.
The crazy part is that the changes are simple. The largest change is to pre-calculate the `Rectangle` objects once, along with a few other one-time optimizations that the existing 0.7 build does repeatedly on the fly. Apparently even struct allocations can be expensive. Nobody really needs 10ms motion detection (probably), so my hope is that this headroom will allow for more complex analysis. Also I want to investigate the use of `Parallel.For` versus arrays of tasks -- in .NET Core they made good progress generalizing task parallelism, but `Parallel.For` is still theoretically optimal for CPU-bound operations like these.

It's quite late and I need to turn in, but I have a few other things in mind before I PR this. But the improvement was so substantial I thought I'd open a new discussion as a heads up.