That hard-coded `_cellDivisor` variable near the top is the X/Y split, so 2 is the same as the quads the existing motion capture code uses. Interestingly, changing that to 4, which gives us 16 cells, actually runs very slightly faster, at around 9.3ms per pass. Increasing to 8 for 64 cells averages 9.6ms.
With 64 cells on a 640x480 image, each cell is just 80x60 pixels, which should be fine-grained enough to reject tiny localized movement and perhaps implement proximity-based triggering.
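For reference, here is a minimal sketch of how a divisor-based split could pre-compute the cell grid. The `BuildCellGrid` name and signature are my own illustration, not the actual project code:

```csharp
using System;
using System.Drawing;

// Illustrative only: split a frame into divisor x divisor cells,
// pre-computing each cell's Rectangle once up front.
static Rectangle[] BuildCellGrid(int width, int height, int divisor)
{
    int cellWidth = width / divisor;
    int cellHeight = height / divisor;
    var cells = new Rectangle[divisor * divisor];
    for (int y = 0; y < divisor; y++)
        for (int x = 0; x < divisor; x++)
            cells[y * divisor + x] =
                new Rectangle(x * cellWidth, y * cellHeight, cellWidth, cellHeight);
    return cells;
}

// Divisor 8 on a 640x480 frame yields 64 cells of 80x60 pixels each.
var grid = BuildCellGrid(640, 480, 8);
Console.WriteLine($"{grid.Length} cells, first is {grid[0].Width}x{grid[0].Height}");
```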
After dinner I'll test whether `Parallel.ForEach` buys us anything useful.
Since `Parallel` loops shouldn't modify shared data, I created a simple private struct that contains the diff int as well as the rectangle. Interestingly, in the 64-cell grid, that very slightly out-performs the code shown earlier at 9.3ms per pass.
With 64 cells, `Parallel.ForEach` averages about 9ms (the range I see is 8.7 to 9.1ms), so it does perform very slightly better. That's with no motion, so it also includes a quick loop to add up the diffs per cell. If a single cell meets the threshold, it will short-circuit the entire parallel loop (setting `Stop` on the loop state will also preempt any threads that haven't started yet), so response to motion should be even faster. We're actually teetering on the edge where thread context switching costs more than parallel processing saves: I switched back to quads (divisor 2) and the average timings were only negligibly better, so this has pretty much pushed the processing as far as it can go.
The interesting parts are the struct and the loop changes:
```csharp
private struct DiffRect
{
    internal int diff;
    internal Rectangle rect;
}

private int Analyse()
{
    _workingData = this.WorkingData.ToArray();
    var result = Parallel.ForEach(_cellRect, (src, loopState) => CheckDiff(src, loopState));
    if (!result.IsCompleted && !result.LowestBreakIteration.HasValue)
    {
        return int.MaxValue; // loop was stopped, so return a large diff
    }
    else
    {
        int diff = 0;
        foreach (var cell in _cellRect)
            diff += cell.diff;
        return diff;
    }
}

private void CheckDiff(DiffRect cell, ParallelLoopState loopState)
{
    cell.diff = 0;
    var rect = cell.rect;
    // looping and diff math omitted

    // If the threshold has been exceeded, exit this method immediately
    // for performance reasons.
    if (cell.diff > MotionConfig.Threshold)
    {
        loopState.Stop();
        return;
    }
}
```
Since I had it on the clipboard (my wife asked what the heck I'm doing)...
This is interesting. Setting the divisor to 10 or 16 (100 or 256 cells) made performance slightly worse.
But I didn't expect this: cranking it up to 1024 cells (divisor 32) still shows small perf gains... I'm seeing around 7.7ms per pass, and it's very consistent. Those would be 20x15-pixel cells at 640x480. There must be some tradeoff from the smaller cell size. There isn't a good integer divisor larger than that which works evenly with 640 and 480 (you'd start losing pixels to roundoff), so that's about as good as it gets with this approach.
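The divisor-32 arithmetic above can be sanity-checked directly (illustrative values only, matching the numbers in the text):

```csharp
using System;

// Divisor 32 splits a 640x480 frame evenly: 32 x 32 = 1024 cells,
// each 20x15 pixels, with no pixels lost to roundoff.
int width = 640, height = 480, divisor = 32;
bool evenSplit = (width % divisor == 0) && (height % divisor == 0);
int cellWidth = width / divisor;
int cellHeight = height / divisor;
int cellCount = divisor * divisor;
Console.WriteLine($"{cellCount} cells of {cellWidth}x{cellHeight}, even split: {evenSplit}");
```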
Under 8ms is pretty good. I also realized the passes are always 707 or 708 because now it's just limited by the configured frame rate. 30 seconds of motion detection at that speed would be enough to actually process nearly 3900 frames, which is almost 130 FPS -- far more than the Pi can do. Neat.
So my eventual goal was to attempt motion detection using only values in close proximity. I had a bug in my implementation: I was updating a struct field, and the motion detection events were actually coming from calling `Stop` on the parallel processing. I changed it so that it wouldn't exit early (so that I could analyze each cell diff), and it stopped working. Mutating struct field state did nothing (the structs were copies); all cell diffs were zero after the parallel loop.
So I switched to a pair of simple independent arrays (which are thread-safe if threads are reading/writing specific, non-overlapping indices) and ... it got faster. 😀 So we're down in the 6.7ms range now at 1024 cells.
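A minimal repro of the struct-copy pitfall, with the array-based fix (illustrative only, not the project code):

```csharp
using System;
using System.Threading.Tasks;

// Parallel.ForEach hands each iteration a *copy* of the struct, so
// mutating a field inside the loop never touches the source array.
var cells = new DiffCell[4];
Parallel.ForEach(cells, cell => cell.diff = 42); // mutates local copies only
Console.WriteLine(cells[0].diff); // still 0

// A plain array works: each iteration writes only its own index, which
// is thread-safe without locks, and the results survive the loop.
var diffs = new int[4];
Parallel.For(0, diffs.Length, i => diffs[i] = 42);
Console.WriteLine(diffs[0]); // 42

struct DiffCell { public int diff; }
```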
I'll update the PR with the changes.
Hey Ian, some interesting news... back in #161 the motion detection processing improved a lot -- my test setup was getting 25ms per pass, about 255 frames over a 30 second run. I've managed to more than double that ... about 10.5ms per pass, and more than 700 frames over a 30 second run.
I'm actually setting this up as a prelude to attempt fine-grained-cell proximity motion detection (e.g. "ignore my small dog" mode), but these changes are sufficiently substantial to warrant a PR of their own.
The crazy part is that the changes are simple. The largest change is to pre-calculate the `Rectangle` objects once, along with a few other one-time optimizations that the existing 0.7 build does repeatedly on the fly. Apparently even struct allocations can be expensive. Nobody really needs 10ms motion detection (probably), so my hope is that this headroom will allow for more complex analysis. Also I want to investigate the use of `Parallel.For` versus arrays of tasks -- in .NET Core they made good progress generalizing task parallelism, but `Parallel.For` is still theoretically optimal for CPU-bound operations like these.

It's quite late and I need to turn in, but I have a few other things in mind before I PR this. But the improvement was so substantial I thought I'd open a new discussion as a heads up.