techyian / MMALSharp

C# wrapper to Broadcom's MMAL with an API to the Raspberry Pi camera.
MIT License
195 stars 33 forks

FrameDiffAnalyser enhancements #162

Closed MV10 closed 3 years ago

MV10 commented 3 years ago

Fixes #161

These are the changes discussed in #161 -- as I noted there, it's a fairly extensive (but simple) overhaul, so I'll walk through the details of everything that changed, top to bottom. This is more to explain my thinking than anything else; none of the code is complicated.

**MotionConfig**

There is one new string property, MotionMaskPathname, which is optional. It must be a BMP that matches the raw stream width, height, and color depth. Black pixels disable motion detection.
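
For reference, setting this up would look something like the following (a minimal sketch; the mask path is hypothetical, and I'm assuming the property can be set via an object initializer):

```csharp
// Hypothetical usage of the new optional property. The BMP must match the
// raw stream's width, height, and color depth; black pixels disable
// motion detection in those regions.
var motionConfig = new MotionConfig(threshold: 130, testFrameInterval: TimeSpan.FromSeconds(3))
{
    MotionMaskPathname = "/home/pi/motionmask.bmp" // hypothetical path
};
```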

**FrameDiffAnalyser**

Several private fields have been added. _firstFrame starts out true and controls some basic chores once the first full frame has been collected. Instead of constantly pulling the bitmap dimensions, I just store them once; they'll never change for the raw motion detection stream (this actually made a measurable difference). Finally there is the _mask, which, like the TestFrame, stores only the byte[] data.
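
In rough outline (only _firstFrame and _mask are named above; the dimension field names here are illustrative):

```csharp
private bool _firstFrame = true; // triggers one-time setup once the first full frame arrives
private byte[] _mask;            // raw mask bytes, stored like TestFrame

// Cached once from the first frame -- the raw motion detection stream's
// dimensions never change. (Field names are illustrative.)
private int _width;
private int _height;
```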

**Apply and PrepareTestFrame**

Now the class uses the underlying FrameAnalyser buffer exclusively. The changes there are pretty self-explanatory. PrepareTestFrame does all the work now. The call to GetBlankBitmap is the format-based code that used to be LoadBitmap -- in this class the stream version at the end was never used, so the only thing the method ever did was return a new empty bitmap; I just renamed it for clarity. If this is the first frame, it stores the dimensions, then stores the frame's bytes to the TestFrame property via the new ProcessWorkingData method, and if a mask is defined, PrepareMask does a one-time setup of that. On subsequent calls, it just stores the bytes.
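
The first-frame flow works out to roughly this (a sketch of what's described above, not the literal code; the config property access is assumed):

```csharp
private void PrepareTestFrame()
{
    if (_firstFrame)
    {
        // One-time setup: cache the dimensions from a format-based blank bitmap.
        using (var bmp = GetBlankBitmap())
        {
            _width = bmp.Width;
            _height = bmp.Height;
        }

        this.TestFrame = this.ProcessWorkingData();

        // Optional one-time mask setup.
        if (!string.IsNullOrEmpty(this.MotionConfig.MotionMaskPathname))
        {
            this.PrepareMask();
        }

        _firstFrame = false;
    }
    else
    {
        // Subsequent calls just store the frame's bytes.
        this.TestFrame = this.ProcessWorkingData();
    }
}
```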

**PrepareMask**

Pretty simple. It loads the mask file, ensures the dimensions match, then stores the bytes to an array. I originally did this in the constructor, but I dislike putting operations like file I/O into a constructor, and I had other on-the-first-frame things to do anyway.
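
Something along these lines (a sketch, not the exact code):

```csharp
private void PrepareMask()
{
    using (var bmp = new Bitmap(this.MotionConfig.MotionMaskPathname))
    {
        // The mask must match the raw stream's dimensions exactly.
        if (bmp.Width != _width || bmp.Height != _height)
        {
            throw new Exception("Motion mask dimensions must match the raw stream.");
        }

        var bmpData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height),
            ImageLockMode.ReadOnly, bmp.PixelFormat);

        // Like TestFrame, only the raw bytes are kept.
        _mask = new byte[bmpData.Stride * bmpData.Height];
        Marshal.Copy(bmpData.Scan0, _mask, 0, _mask.Length);

        bmp.UnlockBits(bmpData);
    }
}
```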

**CheckForChanges**

As mentioned in the issue, the extra code that was here (which looked like an older motion detection WIP) was unnecessary, so now this is just a call to Analyse and a response when the difference threshold is exceeded.
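
In other words, something like this (a sketch; the callback name is hypothetical):

```csharp
private void CheckForChanges()
{
    var diff = this.Analyse();

    if (diff >= this.MotionConfig.Threshold)
    {
        _onDetect(); // hypothetical callback fired when motion is detected
    }
}
```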

**ProcessWorkingData**

Since this revision is only ever concerned with managed byte arrays, that's basically what this does. There is an oddity here I don't understand -- you can see it calls Marshal.Copy twice: once to put the WorkingData array into the bitmap buffer, then again to copy the bitmap buffer to the array that is returned. I don't understand why, but the WorkingData array contents are not the same as the array read back out of the Bitmap. It works, and I'm hoping you can tell me what sort of witchcraft is at work there -- I wouldn't have expected a memory copy to have side effects, and not doing this definitely does not work. (That also answers the question I had in the related issue -- I wondered if conversion to a Bitmap was necessary at all, and the answer is apparently Yes.)
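
The round trip looks roughly like this (a sketch of the behaviour described above, not the literal code):

```csharp
private byte[] ProcessWorkingData()
{
    var data = this.WorkingData.ToArray();

    using (var bmp = GetBlankBitmap())
    {
        var bmpData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height),
            ImageLockMode.ReadWrite, bmp.PixelFormat);

        // Copy the managed frame data into the bitmap buffer...
        Marshal.Copy(data, 0, bmpData.Scan0, data.Length);

        // ...then copy the bitmap buffer back out. Oddly, the result is
        // not identical to the original WorkingData contents.
        var result = new byte[bmpData.Stride * bmpData.Height];
        Marshal.Copy(bmpData.Scan0, result, 0, result.Length);

        bmp.UnlockBits(bmpData);
        return result;
    }
}
```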

**Analyse**

Finally we reach the new slim, trim Analyse method. It's self-explanatory -- we set up the quads the same way (but using the stored dimension fields), the current frame bytes are pulled from ProcessWorkingData, then we run the CheckDiff calls, which only need the quad and the current frame bytes; everything else is class-level fields / props. No locking, no streams, etc.
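
In outline (a sketch; the quad layout shown is illustrative):

```csharp
private int Analyse()
{
    var currentFrame = this.ProcessWorkingData();

    // Four equal quadrants built from the cached dimensions.
    var halfW = _width / 2;
    var halfH = _height / 2;
    var quads = new[]
    {
        new Rectangle(0, 0, halfW, halfH),
        new Rectangle(halfW, 0, halfW, halfH),
        new Rectangle(0, halfH, halfW, halfH),
        new Rectangle(halfW, halfH, halfW, halfH)
    };

    var diff = 0;
    foreach (var quad in quads)
    {
        diff += this.CheckDiff(quad, currentFrame);
    }

    return diff;
}
```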

**CheckDiff**

No need for unsafe; this runs completely against managed arrays. We calculate the index (the same for all the arrays), then check the _mask array first. If the pixel is black, we continue. Otherwise it's the same algorithm to compare the test frame against the current frame. I got rid of some if blocks that didn't do anything.
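
Per pixel, that works out to something like this (a sketch assuming RGB24 at three bytes per pixel; row stride/padding is glossed over here):

```csharp
private int CheckDiff(Rectangle quad, byte[] currentFrame)
{
    var diff = 0;

    for (var y = quad.Top; y < quad.Bottom; y++)
    {
        for (var x = quad.Left; x < quad.Right; x++)
        {
            // The same index works for the mask, test frame, and current frame.
            var index = (y * _width + x) * 3;

            // A black mask pixel disables detection at this position.
            if (_mask != null &&
                _mask[index] == 0 && _mask[index + 1] == 0 && _mask[index + 2] == 0)
            {
                continue;
            }

            var rgb1 = this.TestFrame[index] + this.TestFrame[index + 1] + this.TestFrame[index + 2];
            var rgb2 = currentFrame[index] + currentFrame[index + 1] + currentFrame[index + 2];

            if (rgb2 - rgb1 > this.MotionConfig.Threshold)
            {
                diff++;
            }
        }
    }

    return diff;
}
```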

Masking works; I tested by loading a mask that was half black, half white, which made it easy to see when motion was triggered as I moved my hand across the field of view. So, nothing too complicated or crazy -- almost more of a cleanup pass really, apart from switching to all managed arrays.

This has been quite entertaining.

MV10 commented 3 years ago

Oh I should also mention all my timings in that issue thread were using debug builds. I imagine release builds would show even more improvement.

techyian commented 3 years ago

Hi Jon,

I really like what you've done here. I should have realised while writing this code that it could be hugely simplified with managed arrays instead, but never mind -- that's what happens when you're learning new stuff!

> It works, and I'm hoping you can tell me what sort of witchcraft is at work there -- I wouldn't have expected a memory copy to have side effects, and not doing this definitely does not work.

You absolutely do not need to be worrying about creating that blank bitmap and doing copies, so you were right to question it. What I've done in my local copy of your branch is the following:

1) Commented out the ProcessWorkingData method.
2) Added a new int property to the ImageContext class called "Stride".
3) In PortCallbackHandler.cs where the ImageContext is constructed, I've added: `Stride = MMALUtil.mmal_encoding_width_to_stride(WorkingPort.PixelFormat?.EncodingVal ?? this.WorkingPort.EncodingType.EncodingVal, this.WorkingPort.Resolution.Width)`. This requires you to modify MMALUtil.mmal_encoding_width_to_stride to accept int types and also return an int. The native MMAL headers accept and return uint types and initially I followed this convention, but to be honest I've never encountered a situation where a value returned is greater than Int32.MaxValue, so I think you should be OK there - it would need to be one hell of an image to have a stride that big! (See the sketch below for why the stride matters.)
4) Fixed the errors that occur as a result of the above.
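
To illustrate the indexing issue (made-up numbers; this isn't the MMAL implementation, just the concept):

```csharp
// Illustration only: the stride is the number of bytes per image row.
// The camera pads the frame width up to a multiple of 32 pixels, so for
// RGB24 the stride can exceed requestedWidth * 3, and buffer indexing
// must use the stride rather than the nominal width.
int requestedWidth = 620;                           // hypothetical
int alignedWidth = (requestedWidth + 31) / 32 * 32; // 640
int stride = alignedWidth * 3;                      // 1920 bytes per row

// A pixel lookup then becomes:
// int index = (y * stride) + (x * 3);
```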

I haven't done lots of testing, but my application does detect motion following these changes. Was it the stride value that you were struggling with? If not, what exactly wasn't working for you when you tried? I haven't played around with a mask image yet.

I will continue to look over the PR over the next few days, but initial feelings are very positive, so thanks again.

MV10 commented 3 years ago

I should have clarified: the only testing I did regarding not using Bitmap was to compare the WorkingData array with the data copied back out of the Bitmap, and it was different, unless I made some sort of mistake. But if the Bitmap really isn't needed, that should be another big performance gain. Locking the bitmap is expensive -- under the hood, .NET actually copies the entire thing to memory that it locks in advance. In other words, it doesn't actually lock the "real" bitmap, which I found odd / interesting.

techyian commented 3 years ago

Yes, that is interesting. It's pretty well known now that the System.Drawing API is really inefficient, so it's definitely right to remove it where possible.

MV10 commented 3 years ago

I wondered if there might be some byte-ordering or big/little-endian voodoo going on somewhere.

As for GDI+ performance, I know by reputation that SkiaSharp is supposed to be a lot faster. Might be interesting to explore that at some point. Non-trivial undertaking though, I know. (As my personal motto goes, "All ya gotta do is..." which is what my car buddies always seem to say about any problem that arises out in the garage.)

I guess I've never thought about it before, but is it possible for more than one person to push changes to an open PR?

techyian commented 3 years ago

Yes, I think you can, which is handy going forward. I have been quite reluctant to add a hard dependency on ImageSharp/SkiaSharp; if there are clear benefits then I may consider it, but if we can get by without doing so I'd rather keep it that way for now.

MV10 commented 3 years ago

On the question of contributing wiki changes, there is an easier way:

https://stackoverflow.com/questions/7197937/on-github-can-i-fork-just-a-wiki/56480628#56480628

It works -- now your wiki is in my fork too (locally it's in a separate repo / folder, but that's no big deal):

https://github.com/MV10/MMALSharp/wiki

MV10 commented 3 years ago

Looking at the wiki motion topic, I'm reminded of a question I had -- what is the valid range for Threshold? Now that I've seen the code I think it's probably 0 - 1020, right? Four quads added, which each return a 0 - 255 diff value?

MV10 commented 3 years ago

Hmm, so that workflow is still pretty wonky in practice. Here's the updated sample code for the motion entry in the wiki. I think as repo owner you can edit this post and grab the markdown directly:

```csharp
public async Task DetectMotion()
{
    MMALCamera cam = MMALCamera.Instance;

    // When using H.264 encoding we require key frames to be generated for the Circular buffer capture handler.
    MMALCameraConfig.InlineHeaders = true;

    // Two capture handlers are being used here, one for motion detection and the other to record a H.264 stream.
    // We will not record the raw stream (which would be very large and probably not useful).
    using (var vidCaptureHandler = new CircularBufferCaptureHandler(4000000, "/home/pi/videos/detections", "h264"))
    using (var motionCircularBufferCaptureHandler = new CircularBufferCaptureHandler(4000000))
    using (var splitter = new MMALSplitterComponent())
    using (var resizer = new MMALIspComponent())
    using (var vidEncoder = new MMALVideoEncoder())
    using (var renderer = new MMALVideoRenderer())
    {
        cam.ConfigureCameraSettings();

        var splitterPortConfig = new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.I420);
        var vidEncoderPortConfig = new MMALPortConfig(MMALEncoding.H264, MMALEncoding.I420, bitrate: MMALVideoEncoder.MaxBitrateLevel4);

        // The ISP resizer is being used for better performance. Frame difference motion detection will only work if using raw video data. Do not encode to H.264/MJPEG.
        // Resizing to a smaller image may improve performance, but ensure that the width/height are multiples of 32 and 16 respectively to avoid cropping.
        var resizerPortConfig = new MMALPortConfig(MMALEncoding.RGB24, MMALEncoding.RGB24, width: 640, height: 480);

        splitter.ConfigureInputPort(new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.I420), cam.Camera.VideoPort, null);
        splitter.ConfigureOutputPort(0, splitterPortConfig, null);
        splitter.ConfigureOutputPort(1, splitterPortConfig, null);

        resizer.ConfigureOutputPort<VideoPort>(0, resizerPortConfig, motionCircularBufferCaptureHandler);

        vidEncoder.ConfigureInputPort(new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.I420), splitter.Outputs[1], null);
        vidEncoder.ConfigureOutputPort(vidEncoderPortConfig, vidCaptureHandler);

        cam.Camera.VideoPort.ConnectTo(splitter);
        cam.Camera.PreviewPort.ConnectTo(renderer);

        splitter.Outputs[0].ConnectTo(resizer);
        splitter.Outputs[1].ConnectTo(vidEncoder);

        // Camera warm up time
        await Task.Delay(2000);

        var cts = new CancellationTokenSource(TimeSpan.FromSeconds(60));

        // Here we are instructing the capture handler to use a difference threshold of 130. Lower values
        // indicate higher sensitivity. Suitable range for indoor detection between 120-150 with stable lighting
        // conditions. The testFrameInterval argument updates the test frame (which is compared to each new frame).
        var motionConfig = new MotionConfig(threshold: 130, testFrameInterval: TimeSpan.FromSeconds(3));

        await cam.WithMotionDetection(motionCircularBufferCaptureHandler, motionConfig,
        async () =>
        {
            // This callback will be invoked when motion has been detected.

            // Stop motion detection while we are recording.
            motionCircularBufferCaptureHandler.DisableMotionDetection();

            // This will control the duration of the recording.
            var ctsStopRecording = new CancellationTokenSource();

            // This will be invoked when the token is canceled.
            ctsStopRecording.Token.Register(() =>
            {
                // We want to re-enable the motion detection.
                motionCircularBufferCaptureHandler.EnableMotionDetection();

                // Stop recording on our capture handler.
                vidCaptureHandler.StopRecording();

                // Create a new file for our next recording run.
                vidCaptureHandler.Split();
            });

            // Record for 10 seconds
            ctsStopRecording.CancelAfter(10 * 1000);

            // Record until the duration passes or the overall motion detection token expires. Passing
            // vidEncoder.RequestIFrame to StartRecording initializes the clip with a key frame just
            // as the capture handler begins recording.
            await Task.WhenAny(
                vidCaptureHandler.StartRecording(vidEncoder.RequestIFrame, ctsStopRecording.Token),
                cts.Token.AsTask()
            );

            // Stop the recording if the overall motion detection token expired
            if (!ctsStopRecording.IsCancellationRequested)
            {
                ctsStopRecording.Cancel();
            }
        })
        .ProcessAsync(cam.Camera.VideoPort, cts.Token);
    }

    // Only call when you no longer require the camera, i.e. on app shutdown.
    cam.Cleanup();
}
```

techyian commented 3 years ago

> Looking at the wiki motion topic, I'm reminded of a question I had -- what is the valid range for Threshold? Now that I've seen the code I think it's probably 0 - 1020, right? Four quads added, which each return a 0 - 255 diff value?

No, I don't believe so. Each CheckDiff call loops around the width & height of a specified quad and, given we disregard `if (rgb2 - rgb1 > MotionConfig.Threshold)` for now, if each pixel in the TestFrame is different to the currentFrame then the return value is going to be width(quad) * height(quad). I would say the theoretical maximum of Threshold would be (width(image) * height(image)) / 4, although that value would not be practical at all. What may be causing some confusion is that MotionConfig.Threshold is being used in CheckDiff but also checked against the total diff value in CheckForChanges, which might not be what we want?
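
(For concreteness: with the 640x480 stream in the wiki example, that theoretical maximum works out to (640 * 480) / 4 = 76,800.)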

> Hmm, so that workflow is still pretty wonky in practice.

What exactly don't you like here? I know it's quite verbose, possibly splitting the lambda into its own method would be cleaner for the wiki example? There's not much you can hide from the user internally because they have to manage the Disabling/Enabling of motion detection via the raw stream capture handler.

MV10 commented 3 years ago

Oh, right ... so the upper bound of Threshold is some gigantic number that's probably irrelevant.

I meant the wiki update/PR processes were wonky. I edit MD files in VS and the git integration doesn't like either process. It thinks my changes are up-to-date with the remote as soon as I save the file locally. And it does something to the .git data so that command-line git doesn't think there are changes, either.

No problems with the examples or the API usage. 👍🏻

techyian commented 3 years ago

> Oh, right ... so the upper bound of Threshold is some gigantic number that's probably irrelevant.

Ha ha, yes, pretty much! I wouldn't worry too much about the upper bound.

> I meant the wiki update/PR processes were wonky.

Ohh, I see! Sorry, I thought you were referring to the API. Once I've finished testing and merged the PR, I'll update the wiki with your example. I'll try and find some time this weekend to do some more testing. I'm actually quite looking forward to seeing where this work goes and how it performs in a real scenario. Have you managed to source the parts needed for your Pi in order to place it outside? I've never managed to find a case which is both waterproof and suitable for outdoor use - I'd be interested to know how you're planning on doing it.

MV10 commented 3 years ago

I've seen some outdoor setups (and even one underwater setup!) using PVC pipe with a glass covering for the lens. I'm thinking about buying one of those extra-long camera ribbon cables so that only the camera is outside. But I don't want to invest too much effort into an outside setup right now, we're planning to move (perhaps 6 months) so I'm not sure what my needs will be just yet. I may try building at least one this way, though. If I do I'll be sure to ping you.

I've also seen people take apart the more realistic fake security cameras and use those.

The other problem I need to address is getting power to an outside setup, which is often rather distant from a convenient receptacle. I probably ought to invest in POE, but more likely I'll run a 12V supply and a 12V-to-5V buck step-down at the Pi. I tried a 5V brick with a long cable (~3m) but there's too much drop when the Pi starts working hard. I had it lock up when streaming MJPEG.

MV10 commented 3 years ago

Regarding the wiki PR problem -- how about just putting the MDs into a docs folder? I was thinking I should do a writeup for the ExternalProcessCaptureHandler, and if you're OK with moving away from the wiki, I'd be happy to open a PR to set that up. Then either the wiki homepage could reference the doc index, or the readme could and remove the wiki.

If you don't want to do that, I'll just open an issue with proposed content for the handler.

techyian commented 3 years ago

I'm not really keen on moving away from the wiki, however what I have done is add you as a collaborator to the repository and that should allow you to make changes to the wiki going forward.

I've had a bit more of a test of the work you've done today, and once the changes we've discussed above have been resolved I'm happy to merge the PR in. Performance has definitely improved considerably and it's fun playing around with the mask image too. Would you like me to commit my local changes to your fork and you can review?

MV10 commented 3 years ago

Yes, please do commit your changes. Thanks!

MV10 commented 3 years ago

I've updated the motion detection example in the wiki. I noticed another problem with those SO replies about how to PR wiki repositories -- not all the files are cloned. For example, my local copy didn't have your "What is MMAL" content, for some reason. Not going to bother troubleshooting it, just pointing it out in case someone else asks about wiki edits.

techyian commented 3 years ago

Thanks Jon. I've just pushed some changes to the PR now. You may want to do another local test yourself and once you're happy I'll merge it in.

MV10 commented 3 years ago

That's fantastic.

8 days ago we were at 97ms per frame, processing only about 5 FPS... now we're at just 6ms and almost 24 FPS!

I see no issues, ship it!

I'll work on documenting the mask option in the wiki.

After I spend some time on the motion/snapshot answer you provided in the issue I opened yesterday, I want to explore a finer-grained motion grid. With this kind of throughput, I think it's reasonable to consider grid-cell proximity and some of the related stuff I was talking about earlier. Also, this array work is the definition of CPU-bound processing, so I want to try Parallel.For versus multiple Task.Run calls and let the runtime decide how to divide the load. But if anything comes of that I'll do another PR.
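
Something like this is what I have in mind for the parallel version (just a sketch, reusing the quads array and CheckDiff signature from my earlier outline):

```csharp
// Hypothetical sketch: partition the per-quad diff work across cores and
// let the runtime decide how to schedule it. Each CheckDiff call reads
// disjoint pixels, so only the final sum needs synchronization.
var totalDiff = 0;

Parallel.For(0, quads.Length, i =>
{
    var quadDiff = this.CheckDiff(quads[i], currentFrame);
    Interlocked.Add(ref totalDiff, quadDiff);
});
```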

This looks very good. Thanks!

MV10 commented 3 years ago

I forgot to check memory usage -- it's way down and rock-steady, too. (The fledgling game developer inside me hates allocations.)

8 days ago we started here: 329MB - 382MB, average about 362MB

Now it jumps straight to 329MB and it doesn't change over the whole run.

By the way, only the deltas in those memory usage numbers are useful. The specific number (300MB+) is the dotnet runtime which, on my Pi, includes an ASP.NET Core website idling in the background. But the allocation fluctuations and GC passes I saw in the earlier tests were all down to motion detection.

Good stuff.

techyian commented 3 years ago

That all sounds excellent. Great work! I'll merge it in now.