techyian / MMALSharp

C# wrapper to Broadcom's MMAL with an API to the Raspberry Pi camera.
MIT License
195 stars 33 forks source link

MMALCamera.ProcessAsync sometimes hangs indefinitely #198

Open Kas-code opened 3 years ago

Kas-code commented 3 years ago

Hello, I'm running the following code on my Raspberry Pi:

        public static async Task<byte[]> CaptureRawData()
        {
            MMALCamera cam = MMALCamera.Instance;
            MMALCameraConfig.StillEncoding = MMALEncoding.BGR24;
            MMALCameraConfig.StillSubFormat = MMALEncoding.BGR24;
            using (var imgCaptureHandler = new MemoryStreamCaptureHandler())
            using (var renderer = new MMALNullSinkComponent())
            {
                cam.ConfigureCameraSettings(imgCaptureHandler);
                cam.Camera.PreviewPort.ConnectTo(renderer);

                // Camera warm up time
                await Task.Delay(2000);
                Serilog.Log.Logger.Debug("cam.ProcessAsync(cam.Camera.StillPort)");
                await cam.ProcessAsync(cam.Camera.StillPort));
                Serilog.Log.Logger.Debug("Got response, returning data");
                return imgCaptureHandler.CurrentStream.ToArray();
            }
        }

This captures an image to memory and returns it as a byte array. Mostly it works great. I'm running this to capture an image every 10 seconds or so. Randomly, after a few hours the program freezes on the line "await cam.ProcessAsync" which I can tell because of the serilog logging. Having left it for several hours, it never recovers, it just hangs forever on that line and I have to Ctrl+C to kill the program. If I run it again it can capture images again without issue, only to hang again after a few hours in the same place. I read that this may be caused due to power issues, but I'm using an official RPi power supply without any other peripherals. Regardless, even if it is caused by intermittent power brown out, it doesn't really matter to me if it fails to capture one image every few hours, I just want to be able to detect the failure and try again. As it stands, I can't retry since the main thread completely locks up. It's an absolute requirement that I capture the image to memory, I can't use the other methods in MMALSharp to capture the image to disk, since I'm doing some post filtering on the image in memory after capturing it, and then deciding that I only want to save certain images to disk if they pass certain tests. I've attempted to detect the hang with a timeout and retry, but this also hangs with some strange behaviour. This is already quite a long post so I'll post as a reply my attempt to work around the issue.

Kas-code commented 3 years ago

I'm using the following method to try to detect the timeout. It runs a Task.Delay as well as cam.ProcessAsync at the same time, and then uses Task.WhenAny to wait for either of them to finish. If the Task.Delay returns first then it knows that cam.ProcessAsync is hanging.

public static async Task<bool> CancelAfterAsync(Task startTask, TimeSpan timeout)
        {
            using (var timeoutCancellation = new CancellationTokenSource())
            {
                var delayTask = Task.Delay(timeout, timeoutCancellation.Token);
                Serilog.Log.Logger.Debug("await Task.WhenAny");
                var completedTask = await Task.WhenAny(startTask, delayTask);
                Serilog.Log.Logger.Debug("Finished await Task.WhenAny");
                // Cancel timeout to stop either task:
                // - Either the original task completed, so we need to cancel the delay task.
                // - Or the timeout expired, so we need to cancel the original task.
                // Canceling will not affect a task, that is already completed.
                timeoutCancellation.Cancel();
                if (completedTask == startTask)
                {
                    // original task completed
                    Serilog.Log.Logger.Debug(" await startTask;");
                    await startTask;
                    return true;
                }
                else
                {
                    Serilog.Log.Logger.Debug("Timed out");
                    // timeout
                    return false;
                }
            }
        }

I then use this in the following way:

public static async Task<byte[]> CaptureRawData()
        {
            MMALCamera cam = MMALCamera.Instance;
            MMALCameraConfig.StillEncoding = MMALEncoding.BGR24;
            MMALCameraConfig.StillSubFormat = MMALEncoding.BGR24;
            using (var imgCaptureHandler = new MemoryStreamCaptureHandler())
            using (var renderer = new MMALNullSinkComponent())
            {
                cam.ConfigureCameraSettings(imgCaptureHandler);
                cam.Camera.PreviewPort.ConnectTo(renderer);

                // Camera warm up time
                await Task.Delay(2000);
                Serilog.Log.Logger.Debug("cam.ProcessAsync(cam.Camera.StillPort)");
                if (await TaskUtils.CancelAfterAsync(cam.ProcessAsync(cam.Camera.StillPort), TimeSpan.FromSeconds(100)))
                {
                    Serilog.Log.Logger.Debug("Got response, returning data");
                    return imgCaptureHandler.CurrentStream.ToArray();
                }
                else
                {
                    Serilog.Log.Logger.Warning("Camera timed out, return null");
                    return null;
                }
            }
        }

        public static async Task<byte[]> CaptureImage()
        {
            byte[] data = null;

            for (var i = 0; i < 10; i++)
            {
                data = await CaptureRawData();
                if (data != null)
                {
                    break;
                }
                else
                {
                    Serilog.Log.Logger.Warning("Camera timed out, retrying");
                }
            }

            if (data == null || data.Length == 0)
            {
                //Todo: Better exception message
                throw new Exception("Image capture failed");
            }

            return data;
        }

If the cam.ProcessAsync method hangs, then the CancelAfterAsync method should detect this and return false. When this returns false, the CaptureRawData method should return null. Then the CaptureImage method should detect a null result and retry up to 10 times. But instead, the logging output I get is the following:

[13:54:54 DBG] Running CaptureRawData [13:54:54 DBG] cam.ProcessAsync(cam.Camera.StillPort) [13:54:54 DBG] await Task.WhenAny [13:56:34 DBG] Finished await Task.WhenAny [13:56:34 DBG] Timed out [13:56:34 WRN] Camera timed out, return null

It then hangs indefinitely on the return null line inside CaptureRawData. This makes no sense. It's clear that CancelAfterAsync has detected the hang and returned false, but then it hangs when returning null from CaptureRawData. It never gets to the line where it logs "Camera timed out, retrying", which is confusing because all it's doing is returning null?

If anyone can just help me with how to detect the hang and retry safely then that would be great.

techyian commented 3 years ago

I'm struggling for time to invest in MMALSharp at the moment due to work commitments. My advice would be to enable verbose logging using MMALCameraConfig.Debug = true and see what is happening before the hanging happens which may help identify where the problem is happening. I suspect it is happening because a buffer header that you've sent to MMAL is not being processed and returned to the callback handler, therefore causing an indefinite "hang". I would observe how much memory is available to the system at the point this problem occurs by running free in a terminal window and see if there has been a memory leak of any kind.

You don't say in your message whether you're using .NET Core or Mono. If you're using .NET Core, please can you try the same application under the latest version of Mono and see if you can reproduce?

I have encountered issues like this in the past, although it is very difficult to understand the root cause and which component is actually at fault; whether it be the MMALSharp library itself, the MMAL library we're consuming, hardware related, or an issue with the platform (.NET Core/Mono).

Kas-code commented 3 years ago

I've enabled MMALCameraConfig.Debug = true but I'm not seeing any extra output, where does the verbose logging go to?

I'm logging memory usage every minute in a python script completely independently to this and it stays a solid 25-30% memory usage over 4-5 hours so there's almost definitely not a memory leak.

I'm using .NET Core. I may try Mono as well.

I've also tried removing all async and changing to use the following:

        private static volatile bool _camProcessing = false;

        private static bool WaitForCam(MMALCamera cam)
        {
            _camProcessing = true;
            Task.Run(() =>
            {
                cam.ProcessAsync(cam.Camera.StillPort).ConfigureAwait(false).GetAwaiter().GetResult();
                _camProcessing = false;
            });
            for (var i = 0; i < 1000; i++)
            {
                Thread.Sleep(100);
                if (!_camProcessing)
                {
                    Serilog.Log.Logger.Debug($"cam processing finished");
                    return true;
                }
            }
            Serilog.Log.Logger.Warning($"Reached timeout, camera might have locked up");
            return false;
    }

It reaches the "Camera might have locked up" line and then hangs outside the Task.Run, which is just bizarre.

Kas-code commented 3 years ago

It's possible that it's locking while disposing either one of these using statements:

using (var imgCaptureHandler = new MemoryStreamCaptureHandler())
using (var renderer = new MMALNullSinkComponent())

I'm going to run some further tests and report back.

techyian commented 3 years ago

I think the issue will be a background thread that is failing to complete due to the buffer sending issue I spoke about previously. The native MMAL library sends buffer headers to the callback handlers (function pointers) which invokes a background thread. If an operation on those background threads fails to complete then the application will hang indefinitely because it's unable to terminate cleanly until all threads have completed. As you're processing raw data, the callback will be sent to here and in turn here. The issue will most probably be happening here.

Regarding the logging, I think the output you see will depend on the Log Level you've set against Serilog and you will want to set a minimum level of "Debug". The output will then go to your desired target i.e. console/file.

It will be interesting to find out whether it's the sending or the releasing of the buffer that's causing the lock but I would encourage you to try Mono to see if the issue is .NET Core specific.

Kas-code commented 3 years ago

Ok, so the locking up was also happening during disposing one of those two usings in my previous comment. After cam.ProcessAsync(cam.Camera.StillPort) hangs, if I then dispose one of those two usings, that will also hang. I can detect that cam.ProcessAsync(cam.Camera.StillPort) is hanging by wrapping it in a Task.Run, so I tried the following: Only dispose when there hasn't been any hang detected. If there has been a hang detected, then don't dispose but just try again regardless. I can get to the point of calling var renderer = new MMALNullSinkComponent() again, but that gets the following error:

[08:01:14 DBG] var renderer = new MMALNullSinkComponent(); mmal: mmal_vc_component_create: failed to create component 'vc.null_sink' (1:ENOMEM) mmal: mmal_component_create_core: could not create component 'vc.null_sink' (1) [08:01:14 FTL] Fatal error MMALSharp.MMALNoMemoryException: Out of memory. Unable to create component at MMALSharp.MMALNativeExceptionHelper.MMALCheck(MMAL_STATUS_T status, String message) at MMALSharp.MMALComponentBase.CreateComponent(String name) at MMALSharp.MMALComponentBase..ctor(String name) at MMALSharp.Components.MMALDownstreamComponent..ctor(String name) at MMALSharp.Components.MMALRendererBase..ctor(String name) at MMALSharp.Components.MMALNullSinkComponent..ctor()

Although it's definitely not out of memory, the memory usage is only 30% when this happens. If I can detect the hang, can I just reset everything and start again by using MMALCamera.Instance.Cleanup or ForceStop or StopCapture? If I Ctrl+C the process and then run it again everything is fine for another few hours, so if there's a way of just resetting everything and starting fresh from within the code, then that would be great.

Kas-code commented 3 years ago

I've found an acceptable workaround that is a good enough solution for me for now. I can detect the first hang and just exit the program. I've configured it as a linux service that will automatically re-run if it exits. Since the program is running a continual loop anyway, it's not really detrimental if it exits and starts running again, and it only does so a couple of times per day. If I get some more time in the future I can try it in mono and do some further investigation, but for now, the solution means it does what I need it to do.