techyian / MMALSharp

C# wrapper to Broadcom's MMAL with an API to the Raspberry Pi camera.
MIT License

BGR performance #146

Closed CobraCalle closed 3 years ago

CobraCalle commented 4 years ago

Hello,

I'm trying to get full frames from the camera as BGR24, because I would like to perform some kind of image processing on the video input (for example face detection) with OpenCV.

My first try was to use a USB cam with OpenCV (V4L2), but I never reached the framerate/resolution/CPU-load targets I need to achieve. So I decided to switch to a Pi camera to get rid of the USB overhead and make use of the GPU for decoding etc.

But when I change the "rapid image capturing" sample to BMP + BGR24, I get only 3 images per second... can anyone give me a hint what's the most efficient way to get full frames as bitmaps? I already tried to play around with the ISP component and video encoder etc., but there was no way to achieve framerates > 3 frames per second...

When I switch to MJPEG I get 25 frames/second, but then I have to decode those JPEGs with OpenCV on the CPU, and that produces very high CPU load.

Thank you very much

Carl

techyian commented 4 years ago

Hi,

Using BMP is going to produce relatively large amounts of data compared to JPEG/MJPEG. Using BGR24 will add a slight overhead too as the camera natively works in YUV420 but I don't think that's the main issue here. What resolution are you asking the camera to output? The lower the better in this scenario.

The Image Encoder will by default only use 1 buffer, and depending on the configuration asked of it will have quite a small buffer size too - in my local testing this is around 80kb per buffer, which means there's going to be a lot of passing between components before it's filled with the required amount of data for a full frame. In addition, the Raspberry Pi's IO bandwidth is very limited, so that's going to factor into this too.

You could try overriding the number of buffers and their size that's assigned by default using a setup such as:

var encInputConfig = new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.I420, zeroCopy: true);
var encOutputConfig = new MMALPortConfig(MMALEncoding.BMP, MMALEncoding.BGR24, bufferNum: 3, bufferSize: 300000, zeroCopy: true);

imgEncoder
        .ConfigureInputPort(encInputConfig, null)
        .ConfigureOutputPort(encOutputConfig, imgCaptureHandler);

I'm using zeroCopy because we're passing around larger buffer sizes so that helps mitigate the copy between the GPU & ARM. You may be able to tweak those values slightly to give a little extra perf. There was an issue around the committing of custom buffer sizes which was fixed in #141 - you'll need to either grab the latest code from source or use the MyGet pre-release binaries.

When I switch to MJPEG I get 25 frames/second, but then I have to decode those JPEGs with OpenCV on the CPU, and that produces very high CPU load.

Could you not output the raw BGR frames from the camera directly? Capture handlers which implement IOutputCaptureHandler are provided an ImageContext object which contains an Eos property; this property will be set to true when you receive a full frame - my thoughts here are that you could create your own capture handler and handle the frames on the fly.
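
As a rough, untested sketch of what I mean (the exact ICaptureHandler members may differ slightly between releases, and RawFrameCaptureHandler/OnFullFrame are just hypothetical names for your own code):

public class RawFrameCaptureHandler : IOutputCaptureHandler
{
    // Accumulates buffer payloads until MMAL signals the end of a frame.
    private readonly MemoryStream _frame = new MemoryStream();
    private readonly Action<byte[]> _onFullFrame;

    public RawFrameCaptureHandler(Action<byte[]> onFullFrame)
    {
        _onFullFrame = onFullFrame;
    }

    public void Process(ImageContext context)
    {
        _frame.Write(context.Data, 0, context.Data.Length);

        if (context.Eos)
        {
            // Eos is true, so a complete frame has arrived - hand it off
            // to your processing code (e.g. Emgu/OpenCV) and reset.
            _onFullFrame(_frame.ToArray());
            _frame.SetLength(0);
        }
    }

    public void PostProcess() { }

    public string TotalProcessed() => string.Empty;

    public void Dispose() => _frame.Dispose();
}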

I have just made a change in #147 by removing a restriction on the FastImageOutputCallbackHandler class which may also be of use to you. What you can now do is split frames produced by other components such as the splitter into multiple files, meaning you could do something like:

public async Task TakePictureFromVideoPort()
{
    MMALCameraConfig.Resolution = Resolution.As03MPixel;
    MMALCameraConfig.EncodingSubFormat = MMALEncoding.BGR24;

    using (var imgCaptureHandler = new ImageStreamCaptureHandler("/home/pi/images/videoport", "raw"))
    using (var splitter = new MMALSplitterComponent())
    using (var nullSink = new MMALNullSinkComponent())
    {
        cam.ConfigureCameraSettings();

        var splitterInputConfig = new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.BGR24);
        var splitterOutputConfig = new MMALPortConfig(MMALEncoding.BGR24, MMALEncoding.BGR24);

        // Create our component pipeline.        
        splitter
            .ConfigureInputPort(splitterInputConfig, cam.Camera.VideoPort, null)
            .ConfigureOutputPort<FastStillPort>(0, splitterOutputConfig, imgCaptureHandler);

        cam.Camera.VideoPort.ConnectTo(splitter);
        cam.Camera.PreviewPort.ConnectTo(nullSink);

        // Camera warm up time
        await Task.Delay(2000);

        CancellationTokenSource cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));

        await cam.ProcessAsync(cam.Camera.VideoPort, cts.Token);
    }
}
techyian commented 4 years ago

If you don't need to store the image data then I'd look into creating your own capture handler and processing the raw data as it comes in. That way you're not restricted on the disk IO and if you can dump the data into Emgu/OpenCV then you should hopefully be able to achieve the performance you require.

Hopefully that helps you a bit.

CobraCalle commented 4 years ago

Thank you very much for your suggestions... I'll try that this evening. But perhaps there is a more optimal scenario. My goal is to create a presence detector based on a 360-degree fisheye camera sitting on the ceiling. This camera performs motion detection in the first step. If motion is detected, the area with the motion is cut out, its perspective is corrected (fisheye) and then face detection is performed. To get the face detection to work with a fisheye image, the resolution has to be as high as possible. But to catch faces in motion (from the ceiling point of view) we have to have as many frames as possible.

Idea: would it be possible to use a video renderer for that? In the end I don't really need full frames... one frame that is constantly updated with patches (H.264) would of course work too... 😁

And another question... does the use of a preview have a high performance impact? If not, perhaps it would make sense to perform the motion detection on the preview and only use the real stream when motion is detected. (I know you have implemented a motion detection, but at this point I'm not 100% sure that it matches all my needs, for example in terms of background adaptation over a whole day etc.)

CobraCalle commented 4 years ago

Hi,

When I try the TakePictureFromVideoPort example above, I get the following error:

Unhandled exception. System.ArgumentException: Working port component is not of type IImageEncoder
   at MMALSharp.Callbacks.FastImageOutputCallbackHandler.Callback(IBuffer buffer)
   at MMALSharp.Ports.Outputs.FastStillPort.NativeOutputPortCallback(MMAL_PORT_T port, MMAL_BUFFER_HEADER_T buffer)
Aborted

...I had to change the following lines of code, because the properties you use here do not exist (I'm using the latest version from NuGet):

        MMALCameraConfig.VideoResolution = Resolution.As03MPixel;
        MMALCameraConfig.VideoEncoding = MMALEncoding.BGR24;

Do I have to use the current version directly from GitHub?

techyian commented 4 years ago

I mentioned in my previous message you will need to either grab the source from GitHub or download the pre-release packages from MyGet.

techyian commented 4 years ago

I will have a look into whether the preview port is a feasible approach here. The preview port is primarily used for calculating exposure, but can also be used for rendering the video onto the HDMI display. I've not looked into processing the frames directly on this port and I don't think there's any way currently built into the library which would allow this.

Regarding the resolution of your image frames, depending on the resolution and framerate you request from the camera it will either use the full size of the sensor or partial Field of View. Please look at the Wiki for more info on the supported modes for your respective camera module: OV5647 or IMX219. The camera will use a sensor mode which best suits the resolution requested, but it can also be overridden using MMALCameraConfig.SensorMode.
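
For example, forcing a specific mode would be a one-liner (a sketch, assuming the MMALSensorMode enum naming):

MMALCameraConfig.SensorMode = MMALSensorMode.Mode1;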

CobraCalle commented 4 years ago

Thank you very much... that worked fine... I could achieve 10-11 frames per second at 1600x1200 with CPU utilization of around 60% on one core... I'm not sure if this is too high for 24x7 operation, but we will see. Perhaps the use of the ROI feature could improve this a bit (because the fisheye lens produces a circular image in the center of the sensor, so I can clip the rest).

Another question... at the moment I have to copy the data for every frame from the .NET byte array to the native OpenCV Mat. Would it be possible to hand over the pointer to the native buffer of the OpenCV Mat to your API, so that MMAL fills the OpenCV buffer directly? At this resolution every frame is roughly 5 MB, and if we do not have to copy that, I think that would save a lot of CPU resources...

...alternatively, exposing the pointer to the buffer (and perhaps disabling copying of the data to the .NET array) could work too, because it is possible to create an OpenCV Mat and hand over the pointer to the data.

CobraCalle commented 4 years ago

Another question... I'm using the IMX477 sensor (HQ camera), and as soon as I change the SensorMode or the Resolution (to one of the supported native resolutions mentioned on the wiki) I get the following error:

mmal: mmal_vc_port_info_set: failed to set port info (3:0): EINVAL
mmal: mmal_vc_port_set_format: mmal_vc_port_info_set failed 0xfbe640 (EINVAL)
Unhandled exception. MMALSharp.MMALInvalidException: Argument is invalid. vc.ril.video_splitter:out:0(BGR3): Unable to commit port changes.
   at MMALSharp.MMALNativeExceptionHelper.MMALCheck(MMAL_STATUS_T status, String message) in S:\Oxidium\Oxidium.BoardComputer\github\MMALSharp-dev\src\MMALSharp\Exceptions.cs:line 33
   at MMALSharp.Ports.PortBase`1.Commit() in S:\Oxidium\Oxidium.BoardComputer\github\MMALSharp-dev\src\MMALSharp\Ports\PortBase.cs:line 320
   at MMALSharp.Ports.Outputs.OutputPort.Configure(IMMALPortConfig config, IInputPort copyFrom, IOutputCaptureHandler handler) in S:\Oxidium\Oxidium.BoardComputer\github\MMALSharp-dev\src\MMALSharp\Ports\Outputs\OutputPort.cs:line 142
   at MMALSharp.Ports.Outputs.FastStillPort.Configure(IMMALPortConfig config, IInputPort copyFrom, IOutputCaptureHandler handler) in S:\Oxidium\Oxidium.BoardComputer\github\MMALSharp-dev\src\MMALSharp\Ports\Outputs\FastStillPort.cs:line 65
   at MMALSharp.Components.MMALDownstreamComponent.ConfigureOutputPort(Int32 outputPort, IMMALPortConfig config, IOutputCaptureHandler handler) in S:\Oxidium\Oxidium.BoardComputer\github\MMALSharp-dev\src\MMALSharp\Components\MMALDownstreamComponent.cs:line 117
   at MMALSharp.Components.MMALDownstreamComponent.ConfigureOutputPort[TPort](Int32 outputPort, IMMALPortConfig config, IOutputCaptureHandler handler) in S:\Oxidium\Oxidium.BoardComputer\github\MMALSharp-dev\src\MMALSharp\Components\MMALDownstreamComponent.cs:line 135

As described in the wiki, mode 1 should be 2028x1520, but there is no Resolution instance for these values... so I created a new instance and set the width and height to 2028 and 1520, but that results in the error described above. The same happens when I change the sensor mode to 1.

techyian commented 4 years ago

Would it be possible to hand over the pointer to the native buffer of the OpenCV Mat to your API, so that MMAL fills the OpenCV buffer directly?

I would say a cautious yes, but I'm not entirely sure of the inner workings of OpenCV to say whether this is safe or not. This would require implementing your own Callback Handler and handling the buffer data yourself. If you look at the GetBufferData method, this is called in the PortCallbackHandler class here. To achieve your goal, your custom Callback Handler would want to bypass this and handle the data manually. To give a brief overview of what this would look like:

using static MMALSharp.MMALNativeExceptionHelper; // This is required for the MMALCheck method call.

public class MyCustomCallbackHandler : PortCallbackHandler<IOutputPort, IOutputCaptureHandler>
{
    public MyCustomCallbackHandler(IOutputPort port, IOutputCaptureHandler handler) : base(port, handler)
    {
    }

    public override unsafe void Callback(IBuffer buffer)
    {
        // Do not call the base method.

        try
        {
            // Lock the memory via MMAL.
            MMALCheck(MMALBuffer.mmal_buffer_header_mem_lock(buffer.Ptr), "Unable to lock buffer header.");

            // Pass the pointer and its length to opencv.
            var dataPointer = buffer.Ptr->data + buffer.Offset;
            var dataLength = buffer.Length;

            // Once you have finished with the memory, you must unlock it.
            MMALBuffer.mmal_buffer_header_mem_unlock(buffer.Ptr);
        }
        catch
        {
            // If something goes wrong, unlock the header.
            MMALBuffer.mmal_buffer_header_mem_unlock(buffer.Ptr);
            MMALLog.Logger.LogWarning("Unable to handle data. Returning null.");
        }
    }
}

You will need to register your Callback Handler with the output port which is done by calling RegisterCallbackHandler against the IOutputPort instance. You might find my guide on handling data in the Wiki useful. Once you've handled the data in opencv, you'll then need to decide whether you want to call the Capture Handler instance within the Callback Handler.
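
For example, if you were attaching the handler above to the splitter's first output port from the earlier example, registration would look something like this (names taken from the sketches above):

splitter.Outputs[0].RegisterCallbackHandler(new MyCustomCallbackHandler(splitter.Outputs[0], imgCaptureHandler));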

I would like to stress that handling the buffer data manually, unless done correctly, will likely cause your Pi to freeze and require rebooting, so it's essential you lock and unlock the memory accordingly.

Another question... I'm using the IMX477 sensor (HQ camera), and as soon as I change the SensorMode or the Resolution (to one of the supported native resolutions mentioned on the wiki) I get the following error:

I'm sorry, I don't have access to the new IMX477 camera module and don't have any current plans to purchase one due to the costs involved. I'm surprised that you're having issues with it though as it should just work in the same way as the other camera modules. Your stacktrace shows that the port that's failing to commit changes is the splitter's output port, can you post a snippet of your code which is doing the camera configuration please?

Please let me know if you need any further guidance on the native buffer handling.

CobraCalle commented 4 years ago

That sounds very good... but I'm a little bit confused about how the pipeline has to be set up...

This is my current code:

    static async Task Main(string[] args)
    {
        MMALCamera cam = MMALCamera.Instance;

        MMALCameraConfig.Resolution = Resolution.As2MPixel;
        MMALCameraConfig.EncodingSubFormat = MMALEncoding.BGR24;
        MMALCameraConfig.VideoStabilisation = false;

        //var cropRect = new System.Drawing.Rectangle(177, 64, 1174, 1136);

        using (var imgCaptureHandler = new OpenCvMatCaptureHandler(MMALCameraConfig.Resolution))
        //using (var imgCaptureHandler = new ImageStreamCaptureHandler("/home/pi/images", "raw"))
        using (var splitter = new MMALSplitterComponent())
        using (var nullSink = new MMALNullSinkComponent())
        {
            cam.ConfigureCameraSettings();

            var splitterInputConfig = new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.BGR24);
            var splitterOutputConfig = new MMALPortConfig(MMALEncoding.BGR24, MMALEncoding.BGR24);

            // Create our component pipeline.        
            splitter
                .ConfigureInputPort(splitterInputConfig, cam.Camera.VideoPort, null)
                .ConfigureOutputPort<FastStillPort>(0, splitterOutputConfig, imgCaptureHandler);

            splitter.Outputs[0].RegisterCallbackHandler(new OpenCvMatCallbackHandler(cam.Camera.VideoPort, imgCaptureHandler));
            //cam.Camera.VideoPort.RegisterCallbackHandler(new OpenCvMatCallbackHandler(splitter.Outputs[0], imgCaptureHandler));

            cam.Camera.VideoPort.ConnectTo(splitter);
            cam.Camera.PreviewPort.ConnectTo(nullSink);

            // Camera warm up time
            await Task.Delay(2000);

            CancellationTokenSource cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));

            await cam.ProcessAsync(cam.Camera.VideoPort, cts.Token);
        }
    }

    public class OpenCvMatCallbackHandler : PortCallbackHandler<IOutputPort, IOutputCaptureHandler>
    {
        public OpenCvMatCallbackHandler(IOutputPort port, IOutputCaptureHandler handler) : base(port, handler)
        {
        }

        public override unsafe void Callback(IBuffer buffer)
        {
            Console.WriteLine("X");

            // Do not call the base method.

            try
            {
                // Lock the memory via MMAL.
                MMALCheck(MMALBuffer.mmal_buffer_header_mem_lock(buffer.Ptr), "Unable to lock buffer header.");

                // Pass the pointer and its length to opencv.
                var dataPointer = buffer.Ptr->data + buffer.Offset;
                var dataLength = buffer.Length;

                this.Mat = new Mat(base.WorkingPort.Resolution.Height, base.WorkingPort.Resolution.Width, MatType.CV_8UC3, (IntPtr)dataPointer);

                // Once you have finished with the memory, you must unlock it.
                MMALBuffer.mmal_buffer_header_mem_unlock(buffer.Ptr);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                // If something goes wrong, unlock the header.
                MMALBuffer.mmal_buffer_header_mem_unlock(buffer.Ptr);
                //MMALLog.Logger.LogWarning("Unable to handle data. Returning null.");
            }
        }

        public Mat Mat
        {
            get;
            private set;
        }
    }

As you can see, I output "X" in the Callback... I thought this is called one time and that the buffer is then reused for every frame, but the "X" is printed out many times... and the Process method of my CaptureHandler is never called. Does that mean I do not need the CaptureHandler anymore and the callback method is called for every frame? But if I do not need the capture handler, why do I have to pass one to the callback constructor (and the splitter)?

techyian commented 4 years ago

First of all, when registering your callback handler pass in the port you're registering it against, i.e. splitter.Outputs[0].RegisterCallbackHandler(new OpenCvMatCallbackHandler(splitter.Outputs[0], imgCaptureHandler));.

As you can see, I output "X" in the Callback... I thought this is called one time and that the buffer is then reused for every frame, but the "X" is printed out many times...

The Callback method will be called each time a new buffer is returned to a port with fresh data. I would expect that you'll need to create a new OpenCv Mat object each time this method is called.

and the Process method of my CaptureHandler is never called.

Sorry I should have been clearer - you will need to call the Process method of the Capture Handler within that method. There is a property called CaptureHandler which you will have access to.
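
Roughly speaking, the end of your Callback could then do something like the following (a sketch from memory - managedCopy is your own managed copy of the buffer data, and the exact ImageContext members and flag check should be verified against the PortCallbackHandler source):

// Inside your overridden Callback, after copying the buffer into a managed array:
var context = new ImageContext
{
    Data = managedCopy, // hypothetical managed copy of the locked buffer's bytes
    Eos = buffer.AssertProperty(MMALBufferProperties.MMAL_BUFFER_HEADER_FLAG_FRAME_END)
};

// Forward the frame to the capture handler attached to this callback handler.
this.CaptureHandler?.Process(context);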

techyian commented 4 years ago

Please look at the Callback method against the PortCallbackHandler class as it will give you an idea as to what your overridden method should look like.

CobraCalle commented 4 years ago

Now we are talking, baby! 18 FPS at 1600x1200 BGR, ready to process with OpenCV... yeah! :-)

But there is (as usual) one more thing in my head... when using a USB cam I used one thread to grab frames and one thread to process frames... the solution we have achieved now forces me to process each frame ad hoc, and that will lower the FPS... would it be possible to hand MMAL the buffer to use for the next frame? Then I could create a pool of OpenCV Mats and fill those Mats while processing the grabbed frames on another thread... :-)

CobraCalle commented 4 years ago

Sorry to bother you again... but would it be possible to duplicate the splitter in my code, connect that second splitter to the preview port and use that (probably higher FPS and lower resolution) for motion detection in parallel? And if yes... is it possible to start and stop the processing of the VideoPort?

CobraCalle commented 4 years ago

btw. the current solution provides not only 18 FPS to OpenCV... it consumes near zero CPU... yeah!!!!

CobraCalle commented 4 years ago

Sorry to bother you again... but would it be possible to duplicate the splitter in my code, connect that second splitter to the preview port and use that (probably higher FPS and lower resolution) for motion detection in parallel? And if yes... is it possible to start and stop the processing of the VideoPort?

OK... I think I've misunderstood the preview port... I connected the splitter to the preview port instead of the video port and still got 18 FPS at 1600x1200... I thought the preview would be "smaller". But that's not true, right?

techyian commented 4 years ago

Now we are talking, baby! 18 FPS at 1600x1200 BGR, ready to process with OpenCV... yeah! :-)

That's great news that you've made progress and are getting framerates you're happy with!

But there is (as usual) one more thing in my head... when using a USB cam I used one thread to grab frames and one thread to process frames... the solution we have achieved now forces me to process each frame ad hoc, and that will lower the FPS... would it be possible to hand MMAL the buffer to use for the next frame? Then I could create a pool of OpenCV Mats and fill those Mats while processing the grabbed frames on another thread... :-)

I'm not sure I fully understand what you're suggesting here. The same native buffer(s) is "reused" and passed between components, but when they are provided to MMALSharp a new managed instance of MMALBufferImpl is created. I think the issue here is the locking/unlocking of the native buffer memory, as this needs to be coordinated correctly or you'll cause your Pi to lock up. This makes it more difficult to schedule the processing to a different thread/task, as your buffer can't be passed back to a port while you have a lock on it. If I've misunderstood, could you elaborate on what you're suggesting, please?

Sorry to bother you again... but would it be possible to duplicate the splitter in my code, connect that second splitter to the preview port and use that (probably higher FPS and lower resolution) for motion detection in parallel? And if yes... is it possible to start and stop the processing of the VideoPort?

The Still, Video and Preview ports can in theory have their own resolution, framerates and encoding types applied but the library does not allow this and they all share common values. The main reason behind this is usability as I want the library to be as easy to use as possible for the majority of users. Up until recently, the still port's configuration differed from the video and preview ports but caused confusion so I have decided to merge them all together. I will review whether it is a good idea to allow these settings to be overridden but again, I don't want to cause confusion for the majority of users of the library.

CobraCalle commented 4 years ago

The Still, Video and Preview ports can in theory have their own resolution, framerates and encoding types applied but the library does not allow this and they all share common values. The main reason behind this is usability as I want the library to be as easy to use as possible for the majority of users. Up until recently, the still port's configuration differed from the video and preview ports but caused confusion so I have decided to merge them all together. I will review whether it is a good idea to allow these settings to be overridden but again, I don't want to cause confusion for the majority of users of the library.

OK... I think it would be better to use the preview port with a different resolution/framerate, but in theory I think it should be possible to achieve what I'm looking for (a parallel, smaller video stream to avoid resizing in OpenCV while applying motion detection to the original-sized frame) with a resizer (if the resize is performed on the GPU). But I'm not able to get this to work.

Please have a look at this:

        using (var splitter = new MMALSplitterComponent())
        using (var resizer = new MMALResizerComponent())
        using (var nullSink = new MMALNullSinkComponent())
        {
            cam.ConfigureCameraSettings();

            var splitterInputConfig = new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.BGR24);
            var splitterOutputConfig = new MMALPortConfig(MMALEncoding.BGR24, MMALEncoding.BGR24);

            var resizerOutputConfig = new MMALPortConfig(MMALEncoding.BGR24, MMALEncoding.BGR24, width: 1024, height: 768, zeroCopy:true);

            // Create our component pipeline.        
            splitter.ConfigureInputPort(splitterInputConfig, cam.Camera.VideoPort, null);

            splitter.ConfigureOutputPort<FastStillPort>(0, splitterOutputConfig, null);
            splitter.ConfigureOutputPort<FastStillPort>(1, splitterOutputConfig, null);
            //splitter.Outputs[0].RegisterCallbackHandler(new OpenCvMatCallbackHandler(splitter.Outputs[0], null));

            resizer.ConfigureOutputPort(0, resizerOutputConfig, null);
            //resizer.Outputs[0].RegisterCallbackHandler(new OpenCvMatCallbackHandler(resizer.Outputs[0], null));

            splitter.Outputs[1].ConnectTo(resizer);

            cam.Camera.VideoPort.ConnectTo(splitter);
            cam.Camera.PreviewPort.ConnectTo(nullSink);

            // Camera warm up time
            await Task.Delay(2000);

            CancellationTokenSource cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));

            await cam.ProcessAsync(cam.Camera.VideoPort, cts.Token);
        }

Why am I getting this error:

mmal: mmal_vc_port_info_set: failed to set port info (3:0): EINVAL
mmal: mmal_vc_port_set_format: mmal_vc_port_info_set failed 0x19de240 (EINVAL)
mmal: mmal_vc_port_info_set: failed to set port info (3:0): EINVAL
mmal: mmal_vc_port_set_format: mmal_vc_port_info_set failed 0x19de240 (EINVAL)
Unhandled exception. MMALSharp.MMALInvalidException: Argument is invalid. vc.ril.resize:out:0(BGR3): Unable to commit port changes.
   at MMALSharp.MMALNativeExceptionHelper.MMALCheck(MMAL_STATUS_T status, String message) in S:\Oxidium\Oxidium.BoardComputer\github\MMALSharp-dev\src\MMALSharp\Exceptions.cs:line 33
   at MMALSharp.Ports.PortBase`1.Commit() in S:\Oxidium\Oxidium.BoardComputer\github\MMALSharp-dev\src\MMALSharp\Ports\PortBase.cs:line 320
   at MMALSharp.Ports.Outputs.OutputPort.Configure(IMMALPortConfig config, IInputPort copyFrom, IOutputCaptureHandler handler) in S:\Oxidium\Oxidium.BoardComputer\github\MMALSharp-dev\src\MMALSharp\Ports\Outputs\OutputPort.cs:line 85
   at MMALSharp.Ports.Outputs.StillPort.Configure(IMMALPortConfig config, IInputPort copyFrom, IOutputCaptureHandler handler) in S:\Oxidium\Oxidium.BoardComputer\github\MMALSharp-dev\src\MMALSharp\Ports\Outputs\StillPort.cs:line 64
   at MMALSharp.Components.MMALDownstreamComponent.ConfigureOutputPort(Int32 outputPort, IMMALPortConfig config, IOutputCaptureHandler handler) in S:\Oxidium\Oxidium.BoardComputer\github\MMALSharp-dev\src\MMALSharp\Components\MMALDownstreamComponent.cs:line 117

CobraCalle commented 4 years ago

I'm not sure I fully understand what you're suggesting here. The same native buffer(s) is "reused" and passed between components, but when they are provided to MMALSharp a new managed instance of MMALBufferImpl is created. I think the issue here is the locking/unlocking of the native buffer memory, as this needs to be coordinated correctly or you'll cause your Pi to lock up. This makes it more difficult to schedule the processing to a different thread/task, as your buffer can't be passed back to a port while you have a lock on it. If I've misunderstood, could you elaborate on what you're suggesting, please?

I would like to achieve the following:

  1. Prepare 1-n buffers
  2. Before a frame is grabbed, MMAL "asks" which buffer to use. Then I can hand over a currently unused buffer
  3. When the frame is processed, I mark the buffer as "filled"
  4. A parallel thread works on the "filled" buffers. When done, the buffer is marked as "free" and can be used in step 2

This would allow me to create a pool of buffers, fill them with frames from MMAL and process the filled buffers on another thread.

CobraCalle commented 4 years ago

Small update... I've changed the CallbackHandler code to reuse the OpenCV Mats and changed the bufferNum to 2... now I'm able to achieve 28 FPS at 1600x1200 in BGR.

                // Lock the memory via MMAL.
                MMALCheck(MMALBuffer.mmal_buffer_header_mem_lock(buffer.Ptr), "Unable to lock buffer header.");

                // Pass the pointer and its length to opencv.
                var dataPointer = new IntPtr(buffer.Ptr->data + buffer.Offset);
                var dataLength = buffer.Length;

                if (this.bufferSize == dataLength)
                {
                    Mat mat;

                    if (this.mats.TryGetValue(dataPointer, out mat) == false)
                    {
                        mat = new Mat(base.WorkingPort.Resolution.Height, base.WorkingPort.Resolution.Width, MatType.CV_8UC3, dataPointer);

                        this.mats.Add(dataPointer, mat);
                    }

                    //mat.SaveImage("/home/pi/images/" + Guid.NewGuid().ToString() + ".jpg");

                    Console.WriteLine("New Frame (" + dataPointer.ToString() + ")");
                }

Can you explain to me what is going on in the background here?

What does mmal_buffer_header_mem_lock do? Does it tell MMAL that this buffer is "currently not available"?

Is Callback called serially or in parallel when BufferNum > 1? In other words... when I call mmal_buffer_header_mem_lock for buffer 1, is Callback called in parallel for the other buffers?

CobraCalle commented 4 years ago

Idea (BufferNum > 1):

  1. Call mmal_buffer_header_mem_lock
  2. Put the buffer in a queue
  3. Leave "Callback" without calling mmal_buffer_header_mem_unlock

On a parallel running thread:

  1. Get a buffer from the queue
  2. Process the image
  3. Call mmal_buffer_header_mem_unlock

Assumptions:

  1. Because we do not call mmal_buffer_header_mem_unlock in "Callback", the buffer is not used for any further frames... but grabbing frames into the other "free" buffers continues
  2. When calling mmal_buffer_header_mem_unlock from the "processing thread", the buffer becomes available to store other frames

Would that work?

CobraCalle commented 4 years ago

I've found a way to establish a second, smaller stream for motion detection by using an ISPComponent...

        using (var splitter = new MMALSplitterComponent())
        using (var resizer = new MMALIspComponent())
        using (var nullSink = new MMALNullSinkComponent())
        {
            cam.ConfigureCameraSettings();

            var splitterInputConfig = new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.BGR24);
            var splitterOutputConfig = new MMALPortConfig(MMALEncoding.BGR24, MMALEncoding.BGR24, bufferNum: 2);
            var resizerOutputConfig = new MMALPortConfig(MMALEncoding.BGR24, MMALEncoding.BGR24, width:400, height:300, bufferNum: 2);

            // Create our component pipeline.        
            splitter.ConfigureInputPort(splitterInputConfig, cam.Camera.VideoPort, null);

            splitter.ConfigureOutputPort<FastStillPort>(0, splitterOutputConfig, null);
            splitter.Outputs[0].RegisterCallbackHandler(new OpenCvMatCallbackHandler(splitter.Outputs[0], null));

            resizer.ConfigureOutputPort<FastStillPort>(0, resizerOutputConfig, null);
            resizer.Outputs[0].RegisterCallbackHandler(new OpenCvMatCallbackHandler(resizer.Outputs[0], null));

            cam.Camera.VideoPort.ConnectTo(splitter);
            cam.Camera.PreviewPort.ConnectTo(resizer);

            //splitter.Outputs[1].ConnectTo(resizer);

            // Camera warm up time
            await Task.Delay(2000);

            CancellationTokenSource cts = new CancellationTokenSource();

            var processingTask = cam.ProcessAsync(cam.Camera.VideoPort, cts.Token);

            Console.ReadLine();

            cts.Cancel();

            processingTask.Wait();
        }

Question 1: Does it make any difference whether I connect the ISP to the preview port or to a second output port of the splitter?

Question 2: Is it possible to start and stop the "large" stream, so that it is only processed when motion is detected?

Question 3: Is it correct that there is no way to configure MMAL to output a smaller stream on the preview port instead of resizing it with the ISP?

techyian commented 4 years ago

Question 1: Does it make any difference whether I connect the ISP to the preview port or to a second output port of the splitter?

No, I don't think you're going to notice any difference here due to the Still, Video and Preview ports sharing the same values as we discussed earlier.

Question 2: Is it possible to start and stop the "large" stream, so that it is only processed when motion is detected?

I think your end goal is closely matched to the behaviour I've implemented for the motion detection built into the library. I know you don't want to use that functionality, but you may be able to take inspiration from what I've done with regards to starting another capture handler when motion is detected.

What you could do is use a combination of an Action delegate accepted as a constructor parameter on your OpenCvMatCallbackHandler and also a CircularBufferCaptureHandler which will be attached to the larger stream. The CircularBufferCaptureHandler has a StartRecording and StopRecording method which you can call when you want the stream to be recorded. Below you can see how this might work - I've not tested this code at all, it's just to give you an idea of what you could do.

public class OpenCvMatCallbackHandler : PortCallbackHandler<IOutputPort, IOutputCaptureHandler>
{
    private Action _startRecordCallback;

    public OpenCvMatCallbackHandler(IOutputPort port, IOutputCaptureHandler handler, Action startRecordCallback) 
        : base(port, handler)
    {
        _startRecordCallback = startRecordCallback;
    }

    public override unsafe void Callback(IBuffer buffer)
    {
        Console.WriteLine("X");

        // Do not call the base method.

        try
        {
            // Lock the memory via MMAL.
            MMALCheck(MMALBuffer.mmal_buffer_header_mem_lock(buffer.Ptr), "Unable to lock buffer header.");

            // Pass the pointer and its length to opencv.
            var dataPointer = buffer.Ptr->data + buffer.Offset;
            var dataLength = buffer.Length;

            this.Mat = new Mat(base.WorkingPort.Resolution.Height, base.WorkingPort.Resolution.Width, MatType.CV_8UC3, (IntPtr)dataPointer);

            // Is motion detected? If so call the Action delegate.
            // Note: You won't want to call this every time, so maybe add a flag in here to indicate that
            // you are currently recording? 
            _startRecordCallback();

            // Once you have finished with the memory, you must unlock it.
            MMALBuffer.mmal_buffer_header_mem_unlock(buffer.Ptr);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
            // If something goes wrong, unlock the header.
            MMALBuffer.mmal_buffer_header_mem_unlock(buffer.Ptr);
            //MMALLog.Logger.LogWarning("Unable to handle data. Returning null.");
        }
    }

    public Mat Mat
    {
        get;
        private set;
    }
}

public async Task MotionDetection()
{
    // When using H.264 encoding we require key frames to be generated for the Circular buffer capture handler.
    MMALCameraConfig.InlineHeaders = true;

    using (var splitter = new MMALSplitterComponent())
    using (var resizer = new MMALIspComponent())
    using (var resizer2 = new MMALIspComponent())
    using (var nullSink = new MMALNullSinkComponent())
    using (var vidCaptureHandler = new CircularBufferCaptureHandler(4000000, "/home/pi/videos/detections", "h264"))
    using (var vidEncoder = new MMALVideoEncoder())
    {
        cam.ConfigureCameraSettings();

        var splitterInputConfig = new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.BGR24);
        var splitterOutputConfig = new MMALPortConfig(MMALEncoding.BGR24, MMALEncoding.BGR24, bufferNum: 2);
        var resizerOutputConfig = new MMALPortConfig(MMALEncoding.BGR24, MMALEncoding.BGR24, width:400, height:300, bufferNum: 2);
        var resizerOutputConfig2 = new MMALPortConfig(MMALEncoding.I420, MMALEncoding.I420, width: 1600, height: 1200);
        var vidEncoderOutputConfig = new MMALPortConfig(MMALEncoding.H264, MMALEncoding.I420, bitrate: 1300000);

        // Create our component pipeline.        
        splitter.ConfigureInputPort(splitterInputConfig, cam.Camera.VideoPort, null);
        splitter.ConfigureOutputPort<FastStillPort>(0, splitterOutputConfig, null);

        resizer.ConfigureOutputPort<FastStillPort>(0, resizerOutputConfig, null);
        resizer.Outputs[0].RegisterCallbackHandler(new OpenCvMatCallbackHandler(resizer.Outputs[0], null, () => {
            // Start a new task so we don't block the calling thread.
            Task.Run(async () => {
                // Start recording our H.264 video.
                vidCaptureHandler.StartRecording();

                // (Optionally) Request a key frame to be immediately generated by the video encoder.
                vidEncoder.RequestIFrame();

                // Record for 10 seconds.
                await Task.Delay(10000);

                vidCaptureHandler.StopRecording();
            });
        }));

        // Second ISP component is used for format conversion between BGR24 and I420 for use with the Video Encoder.
        // We are telling it to keep the same 1600 x 1200 resolution.
        resizer2.ConfigureOutputPort(resizerOutputConfig2, null);

        vidEncoder.ConfigureOutputPort(vidEncoderOutputConfig, vidCaptureHandler);

        cam.Camera.VideoPort.ConnectTo(splitter);
        cam.Camera.PreviewPort.ConnectTo(nullSink);

        splitter.Outputs[0].ConnectTo(resizer2);
        splitter.Outputs[1].ConnectTo(resizer);

        resizer2.Outputs[0].ConnectTo(vidEncoder);

        // Camera warm up time
        await Task.Delay(2000);

        CancellationTokenSource cts = new CancellationTokenSource();

        var processingTask = cam.ProcessAsync(cam.Camera.VideoPort, cts.Token);

        Console.ReadLine();

        cts.Cancel();

        await processingTask;
    }

}

I've corrected a few bits in the code above where I felt appropriate. The capture handler will begin recording H.264 from the splitter's output port 0 when the StartRecording method is called. You will also need to add some code to signal to your OpenCvMatCallbackHandler whether you're currently recording - possibly a public boolean property, or a setter method if you want to keep it private. I just want to clarify that if you were passing a proper .NET byte array, as is usually expected, then you could re-use a lot of the functionality already in place that I wrote for motion detection in the library, obviously changing the actual detection to OpenCV instead.

Question 3: Is it correct that there is no way to configure MMAL to output a smaller stream on the preview port instead of resizing it with the ISP?

This ties in with my response to Question 1. If you are using the ISP component then you are getting hardware accelerated resizing so any difference in performance that you'd get by receiving a smaller resolution directly from the camera itself should hopefully be minimal.

As an aside, I don't think your suggestion on the buffers is going to work. I would not recommend sending a buffer back to a port while you still have a lock on it, as bad things are likely to happen - at best you'll get a segmentation fault, but you'll probably end up locking the Pi up and needing to reboot.
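
If a copy per frame is acceptable to you, the safe variant of your idea is to copy each frame out while the buffer is locked and queue the copy for a worker thread, so that no native buffer is ever held across threads. A rough, untested sketch (the bounded queue and Marshal.Copy are my own additions here, not library API):

// Requires: using System.Runtime.InteropServices; using System.Collections.Concurrent;
private readonly BlockingCollection<byte[]> _frames = new BlockingCollection<byte[]>(boundedCapacity: 4);

public override unsafe void Callback(IBuffer buffer)
{
    MMALCheck(MMALBuffer.mmal_buffer_header_mem_lock(buffer.Ptr), "Unable to lock buffer header.");

    try
    {
        // Copy the frame out while we hold the lock...
        var frame = new byte[buffer.Length];
        Marshal.Copy((IntPtr)(buffer.Ptr->data + buffer.Offset), frame, 0, frame.Length);

        // ...then queue it for the worker; drop the frame if the worker
        // is falling behind rather than blocking MMAL's callback thread.
        _frames.TryAdd(frame);
    }
    finally
    {
        // The buffer is always unlocked before it is returned to the port.
        MMALBuffer.mmal_buffer_header_mem_unlock(buffer.Ptr);
    }
}

// On a worker thread:
// foreach (var frame in _frames.GetConsumingEnumerable(token)) { /* OpenCV processing here */ }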

techyian commented 4 years ago

Just made an edit to the code. I have added a second ISP component which is going to be used for format conversion between BGR24 and I420 for use with the video encoder. The port configuration uses 1600 x 1200 for the resolution so it should be ok. Again, not tested but hopefully should work for you or at least point you in the right direction.

CobraCalle commented 4 years ago

thank you for your explanations.

I think I will implement the motion detection without a second stream and resize the frames on the OpenCV side. In the end, running the motion detection at 30 FPS is a bit of overkill, and compared with resizing every frame (even with hardware acceleration), using only every third/fourth frame and resizing that frame in OpenCV should consume fewer CPU resources.

the reasons why I want to use my own motion detection:

  1. The system has to adapt to the changing light conditions throughout the day (daylight changes, moving window shutters, lights turned on and off etc.). If I understand the description of your motion detection correctly, it only compares the frames against a single start frame.
  2. I need the ability to cut some areas out of the motion detection (for example TV screens etc.), and my motion detector does that after it has created a Gaussian-blurred, smaller version of the original frame, so I do not have to create a copy of the original frame to cut out these areas.

Question: Does your motion detection use any MMAL methods that are processed on the GPU? If yes, it could make sense to use your detector and cut unwanted motion out of the result. If your algorithm runs on the CPU like my OpenCV implementation, then it would make more sense to reuse my own motion detector.

One last question (for now :-) ): If I understand your approach correctly, you suggest simply stopping the processing of new frames from the "large resolution" stream when no motion is detected... correct? But this would mean that the whole processing of those frames (copying the data to RAM etc.) still takes place... I hoped there was a way to "really" start and stop the capturing inside MMAL, to pause the processing on the GPU when the large frames are not needed (no motion).

techyian commented 4 years ago

The motion detection built into the library was more of a learning exercise for myself; it runs on the CPU and will only work with RGB pixel format varieties, so I don't think it will be suitable for you.

The behaviour you're requesting isn't something I've investigated personally so I give no guarantees you will be able to achieve it with this library and I won't fundamentally change the way it works to satisfy certain requirements, mainly because I feel the way it is designed currently offers stability and as much flexibility as possible. If you have suggestions on ways in which it can be improved, please tell me and I will consider them.

MMAL ports can be enabled and disabled at any time by calling either Start or DisablePort against an IOutputPort instance, which may be what you're looking for. I've checked the source and there may be issues if you do this ad-hoc, but if you want to give it a try, feel free.
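
For example (untested - ad-hoc toggling of the splitter output carrying the large frames):

splitter.Outputs[0].DisablePort(); // pause delivery of the large frames
// ... later, when motion is detected ...
splitter.Outputs[0].Start();       // re-enable the port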

Which version of the Raspberry Pi are you using for your project and what is the desired framerate? It sounds like you're getting quite good performance already?

CobraCalle commented 4 years ago

I'm using an RPi 4 with 4 GB. The framerate itself is very good now, but I have to run motion detection and speech detection in parallel, and the goal is to hold the CPU temperature well under 80 degrees (which is a bit tricky because the Pi sits in an enclosure that is not quite optimal in terms of cooling at the moment). At the moment I'm adapting the camera to the RPi cam... I'll keep you informed... :-)

techyian commented 3 years ago

Hi, are we ok to close this ticket now? I may add what has been discussed here to a new "Optimising Performance" page on the wiki as it may be helpful to others wishing to remove the allocation when obtaining the native image data.

CobraCalle commented 2 years ago

Hi,

I've implemented the motion detection now with a second video stream (the preview port with a hardware-accelerated resizer connected). This is used for the motion detection. When motion is detected, the part with the motion is cropped out of the larger stream.

The previous implementation used OpenCV to scale the image down before checking for motion, to speed things up. But scaling down with OpenCV results in relatively high CPU usage, so my hope was to squeeze out a higher frame rate (or at least save some CPU resources) by using the hardware-accelerated resizer from MMAL. But interestingly, the new solution that uses two video streams results in a much lower frame rate (10 compared to 17-20 frames per second).

Does this make sense to you? Would it make sense to not use the preview port and use a resizer connected to a second splitter output instead?

Another option would be to switch to your motion detection implementation. But I need the area where the motion has occurred, because the motion detection triggers a face detection/recognition, and I would like to feed only the small area that has changed to the face detector instead of the full frame (for performance reasons on the one hand, and on the other because I'm using 180° fisheye lenses, so some warping is required to rotate and stretch the image before I can feed it to the face recognizer).