Creating a frame-based pipeline "filter"

techyian / MMALSharp

C# wrapper to Broadcom's MMAL with an API to the Raspberry Pi camera.

MIT License

195 stars 33 forks source link

Creating a frame-based pipeline "filter" #176

Open MV10 opened 3 years ago

MV10 commented 3 years ago

In #172 you suggested that a Connection callback handler might allow me to write a "filter" in the middle of the pipeline, such as:

camera -> resizer -> (FILTER) -> h.264 encoder -> video capture handler -> h.264 file

Given your warning this is new and somewhat uncharted territory, I thought I'd open a dedicated issue.

You mentioned that MMALConnectionImpl.NativeConnectionCallback will only invoke InputCallback because of that if-else block. Could the fix be as simple as pulling the output callback out of the else block? In other words, should it check the input queue and invoke the callback if there is data, then do the same for the output queue and callback?

Assuming both input and output are being invoked, is this basically the flow for a derived Callback handler?

wait for InputCallback to be invoked
pass buffer to the base class (only debug logging, but...)
call GetBufferdata, read AssertProperty etc.
do things with the buffer data, store results locally
wait for OutputCallback to be invoked
call ReadIntoBuffer to populate any locally stored buffer data
pass buffer to the base class

I suppose I'm a little confused about what the port SendBuffer calls do with the buffer data in NativeConnectionCallback. Is that because this type of component can have multiple inputs and outputs? (Can anything have multiple inputs? I don't think I've seen that in any examples.)

Since you mentioned you aren't sure of the exact performance hit, I thought I would start with a simple pass-through and hack in something to display the raw ms/frame overhead at the end of processing. I'm going to try the above while waiting for your feedback. If that doesn't burn the house down, I'll try something more interesting.

MV10 commented 3 years ago

Annnnd I think everything I wrote above is probably wrong, lol.

Apparently NativeConnectionCallback is also invoked for hardware-based inputs/outputs, which I didn't expect, so if I manage to get it right, we can compare pure native overhead (crazy fast) to managed callback overhead. This is my output below, that first one is the encoder (so 481 calls to NativeConnectionCallback, most of which never took even 1ms, 241 calls to the resizer InputCallback which makes sense at 24FPS for 10 seconds). The second set of timings is the resizer, it doesn't seem to ever be called at all...? (If you read this from email, I originally misinterpreted those.)

How the test is set up:

using (var capture = new VideoStreamCaptureHandler(rawPathname))
using (var resizer = new MMALIspComponent())
using (var encoder = new MMALVideoEncoder())
{
    var camConfig = new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.I420);
    var rawConfig = new MMALPortConfig(MMALEncoding.RGB24, MMALEncoding.RGB24, width: 640, height: 480);
    var h264Config = new MMALPortConfig(MMALEncoding.H264, MMALEncoding.I420, quality: 10, bitrate: MMALVideoEncoder.MaxBitrateLevel4);

    resizer.ConfigureInputPort(camConfig, cam.Camera.VideoPort, null);
    encoder.ConfigureInputPort(rawConfig, null);
    resizer.ConfigureOutputPort(rawConfig, null);
    encoder.ConfigureOutputPort(h264Config, capture);

    var connection = resizer.Outputs[0].ConnectTo(encoder, useCallback: true);
    connection.RegisterCallbackHandler(new RawFrameCallbackFilter(connection));

    cam.Camera.VideoPort.ConnectTo(resizer);

    await Task.Delay(2000);
    var cts = new CancellationTokenSource(TimeSpan.FromSeconds(totalSeconds));
    await cam.ProcessAsync(cam.Camera.VideoPort, cts.Token);
}

Changes to that if-else block in MMALSharp sans timing and counters (which didn't break the resizer, I noticed):

protected virtual int NativeConnectionCallback(MMAL_CONNECTION_T* connection)
{
    if (MMALCameraConfig.Debug)
    {
        MMALLog.Logger.LogDebug("Inside native connection callback");
    }

    var queue = new MMALQueueImpl(connection->Queue);
    var bufferImpl = queue.GetBuffer();

    if (bufferImpl.CheckState())
    {
        if (MMALCameraConfig.Debug)
        {
            bufferImpl.PrintProperties();
        }

        if (bufferImpl.Length > 0)
        {
            this.CallbackHandler.InputCallback(bufferImpl);
        }

        this.InputPort.SendBuffer(bufferImpl);
    }
    else
    {
        MMALLog.Logger.LogInformation("Connection callback input buffer could not be obtained");
    }

    queue = new MMALQueueImpl(connection->Pool->Queue);
    bufferImpl = queue.GetBuffer();

    if (bufferImpl.CheckState())
    {
        if (MMALCameraConfig.Debug)
        {
            bufferImpl.PrintProperties();
        }

        if (bufferImpl.Length > 0)
        {
            this.CallbackHandler.OutputCallback(bufferImpl);
        }

        this.OutputPort.SendBuffer(bufferImpl);
    }
    else
    {
        MMALLog.Logger.LogInformation("Connection callback output buffer could not be obtained");
    }

    return (int)connection->Flags;
}

And finally, my do-nothing Callback handler:

public class RawFrameCallbackFilter : ConnectionCallbackHandler
{
    public RawFrameCallbackFilter(IConnection connection)
        : base(connection)
    { }

    public override void InputCallback(IBuffer buffer)
    {
        base.InputCallback(buffer);
    }

    public override void OutputCallback(IBuffer buffer)
    {
        base.OutputCallback(buffer);
    }
}

I'll keep poking at it, but as we say, ELI5 ... Explain Like I'm 5. 😁

MV10 commented 3 years ago

Backtracking from that test I tried, I see the resizer (well, I used ISP) by default sets up a general InputPort and outputs to StillPort. Further down the stack the ConnectTo is handled in OutputPort (the base for StillPort) and unless I'm mistaken, it's that version of NativeOutputPortCallback which ought to be invoking my pass-through handler (and is not, for some reason).

It doesn't look like MMALConnectionImpl.NativeConnectionCallback is ever aware my component was registered.

techyian commented 3 years ago

You mentioned that MMALConnectionImpl.NativeConnectionCallback will only invoke InputCallback because of that if-else block. Could the fix be as simple as pulling the output callback out of the else block? In other words, should it check the input queue and invoke the callback if there is data, then do the same for the output queue and callback?

To be clear, in the Connection callback the this.InputPort is referring to the downstream component's input port (in your example that will be the video encoder) and the this.OutputPort is the upstream component's output port. To be honest, I'm not even sure the Output port should have any buffers sent to it from the connection callback as that should all be handled internally so I may end up removing that block of code.

Backtracking from that test I tried, I see the resizer (well, I used ISP) by default sets up a general InputPort and outputs to StillPort. Further down the stack the ConnectTo is handled in OutputPort (the base for StillPort) and unless I'm mistaken, it's that version of NativeOutputPortCallback which ought to be invoking my pass-through handler (and is not, for some reason).

This is confusing me a little. You are mentioning NativeOutputPortCallback but we're talking about connection callbacks here, not port callbacks, they're two different things. The connection callback will be aware of the Upstream component and the Downstream component that the connection represents and the IBuffer instance you receive will be a representation of the data once it's been processed by your MMALIspComponent.

Are you saying that the connection callback handler you configured didn't get called in its InputCallback method?

techyian commented 3 years ago

Assuming both input and output are being invoked, is this basically the flow for a derived Callback handler?

When you ask for connection callbacks to be enabled the following should happen:

1) Native Connection callback method is hit 2) You can now intercept the data at this point. At the very least, you need to send the buffer to the Input port of the Downstream component, otherwise your flow will be broken. 3) The buffer you've sent will be processed by the Downstream component. 4) The Downstream component's native output port method will be called with the result of the data.

MV10 commented 3 years ago

To be clear, in the Connection callback the this.InputPort is referring to the downstream component's input port (in your example that will be the video encoder) and the this.OutputPort is the upstream component's output port. To be honest, I'm not even sure the Output port should have any buffers sent to it from the connection callback as that should all be handled internally so I may end up removing that block of code.

Ah! That makes sense. Yes it seems weird to be sending data back to something's output port. So in order to process raw full frames, would my usage pattern then be:

wait for InputCallback to be invoked
pass buffer to the base
call GetBufferdata, store results locally
read AssertProperty to check for EOS
do things with the full frame buffer
call ReadIntoBuffer to output a modified frame

This is confusing me a little.

I'm sure it isn't you who is confused! Sorry about that, I see what you mean, I missed the output-vs-connection difference in the method names. I was just trying to see what happened after RegisterCallbackHandler and jumped straight down to that other reference in the output port class.

Are you saying that the connection callback handler you configured didn't get called in its InputCallback method?

I don't think it was, the input-calls counter is incremented just before invoking InputCallback, and this is the resizer -- but the resizer does run, the output is 640x480. Maybe it's because I'm resizing with the ISP? But if I swap in the actual resizer component it fails to initialize.

if (bufferImpl.Length > 0)
{
    inputcalls++;
    timer.Restart();
    this.CallbackHandler.InputCallback(bufferImpl);
    timer.Stop();
    elapsed += timer.ElapsedMilliseconds;
}

MV10 commented 3 years ago

You can now intercept the data at this point. At the very least, you need to send the buffer to the Input port of the Downstream component, otherwise your flow will be broken.

So there's no way to buffer until you have a full frame anywhere but the very end of the pipeline? Will Bad Things Happen if I just stall (so to speak) until I've collected a full frame? (If that's even possible... empty the buffer?)

To be honest, I'm not even sure the Output port should have any buffers sent to it from the connection callback as that should all be handled internally so I may end up removing that block of code.

Figured it out below... I assume you're referring to the connection callback block of code only, I noticed NativeConnectionCallback switches back and forth between input and output buffers, and if I get rid of anything else in the output section but the callback, the program hangs. I thought that was sort of interesting, I wonder what goes back in the other direction. (If I'm correctly interpreting those MMAL doc links you provided, it looks like maybe the output pass grabs a buffer from the pre-allocated pool and sends that upstream, and that buffer is what comes back with data once the upstream has something available? Seems like a weird mechanism.)

MV10 commented 3 years ago

Ah, now I see, I found the queues and pools docs. The connection owns a pool of buffers, and also a queue of populated buffers received from upstream. When the populated buffer queue is empty, a new empty buffer from the local pool is passed back upstream to the upstream's (confusingly-named) output port. (Realistically I'm guessing 640x480x24bpp is a full-frame per buffer, since my tests made it look like it flip-flops between input and output, but it's obviously unwise to code to that assumption or they wouldn't need a queue mechanism.)

My initial thought is that the callback handler could simply return a flag about whether to pass the buffer to the downstream port, but since that is also buffer-based, it looks like the handler would have to store copies of the buffers (I don't think it would be safe to simply accumulate references to the actual buffers over multiple passes?), then at frame-end assemble them into the bitmap, process it, then disassemble back into buffer-sized copies, then actually output buffers to the downstream.

So maybe what gets returned from the callback handler is either null (indicating there's nothing to pass downstream yet), or an array of (copied?) buffers including headers, then NativeConnectionCallback could loop over that list pulling actual buffers from the pool, populating them (set header fields, call ReadIntoBuffer), and sending them downstream?

Sounds too easy.

And there is also the question of why my InputCallback is never invoked.

techyian commented 3 years ago

And there is also the question of why my InputCallback is never invoked.

I've just copied your code and it's because you're using the MMALIspComponent, I've replaced it with the Resizer and it's working. As to why that is I'm unsure, but the Isp component wraps the Image Sensor Processor block on the GPU so that could explain it and doesn't follow the same code paths natively that the other components do. I feel as though we're going down a rabbit hole here with the connection callbacks and I may need to have a think as to how we might incorporate Managed Components into the library to fulfil the requirement - I'm not promising we'll get anywhere with it though.

MV10 commented 3 years ago

I suspected that was the case. How do you get the resizer to work? Whatever I try fails to init:

mmal: mmal_vc_port_info_set: failed to set port info (3:0): EINVAL
mmal: mmal_vc_port_set_format: mmal_vc_port_info_set failed 0x10cfa70 (EINVAL)
mmal: mmal_vc_port_info_set: failed to set port info (3:0): EINVAL
mmal: mmal_vc_port_set_format: mmal_vc_port_info_set failed 0x10cfa70 (EINVAL)
Exception: Argument is invalid. vc.ril.resize:out:0(RGB3): Unable to commit port changes.

techyian commented 3 years ago

It accepts YUV420, RGB16 or RGBA32 pixel formats, try changing that - not sure how a RGB16 pixel format would fit with the motion detection work?

MV10 commented 3 years ago

Interesting. It should work exactly the same with RGBA32, the buffers and stride will just be larger. The alpha channel is irrelevant -- seems weird that it's even an option, it isn't like the camera can do anything with transparency... (magic!).

Generally though, motion detection / analysis aside, this should lead to general purpose in-line video filtering, if it works. Thanks!

MV10 commented 3 years ago

ISP does work -- I was misinterpreting the output of the connection Name property.

But I've had difficulty getting resizer to work int he past -- I assume RGBA32 is what config calls RGBA.

It reports an error, but then it works anyway...?

Initializing...
Preparing pipeline...
mmal: mmal_vc_port_info_set: failed to set port info (2:0): EINVAL
mmal: mmal_vc_port_set_format: mmal_vc_port_info_set failed 0x214bdb0 (EINVAL)
mmal: mmal_vc_port_send: format not set on port 0x214bdb0
Camera warmup...
Filtering 10 seconds of video to /media/ramdisk/filtered.h264...

Connection name: vc.ril.resize:out:0/vc.ril.video_encode:in:0
   0 ms average for 242 InputCallbacks (total elapsed 0 ms)
   241 OutputPort.SendBuffers (not timed)
   484 NativeConnectionCallback passes

Connection name: vc.ril.camera:out:1/vc.ril.resize:in:0
   0 ms average for 0 InputCallbacks (total elapsed 0 ms)
   0 OutputPort.SendBuffers (not timed)
   0 NativeConnectionCallback passes
Exiting.

using (var capture = new VideoStreamCaptureHandler(rawPathname))
//using (var resizer = new MMALIspComponent())
using (var resizer = new MMALResizerComponent())
using (var encoder = new MMALVideoEncoder())
{
    var camConfig = new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.I420);
    var rawConfig = new MMALPortConfig(MMALEncoding.RGBA, MMALEncoding.RGBA, width: 640, height: 480);
    var h264Config = new MMALPortConfig(MMALEncoding.H264, MMALEncoding.I420, quality: 10, bitrate: MMALVideoEncoder.MaxBitrateLevel4);

    resizer.ConfigureInputPort(camConfig, cam.Camera.VideoPort, null);
    encoder.ConfigureInputPort(rawConfig, null);
    resizer.ConfigureOutputPort(rawConfig, null);
    encoder.ConfigureOutputPort(h264Config, capture);

    var connection = resizer.Outputs[0].ConnectTo(encoder, useCallback: true);
    connection.RegisterCallbackHandler(new RawFrameCallbackFilter(connection));

    cam.Camera.VideoPort.ConnectTo(resizer);

MV10 commented 3 years ago

Kind of interesting, I bumped the resolution up to 1920x1080 hoping that each frame would require more than 1 buffer -- it did, and attempting to store the real buffers (rather than copies) yields an out of memory error. (I did try properly calling the ref-count acquire/release methods.) I suspected it wouldn't work but it was the fastest and easiest thing to try -- but I didn't expect an out of memory error. Next I'll try copying them, although my guess is that it has to be passed to an input port to be de-allocated back to the pool queue, which means this won't work (which is a shame, this would be a really easy way to add in-line video FX, as well as the CCTV stuff I'm trying to do).

Filtering 10 seconds of video to /media/ramdisk/filtered.h264...
Processing 2 buffers
mmal: mmal_port_send_buffer: vc.ril.video_encode:in:0(RGB3): send failed: ENOMEM
Unhandled exception. MMALSharp.MMALNoMemoryException: Out of memory. vc.ril.video_encode:in:0(RGB3): Unable to send buffer header.
   at MMALSharp.MMALNativeExceptionHelper.MMALCheck(MMAL_STATUS_T status, String message)
   at MMALSharp.Ports.PortBase`1.SendBuffer(IBuffer buffer)
   at MMALSharp.MMALConnectionImpl.NativeConnectionCallback(MMAL_CONNECTION_T* connection)
Aborted

techyian commented 3 years ago

How are you storing the buffers? As an array of IBuffer objects or similar? The MMALPortConfig object allows you to override the number and size of buffers allocated to a Queue, so you may want to play around with that. Additionally, have you looked at the Acquire() method against an IBuffer object to increment the reference count on it? That should prevent it from being recycled, I quote the API docs:

Acquire a buffer header. Acquiring a buffer header increases a reference counter on it and makes sure that the buffer header won't be recycled until all the references to it are gone. This is useful for instance if a component needs to return a buffer header but still needs access to it for some internal processing (e.g. reference frames in video codecs).

I'm not sure if a combination of these two things can help progress it further but I'm just thinking out loud here more than anything. You need to be careful with incrementing the reference count on buffers though, I'm hoping that DestroyPortPool will handle it nicely for you but if not, your app won't close cleanly.

MV10 commented 3 years ago

Yes, just a List<IBuffer> in the handler. I had wondered if the buffer pool could be altered, so you're a step ahead of me with that point -- I sort of wonder if the default pool only allocates a single buffer.

I know ref counts are sensitive, I actually found MMALSharp through your conversations the Raspberry Pi forum back when you were starting work on the library. I'm using those methods. Specifically:

NativeConnectionCallback calls my InputCallback (modified to return List<IBuffer>)
if not EOS, call Acquire, add buffer to internal list, return null
upon receiving null, NativeConnectionCallback calls Release and exits
if EOS, process the buffer list (just RGB inversion for simplicity), return buffer list
upon receiving a list, each buffer is passed to InputPort.SendBuffer

But it doesn't work, so I'll try copying:

NativeConnectionCallback calls my InputCallback (modified to return List<IBuffer>)
if not EOS, copy buffer to internal list, return null
upon receiving null, NativeConnectionCallback calls Release and exits
if EOS, process the buffer list (just RGB inversion for simplicity), return buffer list
upon receiving a list, request a buffer from the pool, copy from the list, send to InputPort.SendBuffer

Another idea was to set up an additional "processed buffers" queue and push into that, and let NativeConnectionCallback process that before the upstream queue. However, I think a secondary queue will fail (sort of) because I believe the whole call chain operates in lock-step -- the original source of buffer data drives the sequence, so it would lose whatever number of new upstream buffers (1 frame's worth, 2 buffers, in my testing) off the end of the sequence, as this secondary queue would "hijack" the calls.

I'm probably about to seriously anger the Hardware Gods.

techyian commented 3 years ago

I sort of wonder if the default pool only allocates a single buffer

Buffer count is handled by the Port's BufferNum/BufferNumMin/BufferNumRecommended, see here, but as mentioned before this can be overridden via config.

Another idea was to set up an additional "processed buffers" queue and push into that

By all means, create your own IPort class and add another IBufferPool to it, then you have your own Pool to work with independent of the main working Pool/Queue, making sure to register your new class via ConfigureInputPort<TPort> /ConfigureOutputPort<TPort>. There is a native function mmal_buffer_header_replicate that you could use to copy into a new buffer, it's not been added to an IBuffer class yet but you can add that locally for testing, see here. If you make progress with it, I'll add a ticket to add it into the managed code.

You're in charge of your new Pool and need to make sure you destroy it on tear-down (this probably calls for DestroyPortPool to be virtual and then you can cleanly override to handle your buffers?), but again, test it first then we can add publicly later.

MV10 commented 3 years ago

I'm constantly amazed at the amount of work you've put into this library!

techyian commented 3 years ago

Thanks Jon. The underlying MMAL library is a pretty awesome bit of kit. I think the stuff you're working on will really help people wanting to push it to it's limits.

MV10 commented 3 years ago

One thing I don't understand is how my list stored two buffers -- I see in the debug log that the pool does default to just one buffer (Creating buffer pool with 1 buffers of size 65536). I suppose it was the same buffer twice (and probably lots of corrupt data somewhere). I'd have thought storing it and not passing it on would have kept it unavailable for re-use.

Regardless, I think I've hit a dead-end for reasonably handing raw full-frame data via Connection callbacks. I can't "manually" restore a copy of IBuffer header properties (they're all read-only pointer de-refs), using the native copy implies acquiring new buffers (which I think only works through pools), and there's still the lock-step-sequence data-loss issue I mentioned earlier. But I can at least report that:

Connection callbacks work for simple on-the-fly raw-image alteration which doesn't require full frame data. The overhead didn't appear to be a problem even at 1080p followed by h.264 encoding. If you ever want a wiki example, we could add something like on-the-fly conversion to grayscale, for example. Any simple, minimum-state, pixel-level FX should be possible, like film grain effects, perhaps -- the value being that this still allows downstream encoding, compared to a end-of-pipeline capture handler.
You were right, OutputCallback doesn't make sense on a connection handler, the buffer has no data when retrieved from the connection's pool, and the buffer isn't being sent upstream for processing, it's sent to be populated and returned via the queue. I'll PR that change, if you like.

I do see that it's possible to calculate frame metrics (stride etc.) from the outset, so I could still (theoretically) use a buffer-only Connection callback to output my motion detection analysis (or even do actual motion detection), although the parallel processing aspect would be a migraine-inducing problem to solve (I assume the partial frame data in the buffers isn't even guaranteed to have full rows). There would be quite a lot of state to track between invocations, too, I think. But that's edging into "Hold my beer, watch this," unnecessary-complexity territory. I think I'll set that idea aside.

I've spent the past couple hours studying PortBase and InputPort, but I think a port would also be subject to the lock-step problem. I might be wrong. I need to think about it a bit more.

I keep coming back to components like encoders -- something between input and output ports. But from what I read in the component list in the MMAL docs and the way the constructor for MMALComponentBase works, I'm guessing that a "software component" that owns an input and output port just isn't possible.

MV10 commented 3 years ago

I've been working on an InputPort-like implementation on and off today. If it works, I think I can actually move motion detection (and analysis) there without changing the other motion detection changes I just PR'd. I saw the relationship between IBuffer and ImageContext and realized I can leverage existing processing classes like FrameAnalyser.Apply, for example. So if this works, if someone doesn't need capture handler behaviors during motion detection, you can just hook up the camera or some other raw source to a motion detection input port, then do whatever further down the pipeline with the frames that it outputs.

And I'm trying to make it generalized so that it could be used for FX filtering.

I'm still early in this work, not sure what would happen with MMALCamera.WithMotionDetection yet, if anything, but it feels like it's going in the right direction. Oh and I don't think this suffers from what I called the lock-step problem.

MV10 commented 3 years ago

The ref-count behavior of mmal_buffer_header_replicate puts an abrupt end to my plan to store copies:

Replicating a buffer header will not only do an exact copy of all the public fields of the buffer header (including data and alloc_size), but it will also acquire a reference to the source buffer header which will only be released once the replicate has been released.

I assume they do that because both copies point to the same payload data, but I don't see how it could be useful (natively, but also in MMALSharp). If you already have the source buffer, why not use that reference? Seems weird to me. Oh well, learned a lot, in any case.

I want to try using the PortBase buffer pool with a larger number of buffers, and I see the BufferNumber config property, but it's read-only. It looks to me that connected input/output ports are supposed to have matching pool sizes, so I'm unsure whether it's safe to just resize it.

I'm also wondering whether I should temporarily disable the component's output port(s) while I accumulate frame buffers. The docs for port enable/disable also say those calls will release and reallocate pools on both sides of the connection. I guess I'd also have to override the enable method to re-resize (ha) my pool (in case some other process disables my frame buffering thing, and assuming I can't figure out how to change BufferNumber).

Oh ... of course ... buffer number is in the constructor. Alrighty then...

techyian commented 3 years ago

The ref-count behavior of mmal_buffer_header_replicate puts an abrupt end to my plan to store copies:

I wasn't aware of that. I'm not sure how fast it would be to retrieve the data from one buffer via the IBuffer's GetBufferData() method, then populate your temp buffer with ReadIntoBuffer(), all you're interested in is the data really isn't it?

I want to try using the PortBase buffer pool with a larger number of buffers, and I see the BufferNumber config property, but it's read-only. It looks to me that connected input/output ports are supposed to have matching pool sizes, so I'm unsure whether it's safe to just resize it.

MMALPortConfig accepts a parameter bufferNum and bufferSize which allows you to override them (just seen your edit). As you say, it makes sense to set the same values against the input/output ports, but there will be a upper threshold where more buffers does not equal better performance.

I'm also wondering whether I should temporarily disable the component's output port(s) while I accumulate frame buffers. The docs for port enable/disable also say those calls will release and reallocate pools on both sides of the connection. I guess I'd also have to override the enable method to re-resize (ha) my pool (in case some other process disables my frame buffering thing, and assuming I can't figure out how to change BufferNumber).

Hmm I'm not sure about this. It would also disable the input port of the connected component. I don't know what kind of performance overhead is involved in disabling/enabling ports.

MV10 commented 3 years ago

I wasn't aware of that. I'm not sure how fast it would be to retrieve the data from one buffer via the IBuffer's GetBufferData() method, then populate your temp buffer with ReadIntoBuffer(), all you're interested in is the data really isn't it?

That's what I'm doing, although there is no temporary buffer. I set aside each buffer in a local list until I have a full frame, then I GetBufferData from all of those into a frame byte array for processing, then break it down again into the original buffers, then they're sent downstream.

MMALPortConfig accepts a parameter bufferNum and bufferSize which allows you to override them (just seen your edit). As you say, it makes sense to set the same values against the input/output ports, but there will be a upper threshold where more buffers does not equal better performance.

It's not for performance, it's because one frame at higher resolutions may span multiple buffers. When I tried this yesterday without increasing the pool size, my local buffer caching stored two copies of the same buffer. So unless I misunderstand how this works, I need a pool that is at least one buffer larger than whatever it takes to represent a full frame.

I don't know what kind of performance overhead is involved in disabling/enabling ports.

Great point. At the frequency I'm talking about (per frame!) I'm sure this would be an extremely bad idea.

I'm pretty close to being able to run another test. Fingers crossed!

MV10 commented 3 years ago

Ian, do you know how the port BufferNumMin is set or controlled? It looks like it must come from somewhere inside MMAL, but if I decide, for example, that four buffers are my safe minimum for this derived port class, it would be nice if I could indicate that somehow. Help users "fall into the pit of success" as the saying goes.

Right now I'm throwing if the provided config has fewer than I need. I haven't done any exhaustive testing but the couple of checks I did always default to a pool size of 1. I'm also wondering if my minimum ought to be a multiple of the default minimum.

Edit - thinking about it more, I suppose it's probably calculated based on resolution.

But I should make sure it works at all before worrying about that. :)

techyian commented 3 years ago

BufferNumMin comes from MMAL itself and is populated based on your port configuration. You will notice when configuring a port that there are two calls to Commit and using OutputPort as an example, the code is here and here. The first call to Commit will set the port configuration you've set in your MMALPortConfig object, and at this stage, MMAL will internally set the values of BufferNumMin and BufferNumRecommended. If you try to commit a number less than BufferNumMin it'll either throw an exception or won't honour your request (I'm not 100% sure which off the top of my head).

Typically BufferNumMin will just be 1, because MMAL can get by with just sending a single buffer to-and-fro, but it won't necessarily be the most performant number of buffers.

MV10 commented 3 years ago

Thank you for clarifying, that's very helpful.

Is the hardware pipeline ever async? Even at 1920 x 1080 (mode 2, v1 camera), all of those buffer parameters (evenBufferNumRecommended) are always 1 in my testing. But I can't see where multiple buffers would help (in the normal pipeline, ignoring what I'm trying to do). It looks to me like the camera spits one out, and it works its way through the system synchronously.

Something else I'm confused about -- in the debug log I'm seeing FRAME_END every 8 buffers or so. At 65K that's only about a tenth the size of 1080 at RGB24. I expected to see about 95 buffers. (I suppose whatever logs the buffers doesn't log all of them?) Edit: I remembered the PrintPipeline command and I see that whatever gets logged as "Length" for each buffer is not the buffer size (which in that case was 626880).

The other thing I can't figure out is why my overrides aren't actually executing. I have log messages in everything. I see the constructor called, I see configuration called -- then nothing. I get video but it's like my port is bypassed completely. I know the component is using my port, just before I start processing I dumped the encoder.Inputs[0] type name. I checked the enabled flag on it, too -- true.

But hey, it doesn't crash! (yet)

I'm using this, or the splitter when I test at 1920x1080, or even a splitter outputting to a resizer then h.264 encoder, and also directly to a full-size h.264 encoder -- it all works, but none of my code seems to run.

using (var capture = new VideoStreamCaptureHandler(h264Pathname))
using (var resizer = new MMALIspComponent())
using (var encoder = new MMALVideoEncoder())
{
    var camConfig = new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.I420);
    var rawConfig = new MMALPortConfig(MMALEncoding.RGB24, MMALEncoding.RGB24, width: 640, height: 480);
    var h264Config = new MMALPortConfig(MMALEncoding.H264, MMALEncoding.I420, quality: 10, bitrate: MMALVideoEncoder.MaxBitrateLevel4);

    resizer.ConfigureInputPort(camConfig, cam.Camera.VideoPort, null);
    encoder.ConfigureInputPort<RawFrameInputPort>(rawConfig, null);  // <---- my input port 

    resizer.ConfigureOutputPort(rawConfig, null);
    encoder.ConfigureOutputPort(h264Config, capture);

    cam.Camera.VideoPort.ConnectTo(resizer);
    resizer.Outputs[0].ConnectTo(encoder);

    await Task.Delay(2000);
    var cts = new CancellationTokenSource(TimeSpan.FromSeconds(totalSeconds));
    await cam.ProcessAsync(cam.Camera.VideoPort, cts.Token);
}
cam.Cleanup();

MV10 commented 3 years ago

I'm guessing you're about to tell me I should have based this on OutputPort... 💥

MV10 commented 3 years ago

I've been doing some reading, apparently mmal_buffer_header_replicate is how MMAL "moves" buffers from a component's output port to the next component's input port (and, as you know, also through the pools of the connection in between them). Bearing in mind this involves interdependent ref-counts, I think this means my idea of temporarily accumulating buffers to build a full frame at the port level will not work. It would require every port and connection in the entire upstream pipeline to have enough "extra" buffers to keep feeding new data into the pipeline while my port has "stalled" the pipeline (interrupting the chain-reaction of buffer ref-count releases) waiting for a full frame.

I don't think it's necessarily impossible (replicate doesn't make new copies of the payload data, so it's not exponential growth of memory usage) but the complexity of configuring it correctly is starting to look unreasonable.

I believe a component like the h.264 encoder (which requires two full frames to perform P-frame compression) -- simply releases input port buffers upon arrival, and does not generate output port buffers until it has finished either accumulating an I-frame or compressing a P-frame.

Would you say my assumption is correct that a "software component" is not possible? Something which owns an input and output port, but is not an IL component? As written, the pipeline always maps a component to one of the "named" IL components but I'm not clear if that's strictly necessary. (Edit: It occurs to me that I don't see any MMAL functions to init a buffer -- by which I mean setting all the various header fields, so I suspect that's the "magic" that happens within a hardware-based component which makes a "software component" an unlikely option...?)

techyian commented 3 years ago

The other thing I can't figure out is why my overrides aren't actually executing. I have log messages in everything. I see the constructor called, I see configuration called -- then nothing.

Currently in MMALSharp, the only time you will receive callbacks to an input port is if you manually enabled it yourself and are supplying data to it manually, similar to the Standalone API. I have spoken previously about connection tunnelling and how this changes the behaviour of port callbacks, please see here regarding "Tunnelling connections" for more info, but basically it's used to efficiently communicate buffers in the pipeline without them being returned to the "client", i.e. us. When connection tunnelling is enabled (which is the default in MMALSharp), you will just receive your buffers on the final output port in your pipeline. I've tried turning tunnelling off but it doesn't work, so I need to investigate that further and #113 will track that. This is why I felt Connection callbacks were suitable for your requirements but I think you've since dismissed this as being plausible, however I'd like to quote a previous message:

Regardless, I think I've hit a dead-end for reasonably handing raw full-frame data via Connection callbacks. I can't "manually" restore a copy of IBuffer header properties (they're all read-only pointer de-refs), using the native copy implies acquiring new buffers (which I think only works through pools), and there's still the lock-step-sequence data-loss issue I mentioned earlier.

All Managed MMAL classes have a Ptr public property allowing you to set fields against the native structs, please see how I do this in the PortBase class as an example. If you really wanted to set your own values you can do, but please be cautious about what you do. If by "header properties" you're referring to the flags member, this is a bitwise value which can be accessed using this.Ptr->flags against an IBuffer instance. Is there a problem with having a pool of buffers in your connection class which is initialised with the same number of buffers being passed around the pipeline? These could then be manually populated and stored in your list? I think I'm struggling to visualise what it is you're doing without seeing the code, but I'd hoped Connection callbacks would have been useful to you.

Would you say my assumption is correct that a "software component" is not possible? Something which owns an input and output port, but is not an IL component? As written, the pipeline always maps a component to one of the "named" IL components but I'm not clear if that's strictly necessary. (Edit: It occurs to me that I don't see any MMAL functions to init a buffer -- by which I mean setting all the various header fields, so I suspect that's the "magic" that happens within a hardware-based component which makes a "software component" an unlikely option...?)

I'm not going to write this off completely as being possible. If we can get the library working without connection tunnelling then we should receive callbacks on each component's input and output ports, you could then detect if the current component is connected to a "software component" and take it from there. I will see if I can make any progress on #113.

MV10 commented 3 years ago

Very interesting.

Is there a problem with having a pool of buffers in your connection class which is initialised with the same number of buffers being passed around the pipeline? These could then be manually populated and stored in your list? I think I'm struggling to visualise what it is you're doing without seeing the code, but I'd hoped Connection callbacks would have been useful to you.

Essentially at one of these points (in a connection or as a new IPort), instead of sending the buffer to the next stage of processing, I was stashing it locally until I'd collected a full frame's worth of buffers. Then I'd assemble those buffers into a frame byte array, process it, then disassemble it back into the separate buffers. At that point I'd send them downstream.

But as I noted, I think the mmal_buffer_header_replicate ref-count behavior is still a problem, although I think this might be one of the things tunneling avoids. But I'll try to clarify what I meant. Consider a simple pipeline like camera, ISP resizer, encoder, handler, with my "pause the pipeline" frame-based operation as an InputPort subclass on the encoder (so it receives resized frames):

camera outputs a buffer
buffer is replicated to the connection to the resizer
buffer is replicated to the resizer input port
buffer is replicated to the connection to the encoder
buffer is replicated to the encoder input port (my code, in this example)

At that point, my code would store the buffer, there would really be five copies of the buffer headers, and then it would start again until my code receives and end-of-frame buffer. But the point is that each call to replicate keeps the source buffer ref-counted even if the original "owner" releases it. So when my code stores a buffer waiting for end-of-frame, all of those upstream components must have additional buffers in their pool to start that process again.

For my 1080p test, which generally required about 8 buffers for a full frame, that implies the camera, two connections, two input ports, and one output port all need pools of 8 buffers. That's what I meant about the configuration becoming unwieldy.

But a "software component" could change that by simply releasing the input port buffer when it is stored, and generating new output port buffers, then using the SendAllBuffers method (which I imagine is why that call exists). From the docs link you provided, it does sound as if software components are a supported scenario:

avoiding trips between the ARM and the VideoCore if both components are implemented on the VideoCore

I've already tossed out my changes (I'm trying to accomplish my goal with ExternalProcessCaptureHandler and a mildly painful bash/cat/ffmpeg/VLC pipe operation), but happily, if #113 allows that, the code is very easy to re-implement. And if nothing else, inline video filtering would be super cool!

MV10 commented 3 years ago

I'm not going to write this off completely as being possible. If we can get the library working without connection tunnelling then we should receive callbacks on each component's input and output ports, you could then detect if the current component is connected to a "software component" and take it from there. I will see if I can make any progress on #113.

I have high hopes that we (er, you 😄 ) can figure out the "software component" angle. Today I received a new wide-angle camera module. Image-distortion correction looks to be a relatively fast linear algorithm that might be a (tricky) parallel-processing candidate and it would be great as an early pipeline component.