I don't think any omx components handle opaque formats.
I did start to make a MMAL wrapper for image_fx, but it was a bit more involved than I anticipated and I had to move on to other things. I hope to be able to get it working eventually.
Ok, I feared it would be that way. Just had some hope after spotting that mapping.
I've just had another thought on how to get opaque video data from MMAL to OMX, so that image_fx can be used for as long as it isn't part of MMAL. Hence some questions:
I haven't tried, but you should be able to instantiate the MMAL component "vc.ril.image_fx" and use it with non-opaque formats (I420, I422, RGBA32, and RGB565). You're right that it appears not to have been converted to support MMAL_ENCODING_OPAQUE. It's a half-hour job if I can get a test case together - I'll have a look tomorrow.
Mixing MMAL and OMX opaque formats won't work. None of the OMX components actually appear to support OMX_COLOR_FormatBRCMOpaque, and I'm not even 100% certain where it came from. OMX will automatically use a proprietary tunnelling mechanism if two compatible ports are tunnelled. MMAL_ENCODING_OPAQUE makes use of this, but in a way that allows the buffers to be extracted and handled as buffers rather than always having to be directly tunnelled.
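Roughly, the non-opaque route would look something like this (an untested sketch - the resolution is a placeholder and widths/heights may need the usual 32/16 alignment):

MMAL_COMPONENT_T *image_fx = NULL;
MMAL_ES_FORMAT_T *fmt;

mmal_component_create("vc.ril.image_fx", &image_fx);

fmt = image_fx->input[0]->format;
fmt->encoding = MMAL_ENCODING_I420;
fmt->es->video.width = 1280;               /* placeholder resolution */
fmt->es->video.height = 720;
fmt->es->video.crop.width = 1280;
fmt->es->video.crop.height = 720;
mmal_port_format_commit(image_fx->input[0]);

mmal_format_copy(image_fx->output[0]->format, fmt);   /* same format out */
mmal_port_format_commit(image_fx->output[0]);

mmal_component_enable(image_fx);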
@6by9: This is amazing news. Regarding the test case, I prepared something a few weeks ago for Gordon. I just uploaded it here: https://gist.github.com/julianscheel/d42d1c6cb967194a15a6 It's a simple decode->image_fx->vout chain, which is obviously untested but should work as-is if image_fx in MMAL supports MMAL_ENCODING_OPAQUE. It does not set any parameters on image_fx, so it will use whatever filter is the default. Maybe you can add some detail on how to set the image filter type?

Another question while at it: popcornmix added a parameter to the OMX image_fx module which allowed us to explicitly set the frame duration, so that it is not interpolated by the filter. Can we set this through MMAL as well?
Should the non-opaque formats already work with the current rpi firmware and userland? I'll try to get it up and running.
One more note to the demo code: You need to pass a h264 file (like https://github.com/raspberrypi/userland/blob/master/host_applications/linux/apps/hello_pi/hello_video/test.h264) as first argument.
And I just changed that code to use MMAL_ENCODING_I420, but unfortunately it fails on mmal_port_format_commit for the image_fx component.
OK, not working with I420 etc is likely to be due to the component not setting up strides in the way MMAL expects - I've hit that before. I'll look tomorrow. Having that test case is perfect - it saves me fiddling with Raspistill and dropping image_fx between camera and video_render.
Extra parameter - easy to add. Will do so as it should be obvious from source control what it's called.
Deinterlacing - now you've got me worried. I have a recollection that we have an internal issue raised saying the deinterlacing filter doesn't support our optimised image format. No one has really cared, as we haven't been actively using the deinterlace plugin for ages, but switching it to MMAL_ENCODING_OPAQUE would be doing exactly that, so I'd better check it. Even if that is an issue, I will fix image_fx to work with MMAL, and either input or output may need to be I420 rather than OPAQUE. I'll see what I find.
Ok, thanks for the details. I really hope we get it working with opaque: with I420, which we already have working in a mixed mmal->omx chain, we hit some performance limits in our VLC implementation that we don't hit when running a pure MMAL opaque chain (just without deinterlacing at the moment). So all this work was targeted at getting deinterlacing working along with opaque buffers. :)
My concern about deinterlacing was partially resolved. I was right that there was an internal issue raised (back in Sep 2008 - that's how little the deinterlacing filter is used!). The algorithm ends up doing a format conversion internally, so it works, but it is suboptimal. Popcornmix is better at vector assembler than me, so I'll leave it with him to investigate if that's viewed as necessary.
I've almost got image_fx working with MMAL. I'll keep you updated.
Ah, thanks for the update. I assume this will be good enough performance-wise, as a tunnelled OMX chain copes well. Looking forward to seeing your updates :)
I have it working with MMAL opaque, although I've only tested the simple invert effect. I have been testing with a hacked raspivid as your test didn't want to build and I didn't have the time to investigate.
A quick look at the extra parameter popcornmix added shows it uses the same mechanism as all the others. It should map through to calling mmal_port_parameter_set(image_fx->output[0], &param.hdr), with param being a MMAL_PARAMETER_IMAGEFX_PARAMETERS_T and param.hdr.id = MMAL_PARAMETER_IMAGE_EFFECT_PARAMETERS. That also allows you to specify the effect (from the MMAL_PARAM_IMAGEFX_T enum - ah, that's missing the deinterlacing options! A small change needed there), and the extra params (specific to the effect chosen).
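i.e. something like this (an untested sketch - the effect is just an example, and the deinterlace values will only be usable once the enum gains them):

MMAL_PARAMETER_IMAGEFX_PARAMETERS_T param = {
    { MMAL_PARAMETER_IMAGE_EFFECT_PARAMETERS, sizeof(param) },
    MMAL_PARAM_IMAGEFX_NEGATIVE,   /* effect to apply (example) */
    0,                             /* no extra effect parameters */
    { 0 }
};
mmal_port_parameter_set(image_fx->output[0], &param.hdr);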
I've got to leave it for today, but will finish up on Monday.
Great news, thank you! Will test it as soon as you upload it. Have a nice weekend
@6by9 Regarding your compile issues with my sample, you probably just need some additional include paths. On Raspbian I can build it successfully with:
gcc -I/opt/vc/include -I/opt/vc/include/interface/vcos/pthreads/ -I/opt/vc/include/interface/vmcs_host/linux/ -L/opt/vc/lib -DDEINTERLACE -o mmal-demo mmal-demo.c -lbcm_host -lmmal -lmmal_core -lpthread
I just forgot to add those in the comment as I install rpi userland to /usr in my buildroot system.
Have you been able to finish it up yet?
Sorry, had some other pressing issues yesterday, and further issues with your test case (you never sent the buffers to the image_fx output port, so I had to debug why it all stalled).
I now get the first 6 frames through correctly, but then get MMAL errors (ENOMEM) when trying to send a buffer to the image_fx output port - more investigation needed there as to why the buffer recycling is failing. It all worked quite nicely for me when using a mmal_connection instead of manually handling the buffers, so I think it is just the client code being wrong.
As I wrote the code without being able to test it, it's quite likely that something's wrong in there :) If you don't specify -DDEINTERLACE it should work though. Just without image_fx.
Regarding the ENOMEM: how is image_fx acquiring new pictures? Is there shared access to the pool of opaque images, or is the image_fx module creating its own? Maybe we just need to allocate some more buffers on the decoder so the pool is big enough?
If you have a firmware drop for me, I'd be happy to help debugging the client side code.
In the tunnelled OpenMAX case, image_fx does use the same pool of buffers that video_decode does. You need to call OMX_IndexParamBrcmExtraBuffers with parameter 3 to ensure there are enough buffers for image_fx.
Not sure if this is the case with MMAL and opaque buffers.
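In IL terms that's roughly the following (a sketch - decode_handle is whatever handle you have for the video_decode component, and 131 is its output port):

OMX_PARAM_U32TYPE extra;
memset(&extra, 0, sizeof(extra));
extra.nSize = sizeof(extra);
extra.nVersion.nVersion = OMX_VERSION;
extra.nPortIndex = 131;   /* video_decode output port */
extra.nU32 = 3;           /* extra buffers to allocate */
OMX_SetParameter(decode_handle, OMX_IndexParamBrcmExtraBuffers, &extra);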
Looks like I've got it all sorted. The image_fx output port was configured with only 20 buffers instead of 40, so it ended up being given more buffers than it was expecting (or something like that), and then had a mismatch of userspace vs GPU buffers.
Image allocation is a little odd (I'm looking at this code for the first time in a long time!), and the person who wrote it left a while ago. It seems that all image effects are done out of place - image_fx has access to the images the decoder creates on the input side, but then allocates a new image pool for the output images. That may be an issue for you, as you had to tweak OMX_IndexParamBrcmExtraBuffers on video_decode to get enough buffers for your display side. I'll do a quick tweak to allow you to set OMX_IndexParamBrcmExtraBuffers / MMAL_PARAMETER_EXTRA_BUFFERS on image_fx as well (it defaults to 5 buffers, but you can then add some).
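Once that tweak is in, raising it should just be a one-liner (a sketch - pick the count to match however many buffers your display side holds on to):

mmal_port_parameter_set_uint32(image_fx->output[0], MMAL_PARAMETER_EXTRA_BUFFERS, 10);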
Whilst looking at this lot, I was just seeing if https://github.com/raspberrypi/firmware/issues/181 for OMX_ImageFilterDeInterlaceLineDouble was trivial. It almost looks it, but there are a couple of subtleties in there to overcome.
Ok, this sounds really promising. Although the fact that image_fx allocates its own pictures, rather than taking them from the decoder's picture pool, frightens me a little. It might complicate things on the VLC side more than I had hoped.
OK, https://github.com/6by9/RPiTest/tree/master/mmal_image_fx has a test firmware, and your test app with a couple of fixes (I'm not having a go - I know you couldn't test it; the debugging was just taking time).
Image pool allocation is as it is, I'm afraid. image_fx appears to work out of place, so it has to allocate a new set of image pools. VLC probably won't know, as they are all referenced via opaque buffers. Setting the output of image_fx to MMAL_ENCODING_I420 would have almost no additional cost compared to opaque, as the deinterlace filter will write straight into the output buffer instead of the image pool. I know very little of how VLC hangs together, so I don't know if that helps or not.
Thanks for the update. I'll get my hands on it tomorrow. Regarding output to MMAL_ENCODING_I420: the image_fx filter then writes directly into the ARM buffer, so no additional copy happens? Still, I think it would have an impact on video_render, wouldn't it? At that point it has to upload from ARM to GPU memory again. Anyway, I will play around with it a little and see how the performance is when the deinterlacer is plugged in. I hope our CPU requirements will finally be satisfied with everything opaque.
Firmware has been updated. Please run rpi-update and test.
The modified mmal-demo.c works well for me when using ENCODING_OPAQUE.
If I try to run it with ENCODING_I420 it fails though. To avoid out of memory errors I need to change the buffer_num: data->dec_output->buffer_num = data->dec_output->buffer_num_recommended;
But even when doing this I can't get it to work as it rejects the format on the image_fx component:
mmal: mmal_vc_port_info_set: failed to set port info (2:0): EINVAL
mmal: mmal_vc_port_set_format: mmal_vc_port_info_set failed 0x1744880 (EINVAL)
Failed to commit deinterlace intput format (status=3 EINVAL)
Any thoughts why this fails?
Besides this, I have trouble with the buffer handling when plugging things into VLC. The decoder module in VLC sets buffer_num to 40 so there are enough buffers available for all the internal processing. In the image_fx module I set buffer_num for input as well as output to 40 to match the decoder. When I start feeding image_fx with input and output buffers it starts to work, but it will only output 5 frames and then it stalls. I can increase the number of frames before it stalls by applying MMAL_PARAMETER_EXTRA_BUFFERS: if I set it to 5 it stalls after 10 frames, if I set it to 20 it stalls after 25, and so on. Only when I set it to some insanely high value like 35 does it not stop. Do you have any thoughts on what goes on there, or what might actually cause image_fx to block when PARAMETER_EXTRA_BUFFERS is lower?
I was modifying the test app for ENCODING_OPAQUE only, I wasn't trying I420. I420 requires (width * height * 1.5) bytes for each buffer, cf. 64 bytes per buffer for OPAQUE. That'll run out of memory very fast if using 40 buffers on each port. The reason for the rejection is the buffer size is being mishandled internally and is insufficient for the size of image being requested. I'll need to look at the required behaviour again and correct this.
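To put numbers on it: at 1920x1080, I420 needs 1920 * 1080 * 1.5 ≈ 3.1 MB per buffer, so 40 buffers on a single port is already around 125 MB of GPU memory, whereas 40 opaque buffers amount to 40 * 64 = 2560 bytes of handle data (the images themselves stay in the shared pool).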
Opaque buffers are just references to the internal image pool. The image is only returned to the pool for reuse when the buffer is returned to the output port that produced it. There are 5 buffers in the pool plus the number specified by MMAL_PARAMETER_EXTRA_BUFFERS. Once those are filled, then image_fx has to stall until one is returned.
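That also matches the numbers you quoted: with MMAL_PARAMETER_EXTRA_BUFFERS at 5 there are 5 + 5 = 10 images available, hence the stall after 10 frames; with 20 extra there are 25, and so on.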
How many buffers is VLC holding on to, and for how long? If it doesn't return them then the video_decoder would have the same issue, and I thought you overcame that.
Actually we had the same freezes in the decoder and could solve them with MMAL_PARAMETER_EXTRA_BUFFERS of 20. Honestly I don't really understand why we need 20 extra buffers there; in theory far fewer should be sufficient.
For image_fx things confuse me, because the buffers should already have been returned to image_fx when it freezes. The image is actually rendered almost immediately after passing through the filter and then released:
[b694d020] mmal_deinterlace filter debug: output pts 919473983
[b693e330] mmal_vout vout display debug: output pts 919473983 released
[b694d020] mmal_deinterlace filter debug: output pts 919513983
[b693e330] mmal_vout vout display debug: output pts 919513983 released
[b694d020] mmal_deinterlace filter debug: output pts 919593983
[b694d020] mmal_deinterlace filter debug: output pts 919633983
[b693e330] mmal_vout vout display debug: output pts 919593983 released
[b693e330] mmal_vout vout display debug: output pts 919633983 released
[b694d020] mmal_deinterlace filter debug: output pts 919753983
[b694d020] mmal_deinterlace filter debug: output pts 919793983
[b693e330] mmal_vout vout display debug: output pts 919753983 released
[b693e330] mmal_vout vout display debug: output pts 919793983 released
[b694d020] mmal_deinterlace filter debug: output pts 919913983
[b694d020] mmal_deinterlace filter debug: output pts 919953983
[b693e330] mmal_vout vout display debug: output pts 919913983 released
[b693e330] mmal_vout vout display debug: output pts 919953983 released
[b694d020] mmal_deinterlace filter debug: output pts 920073983
[b694d020] mmal_deinterlace filter debug: output pts 920113983
[b693e330] mmal_vout vout display debug: output pts 920073983 released
[b693e330] mmal_vout vout display debug: output pts 920113983 released
You can see that pictures are actually released before the next frame is output by the deinterlace filter. But after 10 frames it stalls (extra buffers are set to 5 in this case).
When you say that image_fx waits for the buffers to be returned: how is the return of an image actually handled? Is it done implicitly when video_render has received it? I don't think we can release the actual image explicitly, only the buffer headers holding it, and those don't seem to be tied to the images very tightly.
I'd forgotten the Pi is on a slightly older buffer management scheme than our main dev branch.
The image pools have internal refcounting, and anything that can get a handle on the image (i.e. image_fx, MMAL, and video_render) can increment and decrement that refcount to indicate that it still wants the image and when it's done with it. Because of that, video_render returns the buffer immediately, having incremented the refcount on the underlying image. MMAL will have incremented the refcount when it was given the image, and releases it when it is returned to the output port. image_fx will have incremented the count on allocation, and decremented it as soon as it has successfully passed the image to MMAL.
"sudo vcdbg pools" will show the current state of the image pools and the refcounts of each image. I suspect that each image will be stuck with a refcount of 1 due to something still holding the image.
Ok, this makes things a bit clearer to me. It is indeed as you guessed: the refcount of the pictures is still at one in that case. See here: https://gist.github.com/julianscheel/4a6a9f2411dec0f951a8 As the last release came from ILVRender it is obvious that the frame was actually rendered, and I would have expected it to be freed after that. But still someone (the mmal worker?) holds a reference on it... Any ideas how to figure out why that reference is still held and how to release it?
If you've done the mmal_port_send_buffer(deinterlace->output[0], buffer) with the buffer that has been released, then I don't really know. The same mechanism is working in your mmal-demo.c, so it seems to be good there. Anything useful from "sudo vcdbg log msg"?
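For reference, the usual recycling pattern looks roughly like this (a sketch along the lines of the standard examples - the callback and pool names are placeholders):

static void imagefx_output_cb(MMAL_PORT_T *port, MMAL_BUFFER_HEADER_T *buf)
{
    /* consume/display the frame, then give the header back to its pool */
    mmal_buffer_header_release(buf);
}

/* elsewhere: keep the output port topped up from the pool's queue */
MMAL_BUFFER_HEADER_T *buf;
while ((buf = mmal_queue_get(pool->queue)) != NULL)
    mmal_port_send_buffer(image_fx->output[0], buf);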
At first sight I could not spot anything interesting in the logs, but I will check more carefully tomorrow morning.
There's one major difference between mmal-demo.c and the VLC implementation: in mmal-demo.c we just pass the buffer header coming from the decoder to image_fx, and the one coming from image_fx to video_render. In the VLC modules we allocate a pool of buffer headers in each of our modules (decoder, filter, vout) and only pass the payload (in opaque mode the 64-byte data field) between the modules. So when travelling from source to target we rebind the payload to another header. Might this be causing woes in the refcounting?
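The rebinding itself is essentially just copying the payload fields from the incoming header to a header from the next module's pool, something like this (illustrative only, not the actual VLC code; in and out are MMAL_BUFFER_HEADER_T pointers):

out->data   = in->data;     /* the 64-byte opaque handle */
out->length = in->length;
out->offset = in->offset;
out->flags  = in->flags;
out->pts    = in->pts;
out->dts    = in->dts;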
@6by9 I walked through the previous discussions where we had this issue with the decoder, before we worked around it by setting the extra buffers. Looking at it again I still don't understand why the extra buffers solve the problem there. The difference between the problematic code in VLC and mmal-demo.c is that mmal-demo.c always passes something close to buffer_num headers to the decoder as well as to image_fx. So for buffer_num=40 the number of buffer headers available to the output ports never falls below 35. In VLC the number falls below this threshold, as we need some buffers in the VLC core for scheduling, etc.
Dennis (@dennishamester) made a minimal test case which uses only the decoder to demonstrate the problem: https://gist.github.com/dennishamester/9852055 With it you can easily play with the buffer numbers. If the difference between allocated buffers and buffer headers available to the decoder output port is <= 5, everything works fine. As soon as the difference is > 5, it stalls after 7 decoded frames.
./dec-buf-num 20 14
Expect deadlocking
Creating 20 opaque buffers
Frame 1 decoded (decoder has 14 buffers)
Frame 2 decoded (decoder has 14 buffers)
Frame 3 decoded (decoder has 14 buffers)
Frame 4 decoded (decoder has 14 buffers)
Frame 5 decoded (decoder has 14 buffers)
Frame 6 decoded (decoder has 14 buffers)
Frame 7 decoded (decoder has 14 buffers)
There it stalls. The video_decode RIL pool holds 7 buffers at that point, all of which are locked. If the number of headers passed to the decoder is set to 15 it does not lock up, but the pool is still the same 7 pictures.
I feel like this is exactly the same as what happens with image_fx, and somehow I feel that setting a high number of extra buffers is just a workaround for some underlying problem. Could you maybe have a short look at what happens there?
I will try to extend mmal-demo.c to demonstrate the issue with image_fx as well.
Ok, this was simple and I think it made me understand the issue way better. Here is an extended mmal-demo.c which demonstrates the stall of image_fx: https://gist.github.com/julianscheel/3e32bc64a784ab7f3a51
You need to call it with a second parameter which is the maximum number of buffer headers which will be sent to the image_fx output port.
./mmal-demo test.h264 35
With the limit of 35 everything runs smoothly for the currently set buffer_num of 40. Now reduce it to 34:
./mmal-demo test.h264 34
You will see the process freeze after the first few frames. The gap of 5 is static, so no matter how you set buffer_num it won't run further, even though the buffer has already been released through mmal_buffer_header_release. What did not happen in that case, though, is that the buffer header was sent to the output port again. So if the refcount is only reduced when the header is received back, it's obvious that it will fail.
So I think we will have to reduce our buffer header pool sizes so that they are no bigger than the number of buffers we can reliably pass back to the modules immediately. I will think about this further.
Can you confirm if my understanding of the issue seems sane to you now?
I tried the idea of using a smaller buffer header pool, so that mmal_port_send_buffer is definitely called for all buffers immediately, but it still stalls when there is a delta > 5 between buffer_num and the buffer headers pushed to the output port at the time it stalls. I'd be really grateful if you could take a look at this.
Oh and one more thing: MMAL_PARAM_IMAGEFX_DEINTERLACE_ADV does not actually seem to deinterlace anything at the moment. MMAL_PARAM_IMAGEFX_DEINTERLACE_DOUBLE does work, but looks expectedly terrible :)
I spent a whole lot of time today trying to understand what goes on, but it makes no sense to me. To sum up:
My assumption for why this is the way it is was that the decoder needs 15 pictures for internal reference (with buffer_num = 20 I need 15 buffer headers at the output port to avoid freezes), so it has to have access to 15 buffers. Is this assumption correct, or is there another reason for the 15 required buffers?
buffer_num = 21 requires 16 buffer headers to be present at output port to avoid freezes
buffer_num = 22 requires 17
buffer_num = 25 requires 20
and so on
Now this makes no sense to me given the previous assumption: why should the number of reference frames the decoder needs internally change with a growing total number of buffers?
Now I thought this might be because the buffer headers which were used to get the images out of the module were never passed back in (the queue presumably works as a FIFO?), and hence their refcounts are never lowered. So my idea was to keep some buffers out of the game entirely and allocate a buffer header pool smaller than buffer_num, but this does not change the behaviour. Alternatively, I allocated the pool at the full size but kept some headers away from the output port altogether, but again no change. So my guess is that the configured buffer_num is somehow part of the calculation of blocked frames, or whatever it is that makes it stall.
This analysis transfers 1:1 to the image_fx module, where it makes even less sense to me. I understand that the advanced deinterlace filter requires 3 frames as reference for temporal interpolation, so I could understand if 3 or maybe 4 buffer headers had to be in there at all times, but I don't understand why I have to provide 15 buffer headers to the output at all times to avoid freezes if buffer_num is 20.
Do you have any further thoughts, ideas, whatever? I don't see how to overcome this problem right now. Maybe a quick chat on this topic would be helpful?
As it seems that I can't get things to fit into the VLC concept as it is now, I have started to make a combined decoder/image_fx module in VLC, where I set up a tunnel from the decoder to image_fx within one VLC module.
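The idea is to let MMAL do the buffer passing between the two components, roughly like this (a sketch - decoder and image_fx are the two component handles):

MMAL_CONNECTION_T *conn = NULL;

mmal_connection_create(&conn, decoder->output[0], image_fx->input[0],
                       MMAL_CONNECTION_FLAG_TUNNELLING |
                       MMAL_CONNECTION_FLAG_ALLOCATION_ON_INPUT);
mmal_connection_enable(conn);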
While trying to get this up and running I noticed an issue with the new firmware: I can't seem to use image_fx with OMX anymore. When I load our old OMX modules, which worked fine with the previous firmware, the creation of image_fx now fails:
OMX_AllocateBuffer failed (80001000: OMX_ErrorInsufficientResources)
This happens when image_fx is the first element in an OMX tunnel and I try to set it up for I420 input. Any thoughts on this? Has the overall resource usage of image_fx increased?
I don't know whether this is connected to the issues I have setting up the MMAL tunnel, but I thought it might be.
Sorry, I was out of the office on Friday, away for the weekend, and I'm on a training course all this week.
There should have been no changes to the resources required by image_fx. The changes MMAL needed are all in the port definition setup, particularly that if nStride is 0 it must compute the required stride, and likewise for nBufferSize. The bug MMAL is still hitting with I420 is that nSliceHeight defaults to 16 and nBufferSize is computed based on that; unsurprisingly that then fails when it wants full-frame buffers.
I will try and have a look at your sample code at some point tomorrow.
@6by9 Thanks, taking a look would be really appreciated. I'm still trying out different ways of handling the buffer headers to get things to run, but it is a bit complicated as I don't know exactly how the internal lifecycle of the pictures works. I assume that with some more insight it would be easy to handle. It would be really great if you could take a look. If I find out anything else useful today I'll post it here.
One important thing I don't think I have mentioned yet: it seems that image_fx is not working at all if buffer_num != buffer_num_recommended (20). So far I need at least buffer_num = 25 in our VLC code. I'm trying to reduce it, but with 20 it seems that opaque pictures get used multiple times and are hence displayed out of order.
Forget about this one, it was my fault
@6by9 I was able to achieve some major progress. I reshuffled some of our buffer management code in VLC, so that we can now run with a buffer_num of 22 and I can ensure that enough buffer headers are at each port all the time without running into misuse of opaque handles. While this all feels a little fragile, it seems to work well.
The reason the deinterlace filter was not actually deinterlacing yet was that I had to set parameter 0 to the value 3 (I think this informs it that the content is top field first).
So at first sight things seem fine now. If you still feel like looking into my example regarding the required buffer_nums, I'd still be interested in your findings, as it could dramatically simplify our VLC code if this worked a little differently.
And if you could fix I420 support, that would still be really appreciated!
@popcornmix Can you merge the extensions from https://github.com/6by9/RPiTest/blob/master/mmal_image_fx/mmal_parameters_camera.h into the userland repository?
Done.
@popcornmix Thanks, I have one issue left at the moment: the deinterlace filter always increases the PTS for the interpolated frame by 30000. I try to override it explicitly to what I need, like this:
MMAL_PARAMETER_IMAGEFX_PARAMETERS_T imfx_param = {
    { MMAL_PARAMETER_IMAGE_EFFECT_PARAMETERS, sizeof(imfx_param) },
    MMAL_PARAM_IMAGEFX_DEINTERLACE_ADV,  /* advanced deinterlace */
    2,                                   /* two effect parameters follow */
    { 3, 20000 }                         /* interlace type (top field first), frame interval */
};
status = mmal_port_parameter_set(sys->component->output[0], &imfx_param.hdr);
But that second parameter does not seem to have any effect. Since the increment stays constant at 30000, though, the interpolation we saw with the OMX-based filter does not seem to happen either.
Could you have a quick look at this?
To be clear: when setting the second parameter (default frame interval) using the OMX components it works, but when using the MMAL components it gets ignored?
@popcornmix Sorry, just forget about it. The parameter is the frame duration and hence gets divided by 2 for the field duration... So to get a delta of 20000 I'd better set it to 40000 ;)
As this is all working properly now, let's close the bug. Thanks for the support!
As we cannot use the image_fx component with MMAL yet, I am trying to get opaque buffers from the MMAL decoder into an OMX image_fx element. Digging through the rpi-userland code, it seems that MMAL_ENCODING_OPAQUE is mapped to OMX_COLOR_FormatBRCMOpaque. So I tried to configure the image_fx input port with that colour format, roughly along these lines (a sketch of the approach, not the exact code):
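/* sketch: imagefx_handle from OMX_GetHandle; 190 is the image_fx input port */
OMX_PARAM_PORTDEFINITIONTYPE portdef;
memset(&portdef, 0, sizeof(portdef));
portdef.nSize = sizeof(portdef);
portdef.nVersion.nVersion = OMX_VERSION;
portdef.nPortIndex = 190;
OMX_GetParameter(imagefx_handle, OMX_IndexParamPortDefinition, &portdef);
portdef.format.image.eColorFormat = OMX_COLOR_FormatBRCMOpaque;
OMX_SetParameter(imagefx_handle, OMX_IndexParamPortDefinition, &portdef);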
This is rejected with OMX_ErrorBadParameter though. Any hints on what has to be set in the format to make it accept the opaque format? Or is this not actually implemented in the OMX core?