jc-kynesim / rpi-ffmpeg

FFmpeg work for RPI

Maximizing extra_hw_frames allocation #58

Closed: vrazzer closed this issue 1 year ago

vrazzer commented 1 year ago

I am using hevc_v4l2request to hardware-decode 4K HEVC 8-bit/10-bit video. It works well, but only if I hand-tune avctx->extra_hw_frames to match my CMA memory allocation. Given the complexity of the internal "dst_slots" calculation, combined with a hard-coded 32-frame overall limit, it is unclear how a media player can select extra_hw_frames correctly. Too few, and the player's frame queue suffers; too many, and the decoder runs out of memory.
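To illustrate why one hand-tuned extra_hw_frames value does not carry over between CMA sizes, here is a rough sketch that estimates how many 4K frames fit in a given CMA budget. It assumes an 8-bit 4:2:0 (NV12-style) layout at 1.5 bytes per pixel with no stride alignment or padding; the decoder's real buffer formats, and any 10-bit format, will size frames differently, so the result is an optimistic upper bound, not an exact count. The helper names are hypothetical, and the 32-frame cap mirrors the hard-coded limit in mediabufs_dst_slots_create mentioned in this issue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the hard-coded limit reported in mediabufs_dst_slots_create. */
#define DST_SLOT_LIMIT 32

/* Hypothetical helper: bytes for one 8-bit 4:2:0 frame (NV12-style).
 * Assumption: no stride alignment or padding, which real drivers add. */
size_t estimate_frame_bytes_nv12(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    size_t luma = width * height;   /* 1 byte per luma sample */
    size_t chroma = luma / 2;       /* Cb + Cr, each at quarter resolution */
    return luma + chroma;
}

/* Hypothetical helper: how many frames of frame_bytes fit in cma_bytes,
 * capped at the decoder's destination-slot limit. */
int frames_fitting_budget(uint64_t cma_bytes, size_t frame_bytes)
{
    if (frame_bytes == 0)
        return 0;
    uint64_t n = cma_bytes / frame_bytes;
    return n > DST_SLOT_LIMIT ? DST_SLOT_LIMIT : (int)n;
}
```

Under these assumptions a 3840x2160 frame is about 12.4 MB, so a 256 MiB budget holds roughly 21 frames, while a 512 MiB budget would hold more than 32 and is clipped by the slot limit.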

Is there some way to say, "allocate as many extra_hw_frames as you possibly can and tell me how many that is?"

Is the hard-coded 32-frame limit in v4l2_req_media.c/mediabufs_dst_slots_create a real constraint or an arbitrary one? There is enough CMA memory to exceed that limit, so it is unclear whether it is artificial.

Is there a way to pass the destination buffers into hevc_v4l2request() rather than having it allocate them? That would be my preferred approach, but neither code review nor experiments (trying to pre-allocate a pool of AVDRMFrameDescriptor) got me anywhere.

The HEVC hardware decoder performs nicely, so great work. I just need to figure out how to make it adapt efficiently to different CMA allocations. Thanks.

vrazzer commented 1 year ago

For anyone else dealing with this issue, I ended up using the following workaround:

1- Use my own allocator to allocate a contiguous frame buffer of at most 32 frames (however many it can get).
2- Set the decoder thread count to a static value (I went with 4).
3- Assume the hardware decoder could use the maximum number of reference frames (8 for H.265).
4- Set avctx->extra_hw_frames = fbufs available (from step 1) - 4 (threads from step 2) - 8 (reference frames).
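The workaround steps above can be sketched in C roughly as follows. This is a minimal sketch, not code from rpi-ffmpeg: probe_frame_pool and workaround_extra_hw_frames are hypothetical names, the allocator is passed in as a callback so a real CMA or dma-buf allocator can stand in for malloc, and the thread count (4) and reference-frame count (8) are the values chosen in this workaround rather than anything queried from the decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DST_SLOT_LIMIT 32  /* hard-coded limit in mediabufs_dst_slots_create */
#define DECODER_THREADS 4  /* step 2: fixed decoder thread count */
#define HEVC_MAX_REFS 8    /* step 3: assumed worst-case reference frames */

/* Hypothetical allocator signature; malloc matches it. */
typedef void *(*frame_alloc_fn)(size_t bytes);

/* Step 1: grab one contiguous buffer for as many frames as possible,
 * starting at the slot limit and backing off one frame at a time.
 * Returns the number of frames obtained (0 if nothing could be allocated)
 * and stores the buffer (or NULL) in *out. */
int probe_frame_pool(frame_alloc_fn alloc, size_t frame_bytes, void **out)
{
    assert(alloc != NULL && out != NULL && frame_bytes > 0);
    for (int n = DST_SLOT_LIMIT; n > 0; n--) {
        void *p = alloc((size_t)n * frame_bytes);
        if (p != NULL) {
            *out = p;
            return n;
        }
    }
    *out = NULL;
    return 0;
}

/* Step 4: frames left for the player after reserving one per decoder
 * thread and the assumed worst-case reference set; never negative. */
int workaround_extra_hw_frames(int fbufs, int threads, int max_refs)
{
    int extra = fbufs - threads - max_refs;
    return extra > 0 ? extra : 0;
}
```

In a player, the result would be stored in avctx->extra_hw_frames (and DECODER_THREADS in avctx->thread_count) before calling avcodec_open2(). With the full 32 frames this yields 32 - 4 - 8 = 20, which is the ceiling of 20 extra frames mentioned below.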

This has been solid with all the media I have tried. I really wish more than 20 extra frames were available (especially for 1080p60). Hopefully a future revision will remove the 32-frame limit, allowing players a bit more buffering.