raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

h264_v4l2m2m encoder muxed packet discrepancy #4734

Open dustinkerstein opened 2 years ago

dustinkerstein commented 2 years ago

When encoding with the hardware-accelerated V4L2 H.264 encoder, the resulting file is playable, but the number of muxed packets does not match the number of encoded frames. Starting with a YUV file created by:

libcamera-vid --width 4096 --height 3040 -t 1000 --codec yuv420 -o ~/out.yuv

And then encoding it with:

ffmpeg -y -f rawvideo -pix_fmt yuv420p -s:v 4096x3040 -r 10 -i ~/out.yuv -filter:v scale=1920:1088 -c:v h264_v4l2m2m -b:v 25M -v verbose ~/out.mp4

This results in a mismatch between the number of frames encoded (9) and the number of packets muxed (10):

Input file #0 (/home/pi/out.yuv):
  Input stream #0:0 (video): 9 packets read (168099840 bytes); 9 frames decoded;
  Total: 9 packets (168099840 bytes) demuxed
Output file #0 (/home/pi/out.mp4):
  Output stream #0:0 (video): 9 frames encoded; 10 packets muxed (29386 bytes);
  Total: 10 packets (29386 bytes) muxed
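The mismatch above can be spotted automatically when testing several encoders. The following is a minimal sketch that scans ffmpeg's verbose end-of-run summary for output streams whose encoded-frame and muxed-packet counts differ. The helper name `find_mismatches` is hypothetical, and the regular expression assumes the summary line format shown in the log above; other ffmpeg versions may print it differently.

```python
import re

# Matches ffmpeg's end-of-run summary line for an output stream, e.g.
#   Output stream #0:0 (video): 9 frames encoded; 10 packets muxed (29386 bytes);
# ASSUMPTION: the line format is the one shown in the log above.
SUMMARY_RE = re.compile(
    r"Output stream #(\d+):(\d+) \((\w+)\): "
    r"(\d+) frames encoded; (\d+) packets muxed"
)


def find_mismatches(log_text):
    """Return (stream_id, frames, packets) for each stream whose counts differ."""
    mismatches = []
    for match in SUMMARY_RE.finditer(log_text):
        file_idx, stream_idx, _kind, frames, packets = match.groups()
        frames, packets = int(frames), int(packets)
        if frames != packets:
            mismatches.append((f"{file_idx}:{stream_idx}", frames, packets))
    return mismatches


if __name__ == "__main__":
    sample = (
        "  Output stream #0:0 (video): 9 frames encoded; "
        "10 packets muxed (29386 bytes);"
    )
    print(find_mismatches(sample))
```

Feeding it the stderr output of the h264_v4l2m2m run reports stream 0:0, while a summary with matching counts produces an empty list.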

Although the video plays in VLC, extracting the frames with ffmpeg -y -i out.mp4 -v verbose -pix_fmt rgb24 camera_%04d.png produces decoder errors ("Missing key frame", "no frame!", "Invalid data found when processing input"), even though 9 frames are ultimately decoded:

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x558bf63fe0] st: 0 edit list: 1 Missing key frame while searching for timestamp: 0
[h264 @ 0x558bf65180] no frame!
[h264 @ 0x558bf65180] Reinit context to 1920x1088, pix_fmt: yuv420p
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'camera.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.45.100
  Duration: 00:00:00.90, start: 0.000000, bitrate: 269 kb/s
    Stream #0:0(und): Video: h264 (High), 1 reference frame (avc1 / 0x31637661), yuv420p(left), 1920x1080 (1920x1088), 261 kb/s, 11.11 fps, 10 tbr, 10240 tbn, 60 tbc (default)
    Metadata:
      handler_name    : VideoHandler
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> png (native))
Press [q] to stop, [?] for help
[h264 @ 0x558bf6c180] no frame!
[h264 @ 0x558bf9e2a0] Reinit context to 1920x1088, pix_fmt: yuv420p
Error while decoding stream #0:0: Invalid data found when processing input
[graph 0 input from stream 0:0 @ 0x558bf6c6a0] w:1920 h:1080 pixfmt:yuv420p tb:1/10240 fr:10/1 sar:0/1
[auto_scaler_0 @ 0x558c34a4c0] w:iw h:ih flags:'bicubic' interl:0
[format @ 0x558c348fb0] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_null_0' and the filter 'format'
[swscaler @ 0x558c34b260] No accelerated colorspace conversion found from yuv420p to rgb24.
[auto_scaler_0 @ 0x558c34a4c0] w:1920 h:1080 fmt:yuv420p sar:0/1 -> w:1920 h:1080 fmt:rgb24 sar:0/1 flags:0x4
Output #0, image2, to 'camera_%04d.png':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.45.100
    Stream #0:0(und): Video: png, 1 reference frame, rgb24(left), 1920x1080, q=2-31, 200 kb/s, 10 fps, 10 tbn, 10 tbc (default)
    Metadata:
      handler_name    : VideoHandler
      encoder         : Lavc58.91.100 png
[AVIOContext @ 0x558c370090] Statistics: 0 seeks, 1 writeouts
[AVIOContext @ 0x558bfcb670] Statistics: 0 seeks, 1 writeouts
    Last message repeated 3 times
No more output streams to write to, finishing.
[AVIOContext @ 0x558bfcb670] Statistics: 0 seeks, 1 writeouts
    Last message repeated 3 times
frame=    9 fps=0.0 q=-0.0 Lsize=N/A time=00:00:00.90 bitrate=N/A speed=2.41x
video:399kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Input file #0 (camera.mp4):
  Input stream #0:0 (video): 10 packets read (29386 bytes); 9 frames decoded;
  Total: 10 packets (29386 bytes) demuxed
Output file #0 (camera_%04d.png):
  Output stream #0:0 (video): 9 frames encoded; 9 packets muxed (409083 bytes);
  Total: 9 packets (409083 bytes) muxed
[AVIOContext @ 0x558bf19d50] Statistics: 30276 bytes read, 0 seeks

When I encode the out.yuv file with libx264, I do not see this discrepancy in the encoded/muxed frames and I am able to then extract the frames without error from the encoded file.

I am testing on a Compute Module 4 + Official IO Board + Bullseye Lite 64-bit + 5.10.63-v8+ kernel. Full system info here - https://pastebin.com/654vBepL

Let me know if you need any further info or help replicating.

dustinkerstein commented 2 years ago

Quick update - This may be specific to the ffmpeg h264_v4l2m2m encoder. I don't believe I can replicate this with GStreamer's v4l2h264enc. I will double check and update the title accordingly.

6by9 commented 2 years ago

Sorry, meant to respond earlier. I was suspecting that it may be an FFmpeg issue.

The standard V4L2 spec requires the H264 header bytes to be sent with the first frame. There is a control, V4L2_CID_MPEG_VIDEO_HEADER_MODE, which accepts V4L2_MPEG_VIDEO_HEADER_MODE_SEPARATE or V4L2_MPEG_VIDEO_HEADER_MODE_JOINED_WITH_1ST_FRAME (see https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/ext-ctrls-codec.html). It's currently not implemented, but could be added relatively easily, as it should map to MMAL_PARAMETER_VIDEO_ENCODE_HEADERS_WITH_FRAME (currently set on the encoder from bcm2835_codec_create_component).

FFmpeg does try to set that (see https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/v4l2_m2m_enc.c#L196), but AFAIK there is no obligation for the encoder to implement it.
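One way to check whether the extra packet is just the stream headers is to look at the NAL unit types it contains. The sketch below assumes the packet is in Annex B framing (start codes 00 00 01 or 00 00 00 01, as in a raw .h264 dump); MP4 files normally store length-prefixed NAL units instead, so a packet pulled from out.mp4 would need converting first. The helper names `nal_types` and `is_header_only` are hypothetical, and the thread has not confirmed that a header-only packet is the actual cause of the 9 versus 10 count.

```python
# H.264 NAL unit types (low 5 bits of the first byte after the start code).
NAL_SPS = 7  # sequence parameter set
NAL_PPS = 8  # picture parameter set


def nal_types(packet: bytes):
    """Return the NAL unit types found in an Annex B byte string, in order."""
    types = []
    i = 0
    n = len(packet)
    # Need a 3-byte start code plus one NAL header byte.
    while i + 3 < n:
        if packet[i] == 0 and packet[i + 1] == 0 and packet[i + 2] == 1:
            types.append(packet[i + 3] & 0x1F)
            i += 3  # skip past the start code
        else:
            i += 1
    return types


def is_header_only(packet: bytes) -> bool:
    """True if the packet carries only SPS/PPS and no picture data."""
    types = nal_types(packet)
    return bool(types) and all(t in (NAL_SPS, NAL_PPS) for t in types)


if __name__ == "__main__":
    # Synthetic example bytes, not taken from the file in this issue.
    headers = b"\x00\x00\x00\x01\x67\x64\x00\x28" + b"\x00\x00\x00\x01\x68\xee\x3c\x80"
    idr_slice = b"\x00\x00\x01\x65\x88\x84"
    print(nal_types(headers), is_header_only(headers))
    print(nal_types(idr_slice), is_header_only(idr_slice))
```

If the first packet from the encoder turns out to be header-only, that would be consistent with the headers being delivered separately rather than joined with the first frame.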

dustinkerstein commented 2 years ago

Ok, interesting. I haven't tested much to see if there are any real ramifications beyond the warnings/errors in FFmpeg. Do you think it's relatively benign, or could it cause issues?