[BUG] H264Encoder: enable_sps_framerate unsupported on Raspberry Pi 5

I am using a Raspberry Pi 5 running Bookworm 64bit (Picamera2 v0.3.22-2) to stream a Raspberry Pi High Quality Camera encoded to H.264 over RTSP using MediaMTX.

Since the RPi 5 lacks hardware encoding, passing the enable_sps_framerate parameter to H264Encoder results in an error due to (what I assume) H264Encoder being an alias for LibavH264Encoder which does not support the parameter.

Trying to open the H.264 encoded stream on an RPi 3B+ with ffplay results in the following output:

ffplay version 5.1.6-0+deb12u1+rpt1 Copyright (c) 2003-2024 the FFmpeg developers
  built with gcc 12 (Debian 12.2.0-14)
  configuration: --prefix=/usr --extra-version=0+deb12u1+rpt1 --toolchain=hardened --incdir=/usr/include/aarch64-linux-gnu --enable-gpl --disable-stripping --disable-mmal --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sand --enable-sdl2 --disable-sndio --enable-libjxl --enable-neon --enable-v4l2-request --enable-libudev --enable-epoxy --libdir=/usr/lib/aarch64-linux-gnu --arch=arm64 --enable-pocketsphinx --enable-librsvg --enable-libdc1394 --enable-libdrm --enable-vout-drm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
[rtsp @ 0x7f78000c20] max delay reached. need to consume packet/0
[rtsp @ 0x7f78000c20] RTP: missed 36 packets
[h264 @ 0x7f78004eb0] error while decoding MB 46 19, bytestream -9
[h264 @ 0x7f78004eb0] concealing 2083 DC, 2083 AC, 2083 MV errors in I frame
Input #0, rtsp, from 'rtsp://1.1.1.1:8000/main':
  Metadata:
    title           : No Name
  Duration: N/A, start: 0.133144, bitrate: N/A
  Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1280x720, 30 fps, 29.97 tbr, 90k tbn
[rtsp @ 0x7f78000c20] max delay reached. need to consume packet/0
[rtsp @ 0x7f78000c20] RTP: missed 35 packets
[h264 @ 0x7f78033ad0] error while decoding MB 46 19, bytestream -9
[h264 @ 0x7f78033ad0] concealing 2083 DC, 2083 AC, 2083 MV errors in I frame

Opening the stream with VLC on a Windows 10 PC does work, however I noticed some inconsistencies regarding frame rate / duration of recorded video.

As a workaround I am also encoding the stream to MJPEG which does play on the RPi 3B+ but only if it is 1280x720 pixels. Needless to say this doesn't utilize the Pi's hardware decoder and is quite taxing on the CPU.

This is a downsized version of the script I'm using:

#!/usr/bin/python3

from picamera2 import Picamera2
from picamera2.encoders import H264Encoder, JpegEncoder, Quality
from picamera2.outputs import FfmpegOutput

picam2 = Picamera2()

config = picam2.create_video_configuration(
    main={'size': (1920, 1080), 'format': 'YUV420'},
    lores={'size': (1280, 720), 'format': 'YUV420'},
    sensor={'output_size': (2028, 1080), 'bit_depth': 12},
    controls={'FrameDurationLimits': (33333, 33333)})  # 30 FPS

picam2.align_configuration(config)
picam2.configure(config)

output_main = FfmpegOutput("-f rtsp -rtsp_transport udp -bsf:v dump_extra rtsp://localhost:8000/main", audio=False)

output_lores = FfmpegOutput("-f rtsp -rtsp_transport udp -bsf:v dump_extra rtsp://localhost:8000/lores", audio=False)

# Encoder settings
encoder_main = H264Encoder(
    qp=30,
    repeat=True,
    iperiod=30,
    #enable_sps_framerate=True,  # <- Unsupported on RPi 5
    profile='main',
    framerate=30)

encoder_lores = JpegEncoder()
encoder_lores = frame_skip_count = 2

try:
    picam2.start_recording(encoder_main, output_main)
    picam2.start_recording(encoder_lores, output_lores, quality=Quality.MEDIUM)

    while True:
        time.sleep(5)
except:
    picam2.stop_recording()

Would it be possible to get SPS working on RPi 5? Or is there any other way to encode the required information, using FFmpeg for instance?

Hi, thanks for the question. I'm afraid I don't really know the answer, but maybe there are some things to look into.

Firstly, how do you know that the lack of enable_sps_framerate is the problem? I did a brief search for this error message, but was left rather uncertain as to the root cause. I don't think I saw anything specifically relating this to the presence of a framerate in the SPS parameters - have you maybe found any links that suggest this?

I had a look in the libav AVCodecContext structure to see if it has any support for something named enable_sps_framerate. There wasn't, but there is something called just framerate, so if you were up for some experimentation, we could perhaps give this a try.

Check out the latest Picamera2. You can set your PYTHONPATH environment variable to point to the folder where you've put it, to make sure that you're using this version. Go to this line and add

        self._stream.codec_context.framerate = Fraction(30, 1)

(substitute your own framerate if it's not 30fps.)

I tried this and it did seem to change the SPS timing headers, so let's see if it helps with your problem.

After some further research I believe this is a twofold issue.

First the errors reported by ffplay seem to stem from dropped packets resulting in an incomplete bitstream, according to this Stack Overflow post: ffmpeg RTSP error while decoding MB. I will switch to TCP and see if that helps.

For the second part I recorded ten seconds of video directly from the camera and ran ffprobe -v trace -show_frames "G:\test.h264" on it:

[AVFormatContext @ 00000211de661900] Opening 'G:\test.h264' for reading
[file @ 00000211de661c80] Setting default whitelist 'file,crypto,data'
Probing h264 score:51 size:2048
Probing mp3 score:1 size:2048
[h264 @ 00000211de661900] Format h264 probed with size=2048 and score=51
[h264 @ 00000211de661900] Before avformat_find_stream_info() pos: 0 bytes read:32768 seeks:0 nb_streams:1
[h264 @ 00000211de6cb840] Decoding VUI
[extract_extradata @ 00000211de6d2e00] nal_unit_type: 7(SPS), nal_ref_idc: 3
[extract_extradata @ 00000211de6d2e00] nal_unit_type: 8(PPS), nal_ref_idc: 3
[extract_extradata @ 00000211de6d2e00] nal_unit_type: 6(SEI), nal_ref_idc: 0
[extract_extradata @ 00000211de6d2e00] nal_unit_type: 5(IDR), nal_ref_idc: 3
[h264 @ 00000211de6cb840] nal_unit_type: 7(SPS), nal_ref_idc: 3
[h264 @ 00000211de6cb840] nal_unit_type: 8(PPS), nal_ref_idc: 3
[h264 @ 00000211de6cb840] nal_unit_type: 6(SEI), nal_ref_idc: 0
[h264 @ 00000211de6cb840] nal_unit_type: 5(IDR), nal_ref_idc: 3
[h264 @ 00000211de6cb840] Decoding VUI
[h264 @ 00000211de6cb840] Format yuv420p chosen by get_format().
[h264 @ 00000211de6cb840] Reinit context to 1920x1088, pix_fmt: yuv420p
[h264 @ 00000211de6cb840] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 2
    Last message repeated 5 times
[h264 @ 00000211de6cb840] Decoding VUI
[h264 @ 00000211de661900] All info found
[h264 @ 00000211de661900] stream 0: start_time: NOPTS duration: NOPTS
[h264 @ 00000211de661900] format: start_time: NOPTS duration: NOPTS (estimate from bit rate) bitrate=0 kb/s
[h264 @ 00000211de661900] After avformat_find_stream_info() pos: 750592 bytes read:753664 seeks:0 frames:50
Input #0, h264, from 'G:\test.h264':
  Duration: N/A, bitrate: N/A
  Stream #0:0, 50, 1/1200000: Video: h264 (Main), 1 reference frame, yuv420p(progressive, left), 1920x1080, 0/1, 25 fps, 60 tbr, 1200k tbn
[h264 @ 00000211de6d2e00] nal_unit_type: 7(SPS), nal_ref_idc: 3
[h264 @ 00000211de6d2e00] nal_unit_type: 8(PPS), nal_ref_idc: 3
[h264 @ 00000211de6d2e00] Decoding VUI
Processing read interval id:0 start:N/A end:N/A
[h264 @ 00000211de6d2e00] nal_unit_type: 7(SPS), nal_ref_idc: 3
[h264 @ 00000211de6d2e00] nal_unit_type: 8(PPS), nal_ref_idc: 3
[h264 @ 00000211de6d2e00] nal_unit_type: 6(SEI), nal_ref_idc: 0
[h264 @ 00000211de6d2e00] nal_unit_type: 5(IDR), nal_ref_idc: 3
[h264 @ 00000211de6d2e00] Decoding VUI
[h264 @ 00000211de6d2e00] Format yuv420p chosen by get_format().
[h264 @ 00000211de6d2e00] Reinit context to 1920x1088, pix_fmt: yuv420p
[h264 @ 00000211de6d2e00] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 2

Firstly, how do you know that the lack of enable_sps_framerate is the problem?

Evidently it is not because the SPS NALs are included in the stream, so you're right about that one.

However note that the frame rate is being reported as 25 fps which is incorrect.

After spending some more time looking into libavcodec, line 91 in libav_h264_encoder.py struck me as odd:

self._stream.codec_context.time_base = Fraction(1, 1000000)

What is the reasoning behind the 1000000 denominator? According to ffmpeg-libav-tutorial) the time base seems to work as follows:

When we're designing a video player we need to play each frame at a given pace, otherwise it would be hard to pleasantly see the video either because it's playing so fast or so slow.

Therefore we need to introduce some logic to play each frame smoothly. For that matter, each frame has a presentation timestamp (PTS) which is an increasing number factored in a timebase that is a rational number (where the denominator is known as timescale) divisible by the frame rate (fps).

Which explains why time_base is 1/1200000 in the encoded stream (set by libavcodec I assume): You can't divide 1000000 by 30.

...

Okay, this is interesting. Judging from encoder.py line 263:

# The sensor timestamp is the most accurate one, so we'll fetch that.
ts = int(request.request.metadata[controls.SensorTimestamp] / 1000)  # ns to us

The timestamps for the encoder come directly from the sensor, which would explain the 1000000 time base. The problem is the whole timestamp thing depends on the frame rate. So a hardcoded 1000000 means we're limited to frame rates that satisfy x % 1000000 = 0 (e.g. 1, 2, 4, 8, 10, 16, 20, 25, 32). I don't know how the math for this works but I observed a 10:00 minute recording have a duration of 9:52 at 30.4 FPS.

self._stream.codec_context.framerate = Fraction(30, 1)

Yes, that should work for fixing the incorrect frame rate at least. Not sure what to do about the timestamps, though.

It's difficult to know exactly what to do. I believe that the framerate info in the SPS header is usually ignored because it's just the wrong way to convey that kind of information. The normal thing would be that the container mechanism for the stream would contain real timestamps. For example, you might think you're running at 30fps, but your camera probably isn't. It could easily be 30.01fps and you'd have no idea (until you've waited quite a long time). So what would you even put into the SPS? Using proper timestamps in the container gets round any difficulties by sampling the real time (well, system time, again it's not entirely guaranteed to be exactly right, but it's probably a lot closer than the camera rate), it just means your timestamps are rounded to the microsecond, which I would expect to be fine.

Perhaps the best thing would be to prevent any kind of framerate info in the SPS, though I'm not quite sure how you can get PyAV to do that.

I managed to fix the packet loss issue. The problem was that the RPi 5 didn't have enough buffer space causing packets to be dropped server-side. The section of the MediaMTX documentation on Corrupted frames explains this. Setting writeQueueSize: 1024 in the mediamtx.yml resolves the problem.

Stream #0:0, 50, 1/1200000: Video: h264 (Main), 1 reference frame, yuv420p(progressive, left), 1920x1080, 0/1, 25 fps, 60 tbr, 1200k tbn

The 1200000 time base in the encoded stream is for the container (tbn), not for the codec context (tbc), which is apparently deprecated. The frame rate does get correctly encoded by the 60 tbr (double the actual frame rate).

I messed around with the LibavH264Encoder a bit and found that the encoder completely ignores anything set for the stream.base_time and stream.codec_context.base_time; in fact you can simply omit the self._stream.codec_context.time_base = Fraction(1, 1000000) line and it still produces the same output (which makes sense considering that it is deprecated). FPS is set in the call to add_stream: self._stream = self._container.add_stream(self._codec, rate=self.framerate), so setting it explicitly is unnecessary.

Using proper timestamps in the container gets round any difficulties by sampling the real time (well, system time, again it's not entirely guaranteed to be exactly right, but it's probably a lot closer than the camera rate), it just means your timestamps are rounded to the microsecond, which I would expect to be fine.

Yeah, the timestamps themselves are not the problem, It's more so that they don't end up in the encoded file:

[FRAME]
media_type=video
stream_index=0
key_frame=1
pts=N/A
pts_time=N/A
pkt_dts=N/A
pkt_dts_time=N/A
best_effort_timestamp=N/A
best_effort_timestamp_time=N/A
duration=37500
duration_time=0.031250
pkt_pos=0
pkt_size=139787

I would expect the PTS to get encoded since it is set in line 120. Might be worth looking into.

Anyways, I'm gonna close this for now since my setup is working. Thanks for your help.

Thanks for the update, and glad things are working better. The Ffmpeg process re-timestamps all the packets it gets (a problem in itself that I'd like to do something about one day), so I guess any timing information in the H.264 stream gets ignored. The idea of timing information within the codec stream has always seemed wrong to me!

raspberrypi / picamera2

[BUG] H264Encoder: enable_sps_framerate unsupported on Raspberry Pi 5 #1135