Closed Curid closed 6 months ago
the camera is a Reolink Video Doorbell WiFi
This camera sends an "out-of-band" sequence parameter set (in the DESCRIBE response's SDP): Z2QAM6wVFKCgPZA=
in base64, 67640033ac1514a0a03d90
in hex. h264_reader
says this is valid. This SPS has no VUI (which is fine).
Then the camera sends another SPS "in-band" (as part of the RTP stream): 67640033ac1514a0a03da1000004f6000063380404
. It's invalid according to h264_reader
. RbspReaderError(RemainingData)
means there was unexpected extra stuff after h264_reader
parsed everything it expected to find. If I remove the last byte, it does parse. The difference between the two is as follows:
--- out-of-band sps
+++ in-band sps without last byte
@@ -1,38 +1,57 @@
SeqParameterSet {
profile_idc: ProfileIdc(
100,
),
constraint_flags: ConstraintFlags {
flag0: false,
flag1: false,
flag2: false,
flag3: false,
flag4: false,
flag5: false,
reserved_zero_two_bits: 0,
},
level_idc: 51,
seq_parameter_set_id: SeqParamSetId(
0,
),
chroma_info: ChromaInfo {
chroma_format: YUV420,
separate_colour_plane_flag: false,
bit_depth_luma_minus8: 0,
bit_depth_chroma_minus8: 0,
qpprime_y_zero_transform_bypass_flag: false,
scaling_matrix: SeqScalingMatrix,
},
log2_max_frame_num_minus4: 9,
pic_order_cnt: TypeZero {
log2_max_pic_order_cnt_lsb_minus4: 9,
},
max_num_ref_frames: 1,
gaps_in_frame_num_value_allowed_flag: true,
pic_width_in_mbs_minus1: 39,
pic_height_in_map_units_minus1: 29,
frame_mbs_flags: Frames,
direct_8x8_inference_flag: true,
frame_cropping: None,
- vui_parameters: None,
+ vui_parameters: Some(
+ VuiParameters {
+ aspect_ratio_info: None,
+ overscan_appropriate: Unspecified,
+ video_signal_type: None,
+ chroma_loc_info: None,
+ timing_info: Some(
+ TimingInfo {
+ num_units_in_tick: 1270,
+ time_scale: 25400,
+ fixed_frame_rate_flag: false,
+ },
+ ),
+ nal_hrd_parameters: None,
+ vcl_hrd_parameters: None,
+ low_delay_hrd_flag: None,
+ pic_struct_present_flag: false,
+ bitstream_restrictions: None,
+ },
+ ),
}
so essentially the new SPS additionally declares a fixed frame rate of 10 fps (a tick every 1,270 / 25,400 seconds, 2 ticks per frame because H.264 supports interlaced video). That sounds perfectly plausible and, except for the extra byte, valid.
Next question: which is wrong: h264_parser
or the camera? I'm skimming the H.264 spec (see here to find a copy). Section 7.3.2.1 defines the SPS data syntax. Section E.1.1 defines the VUI data syntax. To me the h264_parser
code looks correct / consistent with that. Putting that together with the data minus the trailing byte looking plausible, I think the camera is actually just sending a strange extra byte at the end for some infuriating reason.
Next question: what should we do about it? I'm unsure:
h264-parser
not error if there's extra trailing data? This risks masking bugs. These bitstreams aren't exactly JSON or protobuf. They don't include the names or tag numbers for each field they're sending. If the sender and the receiver don't have identical expectations for what fields are being sent, in what order, and how many bits are used for each, the receiver will end up with gibberish. So allowing a little surprise extra stuff at the end is not terribly appealing. I guess it doesn't have to be all-or-nothing though. It's even possible to specify at the Retina level even by passing in a wrapping BitRead
here that overrides the provided finish_rbsp
with something more tolerant. It could log error and continue. We could choose to do this based on context and/or policy knob.Thoughts?
I'm also wondering what other clients do. Did VLC log any errors/warnings when talking to this camera?
ffmpeg uses some "interesting" macros...but I think is actually checking that the rbsp trailing bits are as expected, just as h264_parser
/ retina
do. The fixed
macro appears to be passing in the expected bit as the min and max of a range to xu
which then I guess bails if the actual found valid doesn't match.
#define fixed(width, name, value) do { \
av_unused uint32_t fixed_value = value; \
xu(width, name, fixed_value, value, value, 0, ); \
} while (0)
// ...
#define xu(width, name, var, range_min, range_max, subs, ...) do { \
uint32_t value; \
CHECK(ff_cbs_read_unsigned(ctx, rw, width, #name, \
SUBSCRIPTS(subs, __VA_ARGS__), \
&value, range_min, range_max)); \
var = value; \
} while (0)
// ...
static int FUNC(rbsp_trailing_bits)(CodedBitstreamContext *ctx, RWContext *rw)
{
int err;
fixed(1, rbsp_stop_one_bit, 1);
while (byte_alignment(rw) != 0)
fixed(1, rbsp_alignment_zero_bit, 0);
return 0;
}
// ...
static int FUNC(sps)(CodedBitstreamContext *ctx, RWContext *rw,
H264RawSPS *current)
{
int err, i;
// ...
flag(vui_parameters_present_flag);
if (current->vui_parameters_present_flag)
CHECK(FUNC(vui_parameters)(ctx, rw, ¤t->vui, current));
else
CHECK(FUNC(vui_parameters_default)(ctx, rw, ¤t->vui, current));
CHECK(FUNC(rbsp_trailing_bits)(ctx, rw));
return 0;
}
Another point of comparison: deepch/vdk just stops when it has all the data it knows to look for, here. This program...
package main
import (
"encoding/hex"
"fmt"
"github.com/deepch/vdk/codec/h264parser"
"log"
)
func main() {
raw, err := hex.DecodeString("67640033ac1514a0a03da1000004f6000063380404")
if err != nil {
log.Fatalf("hex string wouldn't parse: %v", err)
}
sps, err := h264parser.ParseSPS(raw)
if err != nil {
log.Fatalf("sps wouldn't parse: %v", err)
}
fmt.Printf("sps: %+v\n", sps)
}
prints: sps: {Id:0 ProfileIdc:100 LevelIdc:51 ConstraintSetFlag:0 MbWidth:40 MbHeight:30 CropLeft:0 CropRight:0 CropTop:0 CropBottom:0 Width:640 Height:480 FPS:10}
I think the camera is actually just sending a strange extra byte at the end for some infuriating reason.
Is there some reference H264 implementation we could try it with? Why just 0x04
, does it repeat the last byte? It'd be interesting to dump the firmware and find out.
Next question: what should we do about it? I'm unsure:
Accept that this camera doesn't work with Retina? Then we wouldn't be upholding a promise in the README: "Works around brokenness in cheap closed-source cameras."
This may not be the only camera with this issue though, and it's harder to find replacements for door bell cameras compared to normal ones.
Make h264-parser not error if there's extra trailing data? This risks masking bugs. These bitstreams aren't exactly JSON or protobuf. They don't include the names or tag numbers for each field they're sending. If the sender and the receiver don't have identical expectations for what fields are being sent, in what order, and how many bits are used for each, the receiver will end up with gibberish. So allowing a little surprise extra stuff at the end is not terribly appealing. I guess it doesn't have to be all-or-nothing though. It's even possible to specify at the Retina level even by passing in a wrapping BitRead here that overrides the provided finish_rbsp with something more tolerant. It could log error and continue. We could choose to do this based on context and/or policy knob.
We could add a check for the in-band SPSs to see if the extra byte has length 1 and equals 0x04
to reduce the risk of masking bugs.
When Retina receives an invalid SPS after having previously received a valid one, keep using the previously valid one? This is a bit similar to option 2. It could also mask problems that later produce gibberish, as iirc the SPS/PPS have stuff in them like how many bits are used to represent parts of the actual video data, or "quantization parameters" that...I dunno, I'm really not a codec expert, somehow also ultimately affect how the data stream turns into video. But again, we could log error and make this optional.
If it randomly receives an invalid SPS several hours after the stream started I'd prefer if it threw an error.
bluenviron ignores extra bytes.
https://github.com/bluenviron/mediacommon/blob/main/pkg/codecs/h264/sps.go#L702
fmt.Printf("pos: %v size: %v\n", pos, len(buf)*8) // pos: 125 size: 136
package main
import (
"encoding/hex"
"fmt"
"log"
"github.com/bluenviron/mediacommon/pkg/codecs/h264"
)
func main() {
raw, err := hex.DecodeString("67640033ac1514a0a03da1000004f6000063380404")
if err != nil {
log.Fatalf("hex string wouldn't parse: %v", err)
}
var sps h264.SPS
err = sps.Unmarshal(raw)
if err != nil {
log.Fatalf("sps wouldn't parse: %v", err)
}
fmt.Printf("sps: %+v\n", sps)
fmt.Printf("vui: %+v\n", *sps.VUI)
}
prints
sps: {
ProfileIdc:100
ConstraintSet0Flag:false
ConstraintSet1Flag:false
ConstraintSet2Flag:false
ConstraintSet3Flag:false
ConstraintSet4Flag:false
ConstraintSet5Flag:false
LevelIdc:51
ID:0
ChromaFormatIdc:1
SeparateColourPlaneFlag:false
BitDepthLumaMinus8:0
BitDepthChromaMinus8:0
QpprimeYZeroTransformBypassFlag:false
ScalingList4x4:[]
UseDefaultScalingMatrix4x4Flag:[]
ScalingList8x8:[]
UseDefaultScalingMatrix8x8Flag:[]
Log2MaxFrameNumMinus4:9
PicOrderCntType:0
Log2MaxPicOrderCntLsbMinus4:9
DeltaPicOrderAlwaysZeroFlag:false
OffsetForNonRefPic:0
OffsetForTopToBottomField:0
OffsetForRefFrames:[]
MaxNumRefFrames:1
GapsInFrameNumValueAllowedFlag:true
PicWidthInMbsMinus1:39
PicHeightInMapUnitsMinus1:29
FrameMbsOnlyFlag:true
MbAdaptiveFrameFieldFlag:false
Direct8x8InferenceFlag:true
FrameCropping:<nil>
VUI:0xc00009a040
}
vui: {
AspectRatioInfoPresentFlag:false
AspectRatioIdc:0
SarWidth:0
SarHeight:0
OverscanInfoPresentFlag:false
OverscanAppropriateFlag:false
VideoSignalTypePresentFlag:false
VideoFormat:0 VideoFullRangeFlag:false
ColourDescriptionPresentFlag:false
ColourPrimaries:0
TransferCharacteristics:0
MatrixCoefficients:0
ChromaLocInfoPresentFlag:false
ChromaSampleLocTypeTopField:0
ChromaSampleLocTypeBottomField:0
TimingInfo: {NumUnitsInTick:1270 TimeScale:25400 FixedFrameRateFlag:false}
NalHRD:<nil>
VclHRD:<nil>
LowDelayHrdFlag:false
PicStructPresentFlag:false
BitstreamRestriction:<nil>
}
Is there some reference H264 implementation we could try it with?
Turns out yes. https://www.itu.int/rec/T-REC-H.264.2-201602-I/en has a zip file with a bunch of stuff in it. Some of it seems to be specifically tailored for stuff like Scalable Video Codec. The basic thing seems to be the H.264.2_JM_19_0
directory; the readme says the latest version can be found at http://iphome.hhi.de/suehring/tml. The code is...garbage quality. But you can run it as follows.
$ python3 -c 'open("sps-and-pps", "wb").write(bytes.fromhex("0000000167640033ac1514a0a03da1000004f60000633804040000000168ee3cb0"))'
$ CFLAGS=-g make -j
$ lldb bin/ldecod.exe -- -p InputFile=sps-and-pps
(lldb) run
It will crash trying to interpret video frames that aren't in the file, but it seems to parse the bogus SPS first. It doesn't care about the trailing RBSP bits being correct; it just stops reading a given NAL when it finds what it wants.
I guess that's three implementations so far that just stop reading. Should we do the same? 🤷
Why just 0x04, does it repeat the last byte? It'd be interesting to dump the firmware and find out.
🤷 That level of reverse engineering is more than I can do in the time I have.
If it randomly receives an invalid SPS several hours after the stream started I'd prefer if it threw an error.
It certainly would be possible to define a policy where it only allows the invalid SPS say before the first frame of video.
I guess that's three implementations so far that just stop reading. Should we do the same? 🤷
You're right that it may mask other bugs, so it might be better to add a special exception for just this camera.
@dholroyd what do you think?
I can see that there is an example of other syntax in the spec where it's expected to have additional data between seq_parameter_set_data()
and rbsp_trailing_bits()
subset_seq_parameter_set_rbsp( ) {
seq_parameter_set_data( )
...load of stuff...
rbsp_trailing_bits( )
}
But I can't see this being needed in a normal seq_parameter_set_rbsp()
, and I can't see any flags that would allow seq_parameter_set_data()
to be extended in a later version of the spec (and there's a separate NAL Unit type specifically for SPS extensions).
So I agree that the example you describe seems wrong.
How would it be if h264_reader::nal::sps::SpsError
was altered to contain those parts of the SPS that were apparently parsed prior to a syntax error? Something like that would let the calling code make its own choices about how much to trust the data, and could also provide useful diagnostic context when trying to work out what's wrong with a bitstream.
This design still falls short on the diagnostic front because the data structures can't represent having parsed 'half' of a SeqParameterSet
- the best that could be done is to e.g. return SeqParameterSet
with vui_parameters: None
if we hit an issue in vui_parameters( )
syntax.
Anyway, in this case, the error would contain the complete SeqParameterSet
of course.
This design still falls short on the diagnostic front because the data structures can't represent having parsed 'half' of a
SeqParameterSet
- the best that could be done is to e.g. returnSeqParameterSet
withvui_parameters: None
if we hit an issue invui_parameters( )
syntax.
How would this work if the error is before some mandatory element?
I'm leaning toward a more modest thing of just adding a Retina flag to ignore the trailing data part only. I want Retina to work with a variety of cameras, but as a general practice I prefer to understand each distinct issue to the extent I can, rather than doing something broader that might paper over some unrelated bug.
I got another report in which the camera sent the SPS 67640033ac1514a02800f190
both in the DESCRIBE
and in-band. Decoding produces RbspReaderError(RemainingData)
unless I chop off the last two bytes. That lets us answer a couple questions:
When Retina receives an invalid SPS after having previously received a valid one, keep using the previously valid one?
Wouldn't work; in this case it's never received a valid SPS.
Why just 0x04, does it repeat the last byte?
In this case it's two bytes [edit: or three, extra 0x00 bytes after the last bit set are unnecessary but explicitly allowed] or three and they don't match the previous two, so apparently not.
Any updates? Anything else I could help with?
Another report for the same error with a reolink cx410
Bad SPS: RbspReaderError(RemainingData)
conn: 172.22.0.2:57512(me)->192.168.254.104:554@2024-05-21T14:58:16
stream: TCP, interleaved channel ids 0-1
ssrc: dd247e4e seq: 13956
pkt: 294297@2024-05-21T14:58:18
Any updates? Anything else I could help with?
This should be fixed on Retina's tolerate-bad-sps
branch. Give it a spin? If it works, I'll prep Retina and Moonfire releases.
cx410 and the doorbell camera confirmed working!
reolink_working_vlc_pcap.zip