scottlamb / retina

High-level RTSP multimedia streaming library, in Rust
https://crates.io/crates/retina
Apache License 2.0

packet follows marked packet with same timestamp #13

Closed lattice0 closed 3 years ago

lattice0 commented 3 years ago

cargo run --example client mp4 --url rtsp://192.168.1.198:10554/tcp/av0_0 --username admin --password 123456 --initial-timestamp ignore /home/dev/orwell/video_samples/tmp/cam_test.mp4

gives

E20210727 19:24:02.308 main client] Fatal: [172.17.0.2:50402(me)->192.168.1.198:10554@2021-07-27T19:23:55, 119614@2021-07-27T19:24:02, channel=0, stream=0, ssrc=0000173a] packet follows marked packet with same timestamp

from here https://github.com/scottlamb/retina/blob/b1db9a9e8b94ff7077050cc26be0a50cbf1bd58e/src/codec/h264.rs#L181

I don't know why --initial-timestamp ignore is needed but without it I get

https://github.com/scottlamb/retina/blob/b1db9a9e8b94ff7077050cc26be0a50cbf1bd58e/src/client/mod.rs#L705

Anyways, the packet follows marked packet with same timestamp error occurs after around 30 calls to push, like this:

push
push
push
push
push
push
push
push
push
push
push
push
push
push
push
push
8023 (mod-2^32: 8023), npt 0.000: 111304-byte video frame
push
E20210727 19:36:04.662 main client] Fatal: [172.17.0.2:50572(me)->192.168.1.198:10554@2021-07-27T19:36:01, 113756@2021-07-27T19:36:04, channel=0, stream=0, ssrc=00001f57] packet follows marked packet with same timestamp

As you can see, a video frame is generated and then we get the error. It always happens right after the first video frame is emitted.

The strange thing is that on my app, the retina client does not produce this error. I'm trying to figure out what is different.

[section moved to #15]

Do you have any idea what causes the error in the mp4 example, and why it doesn't show up in my client?

scottlamb commented 3 years ago

I don't know why --initial-timestamp ignore is needed

That's a consequence of the missing RTP-Info header (#10), which contains this timestamp. permissive should also work. require will fail. default will work with a single stream but not more.

packet follows marked packet with same timestamp

Hmm. My understanding from RFC 6184 is that each "access unit" (a picture, which is more or less a video frame) should have a different timestamp, and that a packet with a MARK indicates that an access unit is complete. (The encoder is allowed to never set a mark, instead just letting the decoder figure that out from the timestamps, but setting MARK means that the decoder knows a frame ended without waiting for the next one to start.) Thus my expectation that the packet following a marked packet has a different timestamp.

You could try commenting out the if statement that returns this error and see if it helps.
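The expectation described above can be sketched roughly as follows. This is only an illustration, not retina's actual code: the `Pkt` and `StrictChecker` types and the `check_all` helper are hypothetical names, and the sketch tracks only the RTP timestamp and marker bit.

```rust
/// Just the two RTP header fields the check needs (hypothetical type, not retina's).
#[derive(Clone, Copy, Debug)]
struct Pkt {
    timestamp: u32,
    mark: bool,
}

/// Remembers the previous packet so the next one can be validated against it.
#[derive(Default)]
struct StrictChecker {
    prev: Option<Pkt>,
}

impl StrictChecker {
    /// Rejects a packet that reuses the timestamp of a packet that had MARK set,
    /// since MARK is supposed to mean the access unit at that timestamp is complete.
    fn push(&mut self, pkt: Pkt) -> Result<(), String> {
        if let Some(prev) = self.prev {
            if prev.mark && prev.timestamp == pkt.timestamp {
                return Err("packet follows marked packet with same timestamp".to_owned());
            }
        }
        self.prev = Some(pkt);
        Ok(())
    }
}

/// Feeds a whole sequence through the checker, stopping at the first violation.
fn check_all(pkts: &[Pkt]) -> Result<(), String> {
    let mut checker = StrictChecker::default();
    for &p in pkts {
        checker.push(p)?;
    }
    Ok(())
}
```

With this sketch, two packets at one timestamp where only the second has MARK set pass, while a marked packet followed by another packet at the same timestamp is rejected. Removing the inner `if` corresponds to the "comment out the check" experiment.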

Is it possible the camera uses interlaced video? Then I think there might actually be two pictures per video frame.

Could you post the debug output of a VideoParameters from this stream? That might help me figure out what's going on. Actually, the best thing would be a full packet capture if you're willing to share that.

[section moved to #15]

lattice0 commented 3 years ago

[moved to #15]

scottlamb commented 3 years ago

I'm moving the stuff about ffmpeg errors to #15 because I was getting confused mixing them. Leaving this issue as just the packet follows marked packet with same timestamp error.

scottlamb commented 3 years ago

The strange thing is that on my app, the retina client does not produce this error. I'm trying to figure out what is different.

Is it possible that the difference is that the mp4 example sets up both the video and audio streams, while your app only sets up the video stream? Including the audio stream shouldn't cause the video stream to become corrupt, but who knows. You could try passing the --no-audio parameter to the mp4 example and see if that makes any difference.

scottlamb commented 3 years ago

Is it possible the camera uses interlaced video? Then I think there might actually be two pictures per video frame.

From the SPS, no. frame_mbs_only_flag is 1, so each picture is a frame (rather than a "field").

lattice0 commented 3 years ago

It has the same problem if I pass --no-audio. In fact I tried this that same day.

From the SPS, no. frame_mbs_only_flag is 1, so each picture is a frame (rather than a "field").

it would be good to consider this case for other cameras, thanks

scottlamb commented 3 years ago

Hmm, I don't know what's different then, unless your app is just getting RTP packets (via RtspSession's Stream implementation) rather than H.264 frames (via RtspSession::demuxed). You may need to just iteratively make the programs more similar until they get the same behavior and you see what matters, or maybe we can figure it out with a complete packet capture (rather than the ones you got earlier which apparently were missing a lot of IP packets).

From the SPS, no. frame_mbs_only_flag is 1, so each picture is a frame (rather than a "field").

it would be good to consider this case for other cameras, thanks

Yeah, there are certainly various cases on my TODO list to examine, but as far as I can tell none are relevant to your situation.

scottlamb commented 3 years ago

Did you get any further in understanding this?

I'm also seeing this error on a Reolink RLC-822A camera running firmware v3.0.0.177_21012107 v3.0.0.177_21012101. It's just not handling the timestamp and MARK field correctly at all. According to H.264 section 7.4.1.2.3, every access unit must have a VCL NAL, and if there's an SPS and PPS present, they "cannot follow the last VCL NAL unit of the primary coded picture". (In this camera's case, there's one VCL NAL per picture, so the qualifiers "last" and "primary picture" can be ignored.) And RFC 6184 section 5.1 says that the timestamp must match that of the primary coded picture of the access unit and that the marker bit can only be set on the final packet of the access unit. But the Reolink camera sends at the beginning of the stream:

  1. the SPS with the marker bit set (thus putting it into an invalid no-VCL access unit)
  2. the PPS (same timestamp, which causes this error)
  3. the VCL NAL (SliceLayerWithoutPartitioningIdr) with a new timestamp (so also putting the PPS into an invalid no-VCL access unit)

Then for every subsequent IDR frame, the Reolink camera sends:

  1. the SPS with the same timestamp as the previous VCL NAL (causing this error again)
  2. the PPS, same timestamp
  3. the VCL NAL (SliceLayerWithoutPartitioningIdr) with a new timestamp (putting the SPS+PPS into another invalid no-VCL access unit)
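To make the two sequences above concrete, here is a small sketch that splits a run of NAL units into access units the way the marker bit and timestamps imply, then reports the units that contain no VCL NAL. Everything in it is an assumption for illustration: the `Nal` type, the function names, the timestamps, and the marker bit on the IDR slice are made up, and none of it is taken from retina. The only standard facts it relies on are that `nal_unit_type` is the low 5 bits of the NAL header byte, that types 1 through 5 are VCL, and that 7 and 8 are SPS and PPS.

```rust
/// One NAL unit as received over RTP (hypothetical type for illustration).
#[derive(Clone, Copy, Debug)]
struct Nal {
    timestamp: u32,
    mark: bool,
    header: u8, // first byte of the NAL unit
}

/// nal_unit_type values 1..=5 carry coded slice data (VCL).
fn is_vcl(nal_type: u8) -> bool {
    matches!(nal_type, 1..=5)
}

/// Closes the access unit being built; records it if it has no VCL NAL.
fn close_unit(cur: &mut Vec<u8>, bad: &mut Vec<Vec<u8>>) {
    let unit = std::mem::take(cur);
    if !unit.is_empty() && !unit.iter().any(|&t| is_vcl(t)) {
        bad.push(unit);
    }
}

/// Returns the NAL types of every access unit that lacks a VCL NAL.
/// A unit ends when the timestamp changes or a packet has the marker bit set.
fn no_vcl_access_units(nals: &[Nal]) -> Vec<Vec<u8>> {
    let mut bad = Vec::new();
    let mut cur = Vec::new();
    let mut cur_ts = None;
    for n in nals {
        if cur_ts != Some(n.timestamp) {
            close_unit(&mut cur, &mut bad);
        }
        cur.push(n.header & 0x1f); // keep only nal_unit_type
        cur_ts = Some(n.timestamp);
        if n.mark {
            close_unit(&mut cur, &mut bad);
        }
    }
    close_unit(&mut cur, &mut bad);
    bad
}

/// The start-of-stream pattern described above, with made-up timestamps.
fn reolink_like_start() -> Vec<Nal> {
    vec![
        Nal { timestamp: 0, mark: true, header: 0x67 },    // SPS, marker set
        Nal { timestamp: 0, mark: false, header: 0x68 },   // PPS, same timestamp
        Nal { timestamp: 3000, mark: true, header: 0x65 }, // IDR slice, new timestamp
    ]
}
```

For the start-of-stream sequence, `no_vcl_access_units(&reolink_like_start())` yields two no-VCL units, one holding only the SPS (type 7) and one holding only the PPS (type 8). If all three NALs shared a timestamp and only the IDR slice were marked, the result would be empty.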

Apparently Reolink uses an ancient version of live555. live555's current server code seems to set the marker bit incorrectly (search for thisNALUnitEndsAccessUnit in liveMedia/H264or5VideoStreamFramer.cpp, which gets set any time the next NAL unit is an SPS or PPS), which probably explains some of the behavior I see.

In a quick look at why other RTSP clients don't seem to care about this:

I'm deciding how to handle this. It'd be helpful to know more about when this happens on VStarCam cameras.

lattice0 commented 3 years ago

@scottlamb I'll test more on the VStarcam soon. First I need to close https://github.com/scottlamb/retina/pull/12: I'd like to test and modify things to solve this problem, but it would be nice to have those patches merged so I can focus on this. Otherwise I get confused juggling different branches. Could you send the .txt for those 2 cases so we both work on the same branch?

I've had some problems lately and have been busy with other things, so I have less free time to fix this, but I'll certainly investigate it, as it's the last thing I need to get images from my camera.

I also need to read more about VCL, PPS, SPS

scottlamb commented 3 years ago

Ack, I'll look at #12 again tonight.

scottlamb commented 3 years ago

...tomorrow morning, sorry. These cameras aren't behaving and I'm too sleepy to make sense of it.