scottlamb / retina

High-level RTSP multimedia streaming library, in Rust
https://crates.io/crates/retina
Apache License 2.0
244 stars 48 forks source link

no control url #9

Closed lattice0 closed 3 years ago

lattice0 commented 3 years ago

My VStarcam camera is failing here:

https://github.com/scottlamb/retina/blob/main/src/client/parse.rs#L424

as you can see, it has no control attribute, but it has control attributes on the media descriptions.

sdp: SessionDescription { version: 0, origin: Origin { username: "VSTC", session_id: 3836310016, session_version: 3836310016, network_type: "IN", address_type: "IP4", unicast_address: "192.168.1.198" }, session_name: "streamed by the VSTARCAM RTSP server", session_information: None, uri: None, email_address: Some("NONE"), phone_number: None, connection_information: Some(ConnectionInformation { network_type: "IN", address_type: "IP4", address: Some(Address { address: "0.0.0.0", ttl: None, range: None }) }), bandwidth: [], time_descriptions: [TimeDescription { timing: Timing { start_time: 0, stop_time: 0 }, repeat_times: [] }], time_zones: [], encryption_key: None, attributes: [], media_descriptions: [MediaDescription { media_name: MediaName { media: "video", port: RangedPort { value: 0, range: None }, protos: ["RTP", "AVP"], formats: ["96"] }, media_title: None, connection_information: None, bandwidth: [Bandwidth { experimental: false, bandwidth_type: "AS", bandwidth: 1536 }], encryption_key: None, attributes: [Attribute { key: "control", value: Some("track0") }, Attribute { key: "rtpmap", value: Some("96 H264/90000") }, Attribute { key: "fmtp", value: Some("96 packetization-mode=1;profile-level-id=4d002a;sprop-parameter-sets=Z00AKp2oHgCJ+WbgICAoAAADAAgAAAMAfCA=,aO48gA==") }] }, MediaDescription { media_name: MediaName { media: "audio", port: RangedPort { value: 0, range: None }, protos: ["RTP", "AVP"], formats: ["8"] }, media_title: None, connection_information: None, bandwidth: [Bandwidth { experimental: false, bandwidth_type: "AS", bandwidth: 64 }], encryption_key: None, attributes: [Attribute { key: "control", value: Some("track1") }, Attribute { key: "rtpmap", value: Some("8 PCMA/8000/1") }] }] }

Here's the raw response:

RTSP/1.0 200 OK
Cseq: 2
Date: Mon, Jul 26 2021 17:35:15 GMT
Content-Type: application/sdp
Content-Length: 403

v=0
o=VSTC 3836309504 3836309504 IN IP4 192.168.1.198
s=streamed by the VSTARCAM RTSP server
e=NONE
c=IN IP4 0.0.0.0
t=0 0
m=video 0 RTP/AVP 96
b=AS:1536
a=control:track0
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1;profile-level-id=4d002a;sprop-parameter-sets=Z00AKp2oHgCJ+WbgICAoAAADAAgAAAMAfCA=,aO48gA==
m=audio 0 RTP/AVP 8  
b=AS:64
a=control:track1
a=rtpmap:8 PCMA/8000/1

is the control really needed?

lattice0 commented 3 years ago

I changed

let control = control.ok_or_else(|| "no control url".to_string())?;

to

let control = control.unwrap_or(request_url.clone());

and it at least does the SETUP, but it looks like it sends the entire uri: uri="rtsp://192.168.1.198:10554/tcp/track0" and the camera does not like that and simply ignores.

my client (works):

SETUP rtsp://192.168.1.198:10554/tcp/av0_0/track0 RTSP/1.0
Authorization: Digest username="admin", algorithm="MD5", realm="RTSPD", nonce="8s63k7h3mn6aa8o1u7mix4599qku0ra6", uri="/tcp/av0_0", response="4f6333f7580f407195eb2436a5905704"
CSeq: 4
Transport: RTP/AVP/TCP;interleaved=0
User-Agent: RRTSP Client

yours (no response):

SETUP rtsp://192.168.1.198:10554/tcp/track0 RTSP/1.0
Authorization: Digest username="admin", realm="RTSPD", nonce="7b3q3ssplskhy59wu77345tewnrx5see", uri="rtsp://192.168.1.198:10554/tcp/track0", response="c2438d1f4c1e56003003fa0878d1d094"
CSeq: 3
Transport: RTP/AVP/TCP;unicast;interleaved=0-1
User-Agent: moonfire-rtsp test
x-Dynamic-Rate: 1

or maybe the problem is the missing av0_0 on yours

scottlamb commented 3 years ago

is the control really needed?

According to RFC 2326? I think so. From sections C.1 and C.2, it looks like there should be a session-level control if aggregate control is possible, eg a single PLAY command that starts all the streams (that have been set up). You quoted a presentation with a video and audio stream that I imagine are supposed to be started with a single PLAY. So I think the server is not compliant with RFC 2326.

But this isn't the first broken camera I've seen, and I prefer actually working with real cameras over strict RFC compliance, so I'd like to fix this.

How did your client get the av0_0 component? I don't see that in the DESCRIBE response; what was the original DESCRIBE URL?

lattice0 commented 3 years ago

This is a well sold camera on aliexpress (VStarcam) that worked with all the clients I tested, specifically with ZLMediakit in C++.

rtsp://192.168.1.198:10554/tcp/av0_0 is the original URL gathered from ONVIF calls.

ok, turns out that in https://github.com/scottlamb/retina/blob/b1db9a9e8b94ff7077050cc26be0a50cbf1bd58e/src/client/parse.rs#L289, we have:

        } else if a.key == "control" {
            control = a
                .value
                .as_deref()
                .map(|c| join_control(base_url, c))
                .transpose()?;
            alt_control = a
                .value
                .as_deref()
                .map(|c| join_control(alt_base_url, c))
                .transpose()?;
        }

base_url is rtsp://192.168.1.198:10554/tcp/av0_0 while alt_base_url is rtsp://192.168.1.198:10554/tcp/av0_0/. When join_control(______, c) is called on these URLs, in the first one, base_url, we get rtsp://192.168.1.198:10554/tcp/track0, which is wrong. In the second one, we get rtsp://192.168.1.198:10554/tcp/av0_0/track0, which is right and makes the camera server respond to the SETUP.

What you think should be done? Is calling join on a URL without / in the end simply replacing the last url 'node'?

I changed to


            control = a
                .value
                .as_deref()
                .map(|c| join_control(alt_base_url, c))
                .transpose()?;

that is, I ignored base_url and setted no alt_control, and it worked. At least up to the PLAY:

thread 'retina_client_cam1' panicked at 'calledResult::unwrap()on anErrvalue: RtspResponseError { conn_ctx: ConnectionContext { local_addr: 172.17.0.2:40216, peer_addr: 192.168.1.198:10554, established_wall: WallTime(Timespec { sec: 1627326108, nsec: 249319267 }), established: Instant { tv_sec: 331582, tv_nsec: 624545829 } }, msg_ctx: RtspMessageContext { pos: 782, received_wall: WallTime(Timespec { sec: 1627326109, nsec: 547855062 }), received: Instant { tv_sec: 331583, tv_nsec: 923080738 } }, method: Play, cseq: 4, status: Ok, description: "PLAY response has no RTP-Info header" }', /home/dev/orwell/liborwell/src/rtsp/retina_client.rs:141:40

but I'll investigate this after I finish fixing the url control error.

scottlamb commented 3 years ago

So my interpretation of joining the urls is to do it as in the URL/HTTP specifications. I'm on my phone so I can't quote them now. But other clients—at least ffmpeg—just append a / to the base and concatenate no matter what. They don't even look for a ? that starts a query string first. I think this is wrong but it seems consistent with what some cameras expect. My alt_control was an attempt to meet their expectations when interpreting the RTP-Info of the PLAY response. I'm not sure what we should do when deciding what URL to send for SETUP. Maybe we should just do the more popular thing even if seems wrong, or have a quirks/options/policy knob to let the caller decide (maybe based on a config file), or try one and then the other on error. /shruggie I'm open to ideas...

lattice0 commented 3 years ago

Do you mean that some cameras actually expect rtsp://192.168.1.198:10554/tcp/av0_0 to be rtsp://192.168.1.198:10554/tcp/track0 when a control attribute is chosen?

I think ffmpeg approach is ok, it's the best chance of success, because lots of cameras might actually get tested on it.

I don't remember what the RFC says about joining the URL, and I don't even remember reading about the control attribute. I just made it work on my client by reading other clients.

scottlamb commented 3 years ago

I'm at a computer now so I can actually look up the relevant bits.

In this case, I mean that I wrote it based on what RFC 2326 says, and I found that some cameras (and ffmpeg) differed from that. Some cameras' URLs are structured such that both interpretations produce the same result. Some will accept either method (Dahua example below). I'm not sure if there are actually cameras where ffmpeg's approach doesn't work properly.

Going back to the standards:

With a base URL of rtsp://192.168.1.198:10554/tcp/av0_0, the RFCs say the stream URL should be rtsp://192.168.1.198:10554/tcp/track0. But you tried that and it didn't work.

ffmpeg does this (in libavformat/rtsp.c):

                /* XXX: may need to add full url resolution */
                av_url_split(proto, sizeof(proto), NULL, 0, NULL, 0,
                             NULL, NULL, 0, p);
                if (proto[0] == '\0') {
                    /* relative control URL */
                    if (rtsp_st->control_url[strlen(rtsp_st->control_url)-1]!='/')
                    av_strlcat(rtsp_st->control_url, "/",
                               sizeof(rtsp_st->control_url));
                    av_strlcat(rtsp_st->control_url, p,
                               sizeof(rtsp_st->control_url));
                } else
                    av_strlcpy(rtsp_st->control_url, p,
                               sizeof(rtsp_st->control_url));

It's wrong but it works.

Dahua cameras are an interesting example. They set a base URL of eg rtsp://192.168.5.111:554/cam/realmonitor?channel=1&subtype=1&unicast=true&proto=Onvif/ with a control URL of eg trackID=1, so:

I imagine they went out of their way to accept either, although who knows. Their code might just look for the URL ending with trackID=1, so both work by accident.

One thing writing this library has taught me: many RTSP implementations are garbage. But we should be able to make most cameras work anyway.

lattice0 commented 3 years ago

I think there's no harm in trying both ways and picking the first that works. If we choose the same as ffmpeg then it might break for other cameras, I guess. However I don't know if this would be a good idea for all types of quirks, as it would generate a bunch of tries.

OR, we should see what live555 does. I guess it's the most used library? Maybe all these cameras were tested in it?

scottlamb commented 3 years ago

Good point about live555. Looks like this is the code in question:

  // Figure out what the URL describing "subsession" will look like.
  // The URL is returned in three parts: prefix; separator; suffix
  //##### NOTE: This code doesn't really do the right thing if "sessionURL()"
  // doesn't end with a "/", and "subsession.controlPath()" is relative.
  // The right thing would have been to truncate "sessionURL()" back to the
  // rightmost "/", and then add "subsession.controlPath()".
  // In practice, though, each "DESCRIBE" response typically contains
  // a "Content-Base:" header that consists of "sessionURL()" followed by
  // a "/", in which case this code ends up giving the correct result.
  // However, we should really fix this code to do the right thing, and
  // also check for and use the "Content-Base:" header appropriately. #####
  prefix = sessionURL(subsession.parentSession());
  if (prefix == NULL) prefix = "";

  suffix = subsession.controlPath();
  if (suffix == NULL) suffix = "";

  if (isAbsoluteURL(suffix)) {
    prefix = separator = "";
  } else {
    unsigned prefixLen = strlen(prefix);
    separator = (prefixLen == 0 || prefix[prefixLen-1] == '/' || suffix[0] == '/') ? "" : "/";
  }

It's effectively the same as ffmpeg, although the comment confirms my understanding that it's incorrect.

Maybe all the major client implementations have been wrong for so long that cameras expect that instead of the behavior specified in the RFC.

It feels wrong IMHO to continue perpetuating this so maybe we should try the correct URL first and then the other one if that fails.

lattice0 commented 3 years ago

Ok so I guess I'll make this work.

But first we have to device what to do on

let control = control.ok_or_else(|| "no control url".to_string())?;

Do you agree with

let control = control.unwrap_or(request_url.clone());

?

Then, for

        let url = stream
            .control
            .as_ref()
            .unwrap_or(&self.state.presentation.control)
            .clone();
        let mut req =
            rtsp_types::Request::builder(rtsp_types::Method::Setup, rtsp_types::Version::V1_0)
                .request_uri(url)
                .header(
                    rtsp_types::headers::TRANSPORT,
                    format!(
                        "RTP/AVP/TCP;unicast;interleaved={}-{}",
                        proposed_channel_id,
                        proposed_channel_id + 1
                    ),
                )
                .header(crate::X_DYNAMIC_RATE.clone(), "1".to_owned());
        if let Some(ref s) = self.state.session_id {
            req = req.header(rtsp_types::headers::SESSION, s.clone());
        }
        let (msg_ctx, cseq, response) = self.conn.send(&mut req.build(Bytes::new())).await?;

what about this:

        let url1 = stream
            .control
            .as_ref()
            .unwrap_or(&self.state.presentation.control)
            .clone();
        let url2 = stream
            .alt_control
            .as_ref()
            .unwrap_or(&self.state.presentation.control)
            .clone();
        let req =
            rtsp_types::Request::builder(rtsp_types::Method::Setup, rtsp_types::Version::V1_0)
                .header(
                    rtsp_types::headers::TRANSPORT,
                    format!(
                        "RTP/AVP/TCP;unicast;interleaved={}-{}",
                        proposed_channel_id,
                        proposed_channel_id + 1
                    ),
                )
                .header(crate::X_DYNAMIC_RATE.clone(), "1".to_owned());
        let mut req1 = req.clone().request_uri(url1);
        let mut req2 = req.clone().request_uri(url2);
        if let Some(ref s) = self.state.session_id {
            req1 = req1.header(rtsp_types::headers::SESSION, s.clone());
            req2 = req2.header(rtsp_types::headers::SESSION, s.clone());
        }
        let (msg_ctx, cseq, response) = {
            first_of(
            self.conn.send(&mut req1.build(Bytes::new())),
            self.conn.send(&mut req2.build(Bytes::new())),
            )
        };

where first_of is that tokio thing that resolves the first successful stream?

ps:

alt_control: Option<Url>,

to

pub alt_control: Option<Url>,

scottlamb commented 3 years ago

Do you agree with let control = control.unwrap_or(request_url.clone());?

Sounds good.

What about [...] where first_of is that tokio thing that resolves the first successful stream?

tokio::select!?

Maybe we should do it sequentially with the correct URL first and the commonly used one second? I can imagine a really confusing situation where there's some user-visible difference based on which wins the race. Also, I've learned recently that tokio::select! can be surprisingly tricky to get right in a couple ways. First, it's a borrow checker struggle when both futures need the same underlying value (self.conn). Second, if you potentially drop one future when the other completes, you get a strange behavior, as mentioned in this blog post.

lattice0 commented 3 years ago

The problem is that my camera never responds when the url is not ended in /. So we'd have to wait some seconds just to be sure that the device did not respond. And even then it could be just a network delay. So checking for the first URL to just then check the second is not very nice.

Indeed tokio::select! would be tricky and now thinking, it could be that the url not ending with / times out before the url ending in / returns, thus selecting the one not ending with /.

What do you suggest?

scottlamb commented 3 years ago

The problem is that my camera never responds when the url is not ended in /.

Oh, that's horrible.

Argh, maybe we should just give in then and do what the other implementations do; it might be an easier road. If we encounter a camera that expects otherwise, then we can introduce the option to allow the caller to select how we construct the control URL.

lattice0 commented 3 years ago

Ok so what about this:

} else if a.key == "control" {
            control = a
                .value
                .as_deref()
                .map(|c| join_control(alt_base_url, c))
                .transpose()?;
            alt_control = a
                .value
                .as_deref()
                .map(|c| join_control(alt_base_url, c))
                .transpose()?;
        }

on https://github.com/scottlamb/retina/blob/b1db9a9e8b94ff7077050cc26be0a50cbf1bd58e/src/client/parse.rs#L289?

and the

let control = control.unwrap_or(request_url.clone());

modification?

it's working here up to that h264 parsing error which I'll investigate soon

scottlamb commented 3 years ago

Effectively, yeah, always using joining with what's now the alt_base_url sounds good. We don't need to keep both control and alt_control if they're the same, and likewise don't need to keep both base_url and alt_base_url.

And yeah, the unwrap_or sounds good also.

lattice0 commented 3 years ago

@scottlamb I left alt_control because you said it's used on other things that I didn't look yet.

Looks like it's used here only:


        if presentation.streams.len() == 1 {
            // The server is allowed to not specify a stream control URL for
            // single-stream presentations. Additionally, some buggy
            // cameras (eg the GW Security GW4089IP) use an incorrect URL.
            // When there is a single stream in the presentation, there's no
            // ambiguity. Be "forgiving", just as RFC 2326 section 14.3 asks
            // servers to be forgiving of clients with single-stream
            // containers.
            // https://datatracker.ietf.org/doc/html/rfc2326#section-14.3
            stream = Some(&mut presentation.streams[0]);
        } else {
            stream = presentation
                .streams
                .iter_mut()
                .find(|s| matches!(&s.control, Some(u) if u == &url));

            // If we didn't find a stream, try again with alt_control. Don't do
            // this on the first pass because we should check all of the
            // proper control URLs first.
            if stream.is_none() {
                stream = presentation
                    .streams
                    .iter_mut()
                    .find(|s| matches!(&s.alt_control, Some(u) if u == &url));
            }
        }

https://github.com/scottlamb/retina/blob/b1db9a9e8b94ff7077050cc26be0a50cbf1bd58e/src/client/parse.rs#L518

should I just remove if stream.is_none() {...}?

scottlamb commented 3 years ago

Yeah, the is_none branch accomplishes nothing if control and alt_control are always the same, so removing it SGTM.

scottlamb commented 3 years ago

I stared at this a bunch and so far have found at least four different ways of joining a relative control URL with its base:

  1. the RFC way, as mentioned above. Unfortunately from your comment above, the VStarcam servers ignore (don't respond to at all) SETUP requests for this URL, so it absolutely won't work for them. GW Security cameras also send a RTP-Info that's inconsistent with this interpretation, although that's less severe. (We can make do without RTP-Info at all.)
  2. the ffmpeg/live555 way. Basically, strcat(base_url, base_url.ends_with("/") ? "" : "/", control_url). This also isn't consistent with GW Security's RTP-Info but seems to work otherwise.
  3. the gstreamer way. Use the base URL with its path replaced by concat(base_path.rstrip("/"), "/", control.lstrip("/"), so any query string is retained as-is. Also not consistent with GW Security's RTP-Info but seems to work otherwise.
  4. the GW Security way (apparently, based on their RTP-Info url). treat the base URL as if it has a trailing slash, even if it doesn't. Then join, discarding query string. (Unsure what they would do if the control URL had a leading slash; I'm guessing their algorithm from a single example.)

Your PR is consistent with option 4. I think this works for all cameras I've seen but am still looking them over. This is such a confused jumble. I wouldn't be surprised if eventually we'll have to make our behavior customizable when we encounter another camera that doesn't match option 4.

I think it's also time to start writing up a document about interoperability. I'll work on that.

lattice0 commented 3 years ago

If we want to support all cameras possible, then sticking with what live555 does should be the best option. Also, tinycam monitor should have been tested on 1000s of models already, so it would be a nice idea to look at what it does. However it might have quirks for different models and we wouldn't know since it's closed source. All I know is that it has a way to specify a camera model when adding a new one, and it has lots of models.

scottlamb commented 3 years ago

Is live555 really that overwhelmingly the most accepted implementation? This isn't its only divergence from the RFC. For example, when matching RTP-Info sections to streams, live555's client code doesn't use the url parameter at all (as RFC 2326 section 12.33 says they should), instead doing it positionally within RTSPClient::handlePLAYResponse. I agree it makes sense to follow common implementations where the RFC is ambiguous, but in both of these live555 is contradicting it, so I'm confused and torn. :-(

Boolean RTSPClient::handlePLAYResponse(MediaSession* session, MediaSubsession* subsession,
                                       char const* scaleParamsStr, char const* speedParamsStr,
                                       char const* rangeParamsStr, char const* rtpInfoParamsStr) {
// ...
      while ((subsession = iter.next()) != NULL) {
        u_int16_t seqNum; u_int32_t timestamp;
        subsession->rtpInfo.infoIsNew = False;
        if (parseRTPInfoParams(rtpInfoParamsStr, seqNum, timestamp)) {
          subsession->rtpInfo.seqNum = seqNum;
          subsession->rtpInfo.timestamp = timestamp;
          subsession->rtpInfo.infoIsNew = True;
        }

        if (subsession->rtpSource() != NULL) subsession->rtpSource()->enableRTCPReports() = True; // start sending RTCP "RR"s now
      }
lattice0 commented 3 years ago

I remember seeing live555 popping out on lots of camera related stuff that I did. What about https://github.com/ZLMediaKit/ZLMediaKit? It worked right away for my cameras.

I also think most NVRs use live555 because why not? It's the only really old open source implementation and they like old things that are released as zip files

scottlamb commented 3 years ago

I guess when it comes down to it, I'm fine with:

and live555, tinyCam, and ZLMediaKit are good implementations to compare against, thanks. We can look into those as it comes up and add relevant things to an interop doc (#18). (I hadn't heard of ZLMediaKit but I see it at least has a bunch of stars on github.)