moq-wg / moq-transport

draft-ietf-moq-transport

Globally unique broadcast URI #70

Closed kixelated closed 1 year ago

kixelated commented 1 year ago

The current broadcast URI has no restrictions except for being unique within a WebTransport session. There's a desire for a globally unique broadcast URI. See #69

From my understanding, the issue is that a relay might have to rewrite the broadcast URI in the OBJECT message if two incoming WebTransport sessions use the same URI. This could be a performance concern at scale, so it would be nice to pre-authenticate every broadcast URI.

@suhasHere @wilaw @fluffy is there additional rationale that I'm missing?

kixelated commented 1 year ago

My concern is that enforcing a globally unique URL is basically impossible without a central authority. @wilaw suggested public key crypto in our previous discussion but I don't think it prevents abuse.

Additionally, I think the performance benefit is overblown. The cost of serializing a new OBJECT header is negligible and would only need to be done if there's a collision. A relay already has to rewrite connection IDs, WebTransport sessions, STREAM IDs, QPACK indexes, and of course, re-encrypt.

For a service like Twitch, where relays are exposed to the internet, we would rewrite the broadcast URI once when it enters our video system. This also gives us the ability to add internal routing information to the URI in a way that is transparent to the end-user. The broadcast URI we serve to viewers would almost certainly be different than the one we ingested.
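As a rough sketch of the "rewrite only on collision" argument (everything here is illustrative, not from the draft):

```python
# Hypothetical sketch: a relay that only rewrites a broadcast URI when an
# incoming session collides with a URI already in use. In the common
# (no-collision) case there is zero rewrite cost.
active_broadcasts: dict[str, int] = {}  # URI -> owning session id
rewrite_count = 0

def ingest_broadcast(session_id: int, uri: str) -> str:
    """Return the URI this relay will use internally for the broadcast."""
    global rewrite_count
    owner = active_broadcasts.get(uri)
    if owner is None or owner == session_id:
        active_broadcasts[uri] = session_id
        return uri  # no collision: the common case
    # Collision: mint a relay-local URI; only now do we pay for a rewrite.
    rewrite_count += 1
    new_uri = f"{uri}#{session_id}"
    active_broadcasts[new_uri] = session_id
    return new_uri
```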

wilaw commented 1 year ago

Relays must correctly route incoming traffic from multiple vendors. It is not practical to enforce the constraint that each Broadcast must be contained within its own WebTransport connection. A relay network may be aggregating hundreds of thousands of individual broadcasts across many providers. This would require every pair of relays to establish a matching number of WebTransport connections, which is impractical and also costly from a TLS session establishment POV. We therefore need an architecture which decouples the identity of an object from the WebTransport connection which delivers it.

Additionally, I think the performance benefit is overblown. The cost of serializing a new OBJECT header is negligible and would only need to be done if there's a collision.

The performance cost is not in re-writing headers, which I agree is no more expensive than the other operations you cite. It is in doing lookups into a dynamically changing and distributed table to determine if there is a collision.

is there additional rationale that I'm missing?

We should note that this broadcast ID does not have to be a character-sequence URI. It could also be a number. Its global uniqueness can be satisfied with either type. Considerations such as binary-encoding size may determine which form of identifier we choose.

Additionally, if a network considers an incoming ID header verbose, it can always of its own volition create a dictionary that ties that ID to a more concise alias that is unique within its network and then rewrite the header to use that alias. For example, if we used a character URI such as "twitch.com/live/games/2342342342/bob", we could replace it with "1234". This would result in fewer bits across the wire. Any time a relay delivers content to a subscriber outside its network (either the end-user or the entry point of a different network) it would need to restore the original ID header. This has the disadvantage that we don't get the compactness benefit over the last mile to the end-user. If we can design a means for a relay to distinguish an end-user from another relay, then we could avoid this problem.
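A minimal sketch of such an alias table (names and encoding invented here, not from any draft): the network-local alias is minted on ingest and the original ID restored on egress to anyone outside the network.

```python
# Hypothetical alias dictionary: verbose global ID -> short network-local
# alias on ingest, restored at the network boundary on egress.
class AliasTable:
    def __init__(self):
        self._to_alias: dict[str, str] = {}
        self._to_global: dict[str, str] = {}
        self._next = 0

    def alias_for(self, global_id: str) -> str:
        if global_id not in self._to_alias:
            alias = str(self._next)  # compact, unique within this network
            self._next += 1
            self._to_alias[global_id] = alias
            self._to_global[alias] = global_id
        return self._to_alias[global_id]

    def restore(self, alias: str) -> str:
        return self._to_global[alias]  # used when leaving the network

table = AliasTable()
short = table.alias_for("twitch.com/live/games/2342342342/bob")
assert table.restore(short) == "twitch.com/live/games/2342342342/bob"
```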

kixelated commented 1 year ago

Additionally, if a network considers an incoming ID header verbose, it can always of its own volition create a dictionary that ties that ID to a more concise alias that is unique within its network and then rewrite the header to use that alias. For example, if we used a character URI such as "twitch.com/live/games/2342342342/bob", we could replace it with "1234". This would result in fewer bits across the wire. Any time a relay delivers content to a subscriber outside its network (either the end-user or the entry point of a different network) it would need to restore the original ID header. This has the disadvantage that we don't get the compactness benefit over the last mile to the end-user. If we can design a means for a relay to distinguish an end-user from another relay, then we could avoid this problem.

Oh absolutely, I think rewriting the broadcast URI at the edge of the network is both a good idea and necessary. I think this boils down to a MUST versus SHOULD.

"The broadcast ID MUST be globally unique" means your suggestion of using broadcast ID "1234" does not adhere to the specification. An internal relay could ignore this requirement, but I don't think that is acceptable.

"The broadcast ID SHOULD be globally unique" adds some wiggle room but opens the possibility for collisions. A client will try to make IDs unique but can't guarantee it, so a relay will either need to reject duplicates or rewrite the URI.

"The broadcast ID SHOULD be random" could be a nice way to reduce the possibility of collisions without any central authority. The client initial QUIC connection ID works like this (at least 8 random bytes).

"The broadcast ID MUST be unique within the session" is the weakest requirement for the protocol to function. A client might just hardcode ID "0" and the relay always will need to rewrite URIs. This is basically how QUIC connection IDs work; an endpoint can use whatever they want provided it's not a duplicate.

suhasHere commented 1 year ago

From #69, there is a Warp Media Session and per-representation Warp Streams. It's the uniqueness requirement on the Media Session identifier that encompasses the rest of the sub-components under it. IIUC, Warp Media Sessions, when represented as a character URI, typically have components that form the origin component (like twitch.com or webex.com, which are globally unique), and what happens inside that is controlled by the owning domain.

With that in mind, a Warp Stream ID is the output of some function that takes in (Warp Media Session ID, Representation ID), which makes each such representation globally unique.
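As a rough sketch of such a function (the hash and encoding here are my own assumptions, not from #69):

```python
# Sketch: derive a Warp Stream ID from (Media Session ID, Representation ID).
# If the session ID is globally unique, the derived stream ID is too.
import hashlib

def warp_stream_id(media_session_id: str, representation_id: str) -> str:
    data = f"{media_session_id}|{representation_id}".encode()
    return hashlib.sha256(data).hexdigest()[:16]

sid = warp_stream_id("webex.com/meeting/42", "video-hd")
```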

VMatrix1900 commented 1 year ago

Relays must correctly route incoming traffic from multiple vendors. It is not practical to enforce the constraint that each Broadcast must be contained within its own WebTransport connection. A relay network may be aggregating hundreds of thousands of individual broadcasts across many providers. This would require every pair of relays to establish a matching number of WebTransport connections, which is impractical and also costly from a TLS session establishment POV. We therefore need an architecture which decouples the identity of an object from the WebTransport connection which delivers it.

Agree. IIUC, the WARP Media Session from #69 is the end-to-end concept. The job of the relay is to forward objects or streams/tracks hop by hop. The endpoint is responsible for grouping the objects or streams/tracks into the Media Session. We should not enforce a mapping between the WARP Media Session and the WebTransport connection on each relay hop.

kixelated commented 1 year ago

From #69, there is a Warp Media Session and per-representation Warp Streams. It's the uniqueness requirement on the Media Session identifier that encompasses the rest of the sub-components under it. IIUC, Warp Media Sessions, when represented as a character URI, typically have components that form the origin component (like twitch.com or webex.com, which are globally unique), and what happens inside that is controlled by the owning domain.

We need to support broadcasters who do not own a globally unique domain name. Twitch/Meta broadcasters must be able to publish media from clients like OBS via a URL, and most certainly cannot be trusted.

Maybe we could have them go through a one-time setup to obtain a certificate (ex. kixelated.users.quic.video), but they can still be malicious, perhaps by sharing the certificate or reusing broadcast IDs. Really the only way to make something globally unique is to have a trusted authority (ex. quic.video) issue and sign broadcast IDs, scoped to a specific time and host. That just seems excessive for what amounts to a near-zero performance improvement.

A relay SHOULD NOT assume that a broadcast URI is globally unique. However a relay MAY negotiate a broadcast URI scheme with publishers to avoid collisions within a session.

wilaw commented 1 year ago

We need to support broadcasters who do not own a globally unique domain name. Twitch/Meta broadcasters must be able to publish media from clients like OBS via a URL, and most certainly cannot be trusted.

Broadcasters don't need to own the domain name. They just need permission to broadcast under its identity. Think what happens if I want to publish RTMP into Twitch today. I validate myself to your portal by logging in with username/password/2FA, get a secret key from your portal, and then I set up OBS to publish to

rtmp://live.twitch.tv/app/

and provide the key for authentication. Can't we use something similar for WARP? I could get the same key and use OBS to set up a WebTransport connection to

https://live.twitch.tv/moq

kixelated commented 1 year ago

That means every MoQ ingest endpoint will need to run an intermediate CA. And clients will need to perform an extra round trip or two in order to fetch a certificate.

It could be a one-time setup, ex. OBS fetches a MoQ scoped certificate for kixelated.twitch.tv. But that still doesn't stop them from producing identical broadcast URIs, possibly even on accident from different computers. Revoking certificates when old ones are lost would be a massive pain and still won't prevent abuse.

All this mess for maaaaybe a tiny performance increase. Honestly I need to see the benchmark results. Lookups and serializing frames are some of the fastest operations in my experience and don't even show up on profiles.

IMO I think the middle-ground solution is to give the subscriber the ability to choose the broadcast URI/ID. This is how QUIC connection IDs work, and it allows the receiver to transparently dictate the schema.

ex. OBS could connect to live.twitch.tv and receive instructions (in the SETUP message?) to use broadcast URI moq://dfw02.twitch.tv/kixelated/6734.

The tricky part is avoiding an extra round trip, which is why QUIC lets the client choose the initial connection ID. The client will switch to the server's chosen connection ID for the rest of the handshake and going forward.
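A sketch of how that might look on the wire (the message and field names are invented for illustration; nothing like this is in the draft):

```python
# Sketch of "receiver dictates the ID", modeled on QUIC connection IDs: the
# client proposes an initial broadcast ID, and the server's SETUP response
# may override it for the rest of the session, avoiding an extra round trip.
def client_setup(proposed_id: str) -> dict:
    return {"type": "SETUP", "broadcast_id": proposed_id}

def server_setup_reply(setup: dict, assign_id) -> dict:
    # The server may hand back its own choice, e.g. encoding routing info.
    return {"type": "SETUP_OK", "broadcast_id": assign_id(setup["broadcast_id"])}

reply = server_setup_reply(
    client_setup("0"),  # client can propose anything; no round trip wasted
    lambda _: "moq://dfw02.twitch.tv/kixelated/6734",
)
broadcast_id = reply["broadcast_id"]  # used for all subsequent messages
```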


vasilvv commented 1 year ago

As a general principle, I believe we should not define any semantic concepts that exceed the scope of current MoQ connection. HTTP generally follows this principle (a browser talking to an HTTP server generally does not care whether it's talking to a random Python web server or an nginx reverse proxy or a CDN). There are many reasons why this is a good design (better protocol agility, easier to reason about, etc), but ultimately, I believe that even if we design some "public global identifier", a lot of deployments will disregard it in favor of "backend rewrites everything to make things appear the same to the end user".

From my understanding, the issue is that a relay might have to rewrite the broadcast URI in the OBJECT message if two incoming WebTransport sessions use the same URI. This could be a performance concern at scale, so it would be nice to pre-authenticate every broadcast URI.

I think we should use a connection-scoped numeric ID for broadcasts, and do any URL matching, authentication checks, etc, at subscription time.

The performance cost is not in re-writing headers, which I agree is no more expensive than the other operations you cite. It is in doing lookups into a dynamically changing and distributed table to determine if there is a collision.

I don't understand this problem. Almost every HTTP proxy that caches things identifies the objects by the URL that it used to fetch it from the backend. Why can't a MoQ relay identify objects in a similar fashion?

SpencerDawkins commented 1 year ago

I'm thinking that this is related to Issue 77 in the Requirements draft, and at least a high-level description of what's possible might belong in the Requirements section of that draft. No need to figure that out now, but I did want to call attention to it.

grmocg commented 1 year ago

Hoping that the minimum requirement here is to preclude or make highly improbable any corruption/poisoning.

suhasHere commented 1 year ago

#69 will be updated once this issue is resolved

wilaw commented 1 year ago

I don't understand this problem. Almost every HTTP proxy that caches things identifies the objects by the URL that it used to fetch it from the backend.

HTTP caching works because the proxy can construct a cache key based (at a minimum) upon the concatenation of the objects' HOST and PATH. The HOST is managed by a global registry (DNS) and each HOST domain is responsible for assuring that the PATH maps consistently to the appropriate binary object. These two features avoid conflicts at the proxy. This cache key is independent of the HTTP connection used to request the content, allowing flexibility in how distribution systems use connections to move content.

Why can't a MoQ relay identify objects in a similar fashion?

Exactly. I think that is what this issue is asking for. Can we introduce the notion of a globally registered identifier for the objects we are publishing and subscribing to with MoQ, just as we have for the objects we are getting and putting with HTTP?

kixelated commented 1 year ago

I don't understand this problem. Almost every HTTP proxy that caches things identifies the objects by the URL that it used to fetch it from the backend.

HTTP caching works because the proxy can construct a cache key based (at a minimum) upon the concatenation of the objects' HOST and PATH. The HOST is managed by a global registry (DNS) and each HOST domain is responsible for assuring that the PATH maps consistently to the appropriate binary object. These two features avoid conflicts at the proxy. This cache key is independent of the HTTP connection used to request the content, allowing flexibility in how distribution systems use connections to move content.

When you say HOST, do you mean the hostname in the URL? If so, that's how HTTP proxies work, but it's certainly not how HTTP caches work.

When using an HTTP proxy, the client sends a request for http://HOST/PATH to a configured address (ex. the HTTP_PROXY env). The IP address is not the result of a DNS lookup, but rather an external configuration. The proxy uses the Origin header to determine the upstream HOST (which may not be the origin) and proxies the request.

When using an HTTP cache, the client sends a request for http://HOST/PATH to HOST based on a DNS lookup. The server has no information about the origin and needs some business logic. It's usually as simple as "forward any paths starting with abc to upstream xyz". The HTTP cache constructs a new URL to perform the fetch, something like http://UPSTREAM/PATH. This is repeated until the origin is the upstream.

The hostname part of the URL must be rewritten when using an HTTP cache. After all, the HOST is the address of the next hop, not the origin.

It sounds like you want the ORIGIN and the PATH, which is not quite the same thing as a URL. Each hop would use some business logic to determine the UPSTREAM. It would connect to UPSTREAM but SUBSCRIBE with ORIGIN/PATH.
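A toy sketch of that distinction (all the names and the prefix rule are assumed): the cache keys on (HOST, PATH) but rewrites the hostname to its configured upstream for the actual fetch.

```python
# Sketch: per-hop business logic maps path prefixes to upstreams; the cache
# key is (HOST, PATH), but the hostname is rewritten for each fetch because
# HOST names the next hop, not a global identifier.
UPSTREAMS = {"/live/": "origin.video.example"}  # assumed per-prefix config
CACHE: dict[tuple, str] = {}

def fetch(url: str) -> str:
    return f"<bytes from {url}>"  # stand-in for the real upstream request

def handle(host: str, path: str) -> str:
    key = (host, path)  # cache key: HOST + PATH concatenation
    if key not in CACHE:
        upstream = next(u for p, u in UPSTREAMS.items() if path.startswith(p))
        CACHE[key] = fetch(f"http://{upstream}{path}")  # hostname rewritten
    return CACHE[key]
```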

kixelated commented 1 year ago

I think we're getting too deep in semantics and need some examples. @wilaw you mentioned earlier that a broadcaster streaming to Twitch would be assigned an ORIGIN and would use public key crypto to verify authenticity, so let's try that out.

As a one-time setup, I would generate a private key for kixelated.users.twitch.tv and ask auth.twitch.tv to sign my public key. Going forward I would PUBLISH broadcasts starting with kixelated.users.twitch.tv. We would need to standardize this exchange in order to support generic clients.

When I publish a broadcast, I start by connecting to the closest ingest edge using geo DNS or anycast, which could change even during a broadcast via the GOAWAY message. I send a CATALOG kixelated.users.twitch.tv/123 and sign it using my private key, preventing relays from modifying the CATALOG. I also need to sign each OBJECT to prevent modification.
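For concreteness, a minimal sketch of the signing step in this hypothetical scheme, using Ed25519 via the Python cryptography package (the key distribution through auth.twitch.tv is assumed, as above):

```python
# Sketch: sign the CATALOG (and each OBJECT) so relays cannot modify them.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()  # one-time setup
public_key = private_key.public_key()       # would be signed by auth.twitch.tv

catalog = b"CATALOG kixelated.users.twitch.tv/123 ..."
signature = private_key.sign(catalog)       # relays cannot alter the catalog

public_key.verify(signature, catalog)       # raises InvalidSignature on tamper
```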

If the relay wants to modify the broadcast (ex. transcoder), it needs to make a brand new broadcast URL with itself as the origin (ex. transcode123.sfo01.twitch.tv). Otherwise, any requests for a new track ID or new object ID could somehow make their way to the origin (me) and I would have no idea what to do. This means a new private/public key.

When a viewer wants to watch my broadcast, they connect to the closest distribution edge using geo DNS or anycast, which again could change. They issue a SUBSCRIBE kixelated.users.twitch.tv/123 and simultaneously fetch my public key(s) from auth.twitch.tv via the same standardized auth service. If there's a relay that modifies the broadcast, the viewer will need to be told to use that URL/key instead.

The relay receives the request for my broadcast. It would need to use a database to figure out where kixelated.users.twitch.tv/123 is currently ingested. DNS is not an option because kixelated.users.twitch.tv does not actually exist, could switch suddenly (DNS caching), and could have multiple broadcasts ingested at different locations. The relay figures out the upstream based on the origin and this repeats for each hop.


My problem with this scheme is that I would ALWAYS rewrite the broadcast URL on ingest. My relay knows how to route to transcode123.sfo01.twitch.tv, but has no way of routing to kixelated.users.twitch.tv. I must avoid each relay performing a lookup for each broadcast because of the performance and reliability implications.

By terminating and rewriting the broadcast, I lose any benefit gained by using public key crypto. The viewer can't authenticate that kixelated produced the transcoded broadcast any longer, although I suppose we could create a signature chain...

Finally, I actually don't want to expose the inner-workings of my video system. I don't want viewers to subscribe to transcode123.sfo01.twitch.tv because it leaks information. I want them to subscribe to live.twitch.tv/<token> and I can decipher the encrypted token payload to get the actual origin/broadcast ID. I have to rewrite each OBJECT ID at our edge but that's literally what we do today with HTTP URLs.

So yeah, I don't think there's any reason to fake an origin for each user. It introduces complexity and doesn't give us much in return. Just let the broadcaster push whatever broadcast ID they want to my ingest server and I'll rewrite it if needed. Akamai could go through the trouble of generating and enforcing unique broadcast IDs in their system as a layer on top of Warp.

fluffy commented 1 year ago

Like Luke, I would hate to be required to have a domain for every user. But there are also many cases where, for firewall management reasons, you want to have a domain per customer. For example, Twitch's domain for Webex is twitch.webex.com and Akamai has akamai.webex.com - what edge they both route to depends on a ton of things including geolocation, private peering, how much you pay Webex, security policy, and other junk. I think we do want to be able to support things where there is a domain per customer.

afrind commented 1 year ago

Am I right in thinking this is also related to Announce (#150)? Can we merge or close this issue now?

kixelated commented 1 year ago

The current draft contains text requiring globally unique track names.

Applications building on top of MoQ MUST ensure that the mechanism used guarantees global uniqueness

I absolutely disagree and want to remove this text. An application should be allowed to use any track name. I don't want track=video or track=localhost/video to be against the spec, especially when there's no mechanism to actually ensure global uniqueness (ex. DNS, certificates, etc).

suhasHere commented 1 year ago

We discussed this topic at length at IETF116 and the current proposal works for all the use-cases. I would be against removing it, though.

afrind commented 1 year ago

track=video or track=localhost/video

These look like names without namespaces, eg: not full track names, so they don't need to be globally unique. You can't have two clients show up at the same relay though and both announce these as full track names, because the relay cannot know where to route subscriptions. Each would need a unique namespace in order to use the same track names.

Clarifying question:

Is the "global uniqueness" required of the application minting track names and namespaces limited to within that application, or across all moq applications everywhere? Some additional text around the scope and also the rationale may be useful.

kixelated commented 1 year ago

We discussed this topic at length at IETF116 and the current proposal works for all the use-cases. I would be against removing it, though.

I'm sorry, I must have missed it during the PR because you know I've objected loudly every time this language has shown up in a PR.

Despite so many discussions, I still don't understand why this is a requirement.

I certainly think we should leave the door open for track naming schemes that can guarantee uniqueness. I would love to see an extension that uses TLS certs to both name tracks and sign objects (required!). Another silly but actually "globally unique" example would be to use the wallet ID (ex. fingerprint of a self-signed cert) as the track name to announce blockchain updates over MoQ.

But this should be optional unless there's a compelling reason to require it. I want arbitrary track names.

kixelated commented 1 year ago

@suhasHere can you elaborate on why this is a requirement? I think the goal is that track name == cache key, but I can think of so many use-cases where this does not work or is a giant security hole. I can list examples if this is the intent.

wilaw commented 1 year ago

I think the goal is that track name == cache key, but I can think of so many use-cases where this does not work or is a giant security hole

@kixelated - can you elaborate on the cases where a globally unique trackname cannot serve as a cache key? And also the security holes that this approach brings? Clearly a client-initiated unauthenticated trackname leaves a network open to cache poisoning (because I can push content that claims to be someone else's content); however, this issue is (a) no different than if local track names are used and (b) addressed via authentication and access control, which I think most agree is a baseline requirement for moq-transport to operate in a multi-tenant and multi-network model.

afrind commented 1 year ago

cases where a globally unique trackname cannot serve as a cache key

This isn't exactly the same thing, but will moq need concepts like no-cache or vary? In HTTP there's dynamically generated content where the same URL gives different bits for different users.

suhasHere commented 1 year ago

This isn't exactly the same thing

If we use global track names as the cache key, or a hash of it, or some transformation of it, it should be possible. I also don't understand why it can't be the cache key, or what the security issues with it are.

Full Track Names are what subscribers ask for, and they should work within and across distribution systems, and also when a CDN supports multiple applications. We had a long discussion on this during IETF116 (https://datatracker.ietf.org/meeting/116/materials/slides-116-moq-base-scenarios-for-moq-01) and also in 2 or 3 authors' calls, and agreed to have the design that is documented today.

kixelated commented 1 year ago

If we use global track names as the cache key, or a hash of it, or some transformation of it, it should be possible. I also don't understand why it can't be the cache key, or what the security issues with it are.

Okay, so.

First off, this is far too restrictive. It's equivalent to saying the path in a HTTP request MUST be globally unique. There has to be a clearly defined reason why this restriction is in the draft. It's a lot of work to both produce and enforce uniqueness within any distributed system, let alone a "global" one.

From what I can gather, the underlying reason is the ability to gossip multiple origins for a track. Here are some examples I've heard:


1. Akamai and Cloudflare both announce they can serve the whitehouse.gov/union broadcast.

This needs to be fleshed out before you can really even argue for or against it. But I'm going to try anyway by listing a few options.

Let's suppose we do want something like BGP, where CDNs gossip origins between themselves. Allowing any CDN to announce any track is extremely dangerous, as a compromised CDN could hijack any broadcast by announcing itself as a new origin. At the very least you'll need to pre-configure CDNs with a very narrow allowlist of possible origins for each track prefix.

But really, why do you want dynamic origins? A Webex participant doesn't know if it should publish to Akamai or Cloudflare. I don't think the client should choose since it doesn't know any better. It should be told where to publish by some Webex API that factors in the cost, peering, location of other participants, etc. That Webex API is then responsible for providing the corresponding subscribe URL to other participants, rather than relying on them to figure it out based on some gossip protocol. This is objectively safer and simpler than gossip with an allowlist.

The most secure would be something like torrent magnet links, where any entity can announce themselves as an origin if they can prove it. ANNOUNCE and OBJECT messages would be signed using a certificate, as proof that this content originated from whitehouse.gov and has not been tampered with. I really like this design as it decentralizes CDNs, however you gotta ask yourself: How is this media over QUIC? Why are we designing a torrent-like CDN architecture? Why would a CDN support this dramatic departure from their traditional architecture?


2. CNN and Newsmax publish the same whitehouse.gov/union broadcast, sharing a cache.

Ted brought up this example but with a sporting event. The benefits of situationally sharing a CDN cache are minor at best, and practically I don't even see why these two publishers would ever produce the same content since both are going to want their own editing and overlays.

The security problem is that there's no authority on the contents of the whitehouse.gov broadcast, so either publisher could insert their own content and poison the cache. This approach would only be acceptable if both publishers mutually trust each other, but in that case, why do they both need to publish? The CDN would just need to be configured with a rewrite, so Newsmax could reuse the CNN broadcast or vice-versa while still billing the right entities.

Again, the proper solution relies on crypto. Everything needs to be signed by whitehouse.gov to prevent tampering.


3. Webex participants screen share the same youtube.com/shorts/AWOyEIuVzzQ clip, sharing a cache.

Similar to the above, although all Webex users are authorized to publish Youtube clips. The same deal: a user could poison the cache and publish the wrong media, like a rick-roll or something more sinister. The same solution: Youtube needs to sign everything. Although I'm not even sure if this is legal, and there are some billing ramifications.

A better solution is to stitch in tracks. Conceptually, this is server-side ad insertion. I don't actually think users should re-encode or re-publish Youtube clips, but rather insert a pointer into the catalog that says "use this external track instead". That (VOD) track could even be served over an entirely separate CDN, for example over Youtube's CDN, while the user's feed is served over Akamai.

suhasHere commented 1 year ago

Please refer to https://datatracker.ietf.org/meeting/116/materials/slides-116-moq-base-scenarios-for-moq-01 for scenarios/requirements that will help support unified use-cases.

Allowing any CDN to announce any track is extremely dangerous, as a compromised CDN could hijack any broadcast by announcing itself as a new origin

The authorization will fail and the hijacked CDN cannot vouch for itself as the origin. Also, if CDN-A wants to announce to CDN-B, there are business relationships set up and appropriate authz schemes would be worked out. It's not gossip.

A Webex participant doesn't know if it should publish to Akamai or Cloudflare. I don't think the client should choose since it doesn't know any better. It should be told where to publish by some Webex API that factors in the cost, peering, location of other participants, etc. That Webex API is then responsible for providing the corresponding subscribe URL to other participants, rather than relying on them to figure it out based on some gossip protocol. This is objectively safer and simpler than gossip with an allowlist.

Nothing in the current text suggests that participants need to know the CDN. The only thing the current text says is that Announces/Subscribes happen on tracks, and their names are not tied to a CDN/connection but instead to the application. However, tracks have a connection URL which identifies the next-hop network node.

The security problem is that there's no authority on the contents of the whitehouse.gov broadcast, so either publisher could insert their own content and poison the cache.

This is an incorrect reading. If the original broadcast is expected to be modified at the raw media level by CNN and Newsmax, then they each own their own content, and this is done by having a business relationship between CNN, Newsmax, and the White House. This is similar to Media Transformer Entities (https://datatracker.ietf.org/doc/html/draft-nandakumar-moq-arch-00#section-2.2). Then the content will need to be identified with new track names (a whitehouse-cnn track, a whitehouse-newsmax track) in the cache.

If the original broadcaster doesn't want its content to be modified, then they have to end-to-end encrypt, OR there has to be an agreement between the parties under which it can happen.

Real-time conferences have Media Switches (not Media Transformers), where the media is just switched at these servers. There, the track names need to carry over.

Again, the proper solution relies on crypto. Everything needs to be signed by whitehouse.gov to prevent tampering.

Yes, at some level. You don't need to public-key sign every object; there are other ways too.

Similar to the above, although all Webex users are authorized to publish Youtube clips. The same deal: a user could poison the cache and publish the wrong media, like a rick-roll or something more sinister. The same solution: Youtube needs to sign everything. Although I'm not even sure if this is legal, and there are some billing ramifications.

This use-case seems to be misplaced, or I am not understanding. Publishing to Youtube needs a per-user auth token. Each user in the above use-case will be publishing his/her own track, and there is no cache poisoning.

But really, why do you want dynamic origins?

Can you please elaborate on "dynamic origins"?

fluffy commented 1 year ago

I think we have lots of confusion about what we mean by all of this. I think people are turning this into something way too complex and not needed.

I view any HTTP URI with a domain name as globally unique, in the sense we mean here by globally unique.

I view it that in today's CDNs, multiple CDNs can serve up whitehouse.gov. Basically, it might be that for DNS requests from roughly the US, the CNAME points at Cloudflare, but for the rest of the world the CNAME points at Akamai. I think this is fairly common.

Just ignore multiple CDNs for a second and only think about the distribution side. When A and B both want to watch a video, something needs to indicate what video it is. I think of that as an HTTP-like path name that defines the application and which video. For example, www.youtube.com/watch?v=3456, and yes, perhaps some more stuff after that for different resolutions, languages, or whatever. But I think we agree on the latter part; the question is where the information lives that needs to uniquely identify the application and identify which video inside that application. Something needs to tell the CDN whether A and B should get the same video or different ones. I suspect some people think that is part of the track name, and some people think it is in a setup or connect message. I'm not worried about where the bits are so much, but we seem to keep coming back to this. We need some way for the relay to know if A and B get the same or different videos.

The next thing is authorization. Clearly, in any CDN, not everyone that can read some data can be allowed to write that data, or you will have cache poisoning. There have to be separate tokens to authorize reads and writes into the cache.
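A sketch of that separation, with assumed token semantics: read (subscribe) and write (publish) into the cache are authorized independently, so a subscriber's credentials can never be replayed to poison the cache.

```python
# Sketch: per-track token sets, separate for read and write authorization.
TOKENS = {
    "whitehouse.gov/union": {"read": {"sub-tok-1"}, "write": {"pub-tok-9"}},
}

def authorize(track: str, token: str, op: str) -> bool:
    assert op in ("read", "write")
    return token in TOKENS.get(track, {}).get(op, set())

assert authorize("whitehouse.gov/union", "pub-tok-9", "write")
assert not authorize("whitehouse.gov/union", "sub-tok-1", "write")  # no poisoning
```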

In any CDN situation, if a service like Youtube uses more than one CDN, they need to synchronize how the data gets into both CDNs. I don't think this is any different, and I don't think it requires a routing protocol between the CDNs. The origin will take things from each CDN and synchronize the delivery of them onto all the other CDNs that the application uses. That does not rule out certain CDNs having more optimized ways to move data between CDNs, for example bulk subscriptions to other CDNs, but I don't think it changes any of the design for client-to-relay.

If it is as complicated as what Luke is thinking, then I agree with Luke that this looks too complicated. But I don't think it is that complicated at all; I feel like we just mean different things by words like "globally unique".

(And as a side note, the SIP spec talks about "globally unique across space and time" and I have never had any idea what that means - so let's avoid stuff that makes no sense to anyone.)

fluffy commented 1 year ago

I would like to make a proposal on the call today that I hope can sort this out. I'm trying to model this after today's HTTP CDNs.

My proposal is that we define "globally unique" to mean that the full track name, for a given application, is unique across all the CDNs that that application uses.

The goal here is simply that when A and B both subscribe, the relay knows whether to send them the same or different things. One way to do this is track names that are like HTTP URLs. And if Webex has a client publishing to webex.com/meeting123/track22, that does not mean the DNS for webex.com resolves to the IP of the client; it just means that Webex delegated that part of the namespace for the client to use.

If you had a CDN that only supported one application, and that application had a database of integers for track names, that would work too. So for example, on the Twitch CDN, a Twitch broadcast ID might be globally unique. (I don't understand the Twitch CDN, so I might have this wrong.) If you had just a single relay running on localhost, a track name of "1" could be a globally unique full track name.

There is never any way to enforce this. The CDNs can have business agreements with applications using the CDN about what portion of names the CDN is willing to authorize and route.

kixelated commented 1 year ago

I view any HTTP URI with a domain name as globally unique, in the sense we mean here by globally unique.

First off, this draft is the equivalent of saying that part of the HTTP path must be globally unique.

Second off, HTTP resources do not have a globally unique URI. The URI is a routing mechanism, not a unique identifier. The same URI may serve different resources, and the same resource may be served out of different URIs.

CDNs are configured on how to cache, deduplicate, and fetch HTTP resources. This is done based on the full path by default, but the cache can be split based on headers like Accept-Language, Accept-Encoding, Auth, Cookie, Host, Origin, Referrer, etc. The connection properties can be used, like IP address, ASN, and Geo. A fragment of the full path can be used, like the extension, the base directory, or just some arbitrary regex.

From what I can gather, you would like to encode all of these cacheable properties into a globally unique track name, that is exclusively used as the cache key. That does not work for HTTP and I do not think it will work for MoQ, and it cannot be required.
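To make the point concrete, a sketch with invented header names: an HTTP CDN's cache key is configurable and routinely mixes in headers and connection properties, so the URL alone is not the identifier of a resource.

```python
# Sketch: a cache key built from the path plus whatever "vary" components
# the CDN is configured to split on (headers, geo, etc.).
def cache_key(path: str, headers: dict, geo: str, vary: list) -> tuple:
    parts = [path]
    for component in vary:
        if component == "geo":
            parts.append(geo)
        else:
            parts.append(headers.get(component, ""))
    return tuple(parts)

# Same path, different cache entries once Accept-Language is in the key:
k1 = cache_key("/live/video", {"Accept-Language": "en"}, "US", ["Accept-Language"])
k2 = cache_key("/live/video", {"Accept-Language": "de"}, "US", ["Accept-Language"])
assert k1 != k2
```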

You are certainly allowed to use unique track names, just like you're allowed to use unique HTTP paths. Although I still question how you could make "globally unique" names without an authority.

(And as a side note, the SIP spec talks about "globally unique across space and time" and I have never had any idea what that means - so let's avoid stuff that makes no sense to anyone.)

It's a good point, because it's not clear if "globally unique" in the draft means at this instant or for all time. Are you allowed to reuse track names in the future?

I want to remove this text. It's a vague, unenforceable, and restrictive MUST.

afrind commented 1 year ago

Individual comment:

First off, this draft is the equivalent of saying that part of the HTTP path must be globally unique.

I don't read this as the intent - Full Track Name is what is intended to be unique. FTN is comprised of Track Namespace and Track Name. I view Full Track Name and URL as similar, but the division between Namespace and Name is not necessarily the same as the division between authority and path. I like the way Cullen phrased it, which is that part of the namespace (little n) can be delegated to a client.

The same URI may serve different resources, and the same resource may be served out of different URIs.

I think this needs to be discussed - can two subscribers subscribe to the same FTN and receive different content?

wilaw commented 1 year ago

Although I still question how you could make "globally unique" names without an authority.

By leveraging an existing authority for global uniqueness, which is the domain name system. By using streams.whitehouse.gov/live/stream/3 as your track name, you avoid conflicts with all other moq-transport tracks on all distribution networks. Yes, someone else can claim that they also have authority to publish under the namespace streams.whitehouse.gov. In that case we can leverage certificates, in which case only one entity has the correct certificate to prove that they have the rights to that namespace. If we don't leverage existing domain name registration and certification, then we need to build an almost identical system for assuring uniqueness. That seems unnecessary.

BTW - I like the term "network unique" as a replacement for "globally unique".

kixelated commented 1 year ago

I filed #159 to explain how I see things working. The disagreement is over the track_namespace and I think I can simplify into a few questions:

  1. Is track_namespace independent of the connect_url?
  2. Does the producer's track_namespace need to be the same as the consumer's track_namespace?
  3. Do all consumers' track_namespaces need to be the same?
  3a. What about between different CDNs?

From what I understand, the phrase "globally unique" implies yes to all of the above. The track_namespace is the only piece of information allowed to identify a piece of content.

kixelated commented 1 year ago

Here are some concrete examples of what I would like to support. I'm keeping it Twitch specific since I'm most familiar with that system, but there's a multitude of different options.

OBS could be provided:

connect_url: https://ingest.twitch.tv
track_namespace: kixelated
auth: <token>

Anycast or geo DNS is used to route to the closest Twitch origin (in this example: cmh01). Each unique broadcast would be stored in a database and given a unique ID to distinguish between later broadcasts (in this example: 1234532).

The catalog contains relative track names, so the track_namespace could be rewritten without rewriting the catalog.

VLC could be provided:

connect_url: https://video-edge-abc123.sfo01.twitch.tv
track_namespace: video-ingest-def456.cmh01/kixelated/1234532
auth: <token>

The connect URL is provided on a per-user basis based on capacity. The namespace encodes routing information, so my CDN can perform stateless routing to the origin (no database lookup at every hop).

Note that the auth token is scoped to ONLY the provided connect_url and track_namespace. We explicitly prohibit users from being able to choose their own edge since we perform application-level load-balancing.
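A sketch of what that stateless routing could look like (the namespace layout and the internal hostname suffix are invented for this example):

```python
# Sketch: the relay derives the next hop from the namespace itself, so no
# per-hop database lookup is needed.
def next_hop(track_namespace: str) -> str:
    origin_pop, channel, broadcast_id = track_namespace.split("/", 2)
    # e.g. "video-ingest-def456.cmh01" directly names the ingest origin.
    return f"https://{origin_pop}.internal.twitch.tv"  # assumed internal zone

assert next_hop("video-ingest-def456.cmh01/kixelated/1234532") \
    == "https://video-ingest-def456.cmh01.internal.twitch.tv"
```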

or

connect_url: https://moq.akamai.com
track_namespace: twitch.tv/cdn-origin.jfk06/kixelated/1234532
auth: <token>

Note that the routing information in the namespace is different for Akamai, as we have pre-negotiated peering agreements in New York (jfk06). How to route to the Twitch origin, including how that information is encoded in the track namespace, could be different for each CDN.

The auth token is also different based on the connect URL. Again, we don't want users to be able to choose their own CDN or edge for cost reasons. But also we need to use a scheme that Akamai supports, which may be different than our own edge or other CDNs.

or

connect_url: https://video-edge-abc123.sfo01.twitch.tv
track_namespace: a124hdfhkjae234rwtjnwovmasdokv232asdg

This is actually how Twitch works today. The request path contains an encrypted payload so we can both hide and prevent tampering with any routing information. It also contains some user information so we can track metrics on a per user-basis. No authentication token is required since the namespace is unguessable.
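A sketch of that encrypted-token pattern, using Fernet (authenticated encryption from the Python cryptography package) as a stand-in for whatever scheme is actually used: routing and user info are hidden from and tamper-proofed against the client, yet the edge can recover them without a database lookup.

```python
# Sketch: mint an opaque, unguessable namespace token at the edge and
# decrypt it on ingest to recover the real origin/broadcast routing info.
from cryptography.fernet import Fernet
import json

edge_key = Fernet(Fernet.generate_key())  # shared across the CDN's edges

def mint_namespace(origin: str, broadcast: str, user: str) -> str:
    payload = json.dumps({"o": origin, "b": broadcast, "u": user}).encode()
    return edge_key.encrypt(payload).decode()  # opaque to the viewer

def resolve_namespace(token: str) -> dict:
    return json.loads(edge_key.decrypt(token.encode()))  # raises if tampered

ns = mint_namespace("cmh01", "1234532", "viewer-77")
assert resolve_namespace(ns)["o"] == "cmh01"
```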

or

connect_url: https://moq.cloudflare.com/twitch.tv/kixelated
track_namespace: jfk06/1234532

This is an example of something more controversial... a vanity URL. The CDN is configured with a mechanism to route requests to the origin based on both the connect_url and track_namespace.

afrind commented 1 year ago

Thank you for the simplification.

Individual comment:

Is track_namespace independent of the connect_url?

In the meeting I heard "generally yes. You could couple them, but that might come with limitations".

Does the producer's track_namespace need to be the same as the consumer's track_namespace?

I think this depends a bit on the architecture of the application using moq. In a system like Meta live today, we have effectively two systems, one for contribution and one for distribution, and a lot of Meta-only business logic in between -- it sounds like twitch is similar. In this case, the producer and consumer can have separate namespaces. In a system where someone writes an application (producer and consumer) and wants to traverse a generic relay (CDN), then yes, the namespaces need to be the same. My read is that both flows are ok.

Do all consumer's track_namespace need to be the same?

Within a relay network connected to the publisher via ANNOUNCE, I think the answer is yes. This applies more to the generic case than the closed case (eg: Meta). The protocol only allows for the publisher to announce one namespace for a given track. I'm imagining it's possible for a publisher to connect to two different CDNs, and could announce different track_namespaces to each, but publish the same content. I don't know that's advantageous - as far as the CDNs are concerned these are two totally different track_namespaces, and it might complicate a bunch of other things.

suhasHere commented 1 year ago
  1. Is track_namespace independent of the connect_url?

Yes, they are different, as defined in https://kixelated.github.io/warp-draft/draft-lcurley-warp.html#name-connection-url. +1 to Alan's point on the exception where they can be the same but it comes with limitations.

2. Does the producer's track_namespace need to be the same as the consumer's track_namespace?

This question needs further qualification on the architecture to answer, I think; below is a take on a typical setup.

Please do refer to https://datatracker.ietf.org/doc/html/draft-nandakumar-moq-scenarios-00#section-2 for some explanation on this topic.

3. Do all consumer's track_namespace need to be the same?

I don't think I understand this question. Can you please elaborate?

fluffy commented 1 year ago
  1. Is track_namespace independent of the connect_url?

Yes. (You could twist my arm into making the default track_namespace be the connect_url if no track_namespace was provided, if that made things easier in some way.)

  2. Does the producer's track_namespace need to be the same as the consumer's track_namespace?

Depends on what you mean, but I think mostly no. Let's say the producer is using a namespace of fluffy-ingest, and that goes to a transcoder that republishes the media on a distribution network on Akamai with the namespace fluffy786 and also publishes the media on Cloudflare with the namespace webex.com:fluffy. Consumers on Akamai would be using fluffy786 and consumers on Cloudflare would be using webex.com:fluffy. That would all seem fine to me.

If there was just one CDN, and that CDN had some API for applications to say "hey, fluffy-canada is an alias for the namespace fluffy-ingest", that would also seem fine to me, but the MoQ protocol does not define a way to provide that alias. (I would not object to adding a way to define aliases in an announce or similar message.)

We just need some fixed set of well-defined bits that allow the relays to map any given subscription to the corresponding announce.

  3. Do all consumers' track_namespaces need to be the same? 3a. What about between different CDNs?

(covered above)

fluffy commented 1 year ago

A few comments on the connect/track/auth examples Luke gave above. Very useful to have these concrete examples. Thank you.

In the example:

connect_url: https://ingest.twitch.tv
track_namespace: kixelated

I assume that this example is on the Twitch CDN and that kixelated provides the uniqueness, so that two different users don't publish to the same namespace. Do I have that correct?

If it is the connect_url + track_namespace that is unique for that user, then I think you should just make your track namespace be https://ingest.twitch.tv/kixelated. If that causes problems in rewriting the catalog, then we should fix that by making the catalog so it does not have the rewrite problem.

When you talk about

video-ingest-def456.cmh01/kixelated/1234532

I think what you are saying is that your particular CDN has a proprietary mechanism, which clients don't even need to know about, to say that video-ingest-def456.cmh01/kixelated/1234532 is an alias for kixelated that also carries the state tokens used by the CDN.

I've got no issue with that; applications and CDNs will embed all kinds of state into the namespace bits. Go for it, but that is all outside the MoQ protocol. We will want to keep that out of the protocol, as it is probably covered by many patents.

On the encrypted namespace example of

track_namespace: a124hdfhkjae234rwtjnwovmasdokv232asdg

Yes, that makes sense for the Twitch CDN. If that was on Akamai, I might expect it to look more like

track_namespace: twitch.tv/a124hdfhkjae234rwtjnwovmasdokv232asdg

on the assumption that Akamai was providing relay for other services as well.

On the vanity URL: I don't think a track namespace is going to show up as a user-readable thing, so it seems a sort of unlikely place to have a vanity URL. But as long as the CDN had a way to make sure it didn't allocate the same name to two different users, I don't see a problem.

On the auth of track_namespace: a124hdfhkjae234rwtjnwovmasdokv232asdg

I view this as just an example of a124hdfhkjae234rwtjnwovmasdokv232asdg being the bearer token. You just stuff those same bits in the auth token and have that be the way the CDN does the auth for that namespace.

On the auth tokens. I would argue strongly for separate auth tokens for the connect URL and the namespace. In some situations you do want to authenticate the connect, but it may not always be known at the time you want to generate the auth tokens for the namespace. In the case of an enterprise relay, it may not even be operated by the same organization. This has some parallels to an HTTP proxy. Nothing would stop an application from using the same auth token for both, but from a protocol point of view, I think it is better to separate them.

I also think that the auth tokens for a namespace need to allow for separate tokens for consumer vs. producer, and allow multiple tokens for things such as key rotation, upgrades, multi-domain, etc.
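A sketch of that token model (all names here are assumptions, not protocol fields): independent tokens for the connect URL and the namespace, separate producer/consumer tokens, and a set per scope to allow rotation.

```python
# Sketch: connect tokens are separate from namespace tokens; each namespace
# carries distinct producer and consumer token sets, and sets allow multiple
# valid tokens at once (e.g. during key rotation).
from dataclasses import dataclass, field

@dataclass
class NamespaceAuth:
    producer_tokens: set = field(default_factory=set)  # who may publish
    consumer_tokens: set = field(default_factory=set)  # who may subscribe

@dataclass
class AuthConfig:
    connect_tokens: set = field(default_factory=set)   # relay admission
    namespaces: dict = field(default_factory=dict)     # name -> NamespaceAuth

cfg = AuthConfig(connect_tokens={"edge-tok"})
cfg.namespaces["webex.com:fluffy"] = NamespaceAuth(
    producer_tokens={"pub-v1", "pub-v2"},  # two valid tokens mid-rotation
    consumer_tokens={"sub-v1"},
)
```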

suhasHere commented 1 year ago

On the auth tokens. I would argue strongly for separate auth tokens for the connect URL and the namespace

+1

Let's say the producer is using a namespace of fluffy-ingest, and that goes to a transcoder that republishes the media on a distribution network on Akamai with the namespace fluffy786 and also publishes the media on Cloudflare with the namespace webex.com:fluffy.

This would be the Media Transformer entity within the MoQ architecture, where a transcoder sources its own tracks by republishing the media after carrying out the necessary transformation.

afrind commented 1 year ago

@kixelated Did #162 resolve this issue?

kixelated commented 1 year ago

👍