Discovery of publishers via a relay instead of catalog

afrind commented 2 months ago

This is a redux of #228, now that I've heard from multiple other wg participants that they are interested in such a mechanism.

Consider using moq for a chat or other application where a relay has many publishers for the same application. In order for subscribes to be routed to the correct publishing endpoints, each publisher/participant needs to announce a unique namespace (this is how moq-chat operates, see https://afrind.github.io/draft-frindell-moq-chat/draft-frindell-moq-chat.html#section-4.4).

This creates an issue: how do participants learn about the namespaces/tracks other clients have ANNOUNCED to the relay within the same chat room? I originally filed #228 thinking there should be a mechanism by which a client can declare its interest in new announcements, which would then serve as a catalog-less means of track discovery. Based on wg feedback at the time, I updated moq-chat to use a different discovery mechanism involving a simple catalog.

Here's an alternate proposal:

SUBSCRIBE_NAMESPACE {
  Track Namespace Prefix (b),
  Auth Info (b)
}

This would tell the relay to forward any ANNOUNCE whose namespace matches the prefix, if the subscriber is authorized.

So in moq-chat, each new participant would do this:

ANNOUNCE(namespace="moq-chat/id/participant/<name>")
SUBSCRIBE_NAMESPACE(prefix="moq-chat/id/participant/")

And a relay would reply with:

ANNOUNCE_OK
SUBSCRIBE_NAMESPACE_OK
# existing announcements matching namespace prefix
ANNOUNCE(namespace="moq-chat/id/participant/<member1>") 
ANNOUNCE(namespace="moq-chat/id/participant/<member2>")
ANNOUNCE(namespace="moq-chat/id/participant/<member3>")
...

It would also forward any new ANNOUNCEs or ANNOUNCE_CANCELs matching the prefix to the participant.
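
The relay behavior described above can be sketched roughly as follows. This is a toy in-memory model, not an implementation of any MoQT draft: the `Relay` class, its method names, and the session ids are hypothetical, Auth Info is ignored entirely, and it assumes a relay does not echo a participant's own ANNOUNCE back to it.

```python
from dataclasses import dataclass, field


@dataclass
class Relay:
    """Toy in-memory relay. Message names follow the thread; the API is hypothetical."""

    # announced namespace -> session that announced it
    announced: dict[bytes, str] = field(default_factory=dict)
    # session -> namespace prefixes it asked for via SUBSCRIBE_NAMESPACE
    interests: dict[str, list[bytes]] = field(default_factory=dict)
    # session -> control messages the relay would send to it
    outbox: dict[str, list[tuple[str, bytes]]] = field(default_factory=dict)

    def _send(self, session: str, kind: str, value: bytes) -> None:
        self.outbox.setdefault(session, []).append((kind, value))

    def _forward(self, kind: str, namespace: bytes, sender: str) -> None:
        # Only sessions that explicitly asked for a matching prefix get a copy.
        for session, prefixes in self.interests.items():
            if session != sender and any(namespace.startswith(p) for p in prefixes):
                self._send(session, kind, namespace)

    def announce(self, session: str, namespace: bytes) -> None:
        self.announced[namespace] = session
        self._send(session, "ANNOUNCE_OK", namespace)
        self._forward("ANNOUNCE", namespace, sender=session)

    def announce_cancel(self, session: str, namespace: bytes) -> None:
        self.announced.pop(namespace, None)
        self._forward("ANNOUNCE_CANCEL", namespace, sender=session)

    def subscribe_namespace(self, session: str, prefix: bytes) -> None:
        self.interests.setdefault(session, []).append(prefix)
        self._send(session, "SUBSCRIBE_NAMESPACE_OK", prefix)
        # Replay existing announcements from other sessions that match the prefix.
        for namespace, owner in sorted(self.announced.items()):
            if owner != session and namespace.startswith(prefix):
                self._send(session, "ANNOUNCE", namespace)


relay = Relay()
relay.announce("m1", b"moq-chat/id/participant/member1")
relay.announce("x", b"other-app/room")
relay.announce("new", b"moq-chat/id/participant/new")
relay.subscribe_namespace("new", b"moq-chat/id/participant/")
relay.announce("m2", b"moq-chat/id/participant/member2")
relay.announce_cancel("m1", b"moq-chat/id/participant/member1")
```

After this sequence, `relay.outbox["new"]` holds the two OK messages, the replayed ANNOUNCE for member1, the live ANNOUNCE for member2, and the ANNOUNCE_CANCEL for member1; the unrelated `other-app/room` namespace is never forwarded.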

The new participant then issues a standard SUBSCRIBE to each existing chat member:

SUBSCRIBE(namespace="moq-chat/id/participant/<member1>", name="chat") 
SUBSCRIBE(namespace="moq-chat/id/participant/<member2>", name="chat") 
...
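
On the participant side, the reaction to forwarded announcements might look like the sketch below. It assumes the track name is fixed by the application (here `"chat"`); `on_relay_message` and the `known` set are hypothetical names, and what a client should do on ANNOUNCE_CANCEL beyond forgetting the member is left open.

```python
TRACK_NAME = "chat"  # assumed to be fixed by the application


def on_relay_message(known: set[str], kind: str, namespace: str) -> list[tuple[str, str, str]]:
    """Return the SUBSCRIBEs to send in response to one message from the relay."""
    if kind == "ANNOUNCE" and namespace not in known:
        known.add(namespace)
        return [("SUBSCRIBE", namespace, TRACK_NAME)]
    if kind == "ANNOUNCE_CANCEL":
        known.discard(namespace)  # forget the member; a later ANNOUNCE re-subscribes
    return []


known: set[str] = set()
sent: list[tuple[str, str, str]] = []
for kind, ns in [
    ("ANNOUNCE", "moq-chat/id/participant/member1"),
    ("ANNOUNCE", "moq-chat/id/participant/member2"),
    ("ANNOUNCE", "moq-chat/id/participant/member1"),  # duplicate, ignored
    ("ANNOUNCE_CANCEL", "moq-chat/id/participant/member2"),
]:
    sent += on_relay_message(known, kind, ns)
```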

This would remove the need for a catalog in some multi-publisher applications (eg: key distribution https://suhashere.github.io/moq-e2ee-mls/draft-jennings-moq-e2ee-mls.html), but also keep the scope of ANNOUNCE the same as it is today.

kixelated commented 2 months ago

We definitely need a discovery mechanism like this.

I built moq-dir, which is basically the same thing Alan has proposed. The difference is that it produces tracks based on received announcements, where each announce/unannounce is an object.

I'm a little torn on whether this should be part of MoQT or layered on top of tracks. I do think it's super useful, especially for conferencing, to discover all participants, and it should be standardized in some capacity.

wilaw commented 2 months ago

Two clarification questions for you:

For this subscription

SUBSCRIBE(namespace="moq-chat/id/participant/<member1>", name="chat") 
SUBSCRIBE(namespace="moq-chat/id/participant/<member2>", name="chat")

How does the end-subscriber know to use the track name "chat"?

How would this work for a new user at a cold edge, i.e. one that had not seen that namespace before?

The end-subscriber issues:

ANNOUNCE(namespace="moq-chat/id/participant/<name>")
SUBSCRIBE_NAMESPACE(prefix="moq-chat/id/participant/")

The relay sends an ANNOUNCE_OK and a SUBSCRIBE_NAMESPACE_OK. Then, because it has never seen that namespace before, it issues its own SUBSCRIBE_NAMESPACE(prefix="moq-chat/id/participant/") upstream?

Some time later, it will receive a series of announces, which it caches and then forwards to the end-user:

ANNOUNCE(namespace="moq-chat/id/participant/<member1>") 
ANNOUNCE(namespace="moq-chat/id/participant/<member2>")
ANNOUNCE(namespace="moq-chat/id/participant/<member3>")

For scalability, is it true that the ANNOUNCES are only forwarded to subscribers who have previously explicitly asked for them?

afrind commented 2 months ago

How does the end-subscriber know to use the track name "chat"?

In my case, that's baked into the client and server. I think it's similar in the MLS key distribution mechanism. I agree that this can't convey more specific track metadata, but you could replace "chat" with "catalog" and use the same mechanism to get a per-publisher catalog.

Then, because it has never seen that namespace before, it issues its own SUBSCRIBE_NAMESPACE(prefix="moq-chat/id/participant/") upstream?

No, it would just forward any new ANNOUNCEs that come (from any upstream) that match. I'm hand waving a bit past authentication here.

For scalability, is it true that the ANNOUNCES are only forwarded to subscribers who have previously explicitly asked for them?

Yes.

afrind commented 2 months ago

Individual Comment:

After talking with Tim, I realized a problem with my proposal:

SUBSCRIBE_NAMESPACE {
  Track Namespace Prefix (b),
  ...
}

If I SUBSCRIBE_NAMESPACE /abc, I will also get ANNOUNCEs for /abcdefg. This is the Track Namespace/Track Name problem all over again, with the same solution: I propose we also split a Publisher ID tuple out from Track Namespace.

So the full track name would be:

Full Track Name = Track Namespace | Publisher ID | Track Name

Any of the three components can be empty - giving applications flexibility in track naming.

SUBSCRIBE_NAMESPACE would specify an exact match on Track Namespace, rather than prefix bits.

SUBSCRIBE_NAMESPACE {
  Track Namespace (b),
  ...
}

Splitting namespace into two fields could also help resolve some of our "duplicate announce" issues -- eg it is possible to distinguish a reconnect case (same publisher ID) from a redundant broadcaster (different publisher IDs).
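
To make the difference concrete, here is a small sketch contrasting byte-prefix matching with exact matching, plus one possible way a relay could use the (namespace, publisher ID) pair to tell a reconnect from a different publisher. The function names and the labels returned by `classify` are illustrative assumptions, not anything defined in the drafts.

```python
def prefix_match(announced: bytes, prefix: bytes) -> bool:
    # Byte-prefix semantics: b"/abc" also matches b"/abcdefg".
    return announced.startswith(prefix)


def exact_match(announced: bytes, wanted: bytes) -> bool:
    # Tuple-field semantics: the namespace must be identical.
    return announced == wanted


def classify(active: set[tuple[bytes, bytes]], namespace: bytes, publisher_id: bytes) -> str:
    """Label an incoming ANNOUNCE relative to the active (namespace, publisher ID) pairs."""
    if (namespace, publisher_id) in active:
        return "reconnect"  # same publisher ID announcing again
    if any(ns == namespace for ns, _ in active):
        return "different publisher"  # same namespace, different publisher ID
    return "first publisher"


active = {(b"/abc", b"alice")}
leaks = prefix_match(b"/abcdefg", b"/abc")    # unrelated namespace matches the prefix
isolated = exact_match(b"/abcdefg", b"/abc")  # exact match keeps it out
```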

fluffy commented 1 month ago

Your proposal at the top makes sense to me, and the behavior where subscribing to abc also gets you abcdef seems like exactly what we want, not a problem. It seems like adding publisher id breaks the properties we need.

afrind commented 1 month ago

Individual Comment:

Using prefixes as a selection criterion can work, but it places restrictions on how applications construct namespaces in order to prevent receiving information from unrelated publishers and applications.

Consider a generic relay that is designed to handle moq traffic from a multitude of uncoordinated applications. Each application can select whatever namespace it likes, so there's no guarantee in the system that they won't pick overlapping namespaces. The only mechanism that can help today is the Auth Info field: whoever grants authorization must avoid issuing tokens for overlapping namespaces to uncoordinated applications. This doesn't feel scalable to me.

Conversely, using tuples and exact matches (a solution we have already agreed on for Track namespace + name) is unambiguous and doesn't require coordination or prescriptive namespace design.

It seems like adding publisher id breaks the properties we need.

I don't see how this breaks. Can you explain the properties you need that are broken?

Using moq-chat as an example, a chat room has a namespace (eg "moq-chat/"), each participant has a publisher ID (eg "afrind" or "fluffy") and each track has a name (eg "chat", "audio", or "catalog").

ANNOUNCE would contain a (namespace, publisher ID) tuple.

SUBSCRIBE would contain a full track name (namespace, publisher ID, name).

SUBSCRIBE_NAMESPACE would inform an endpoint about all the publishers in the relay that are using the namespace, which gives subscribers all the information they need to issue subscribes for the tracks they are interested in.
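
As a rough illustration of this flow, the sketch below models announcements as (namespace, publisher ID) pairs and builds the full track names a subscriber would request. `FullTrackName`, `publishers_in`, and `subscribes_for` are hypothetical helpers, and strings stand in for the binary fields.

```python
from typing import NamedTuple


class FullTrackName(NamedTuple):
    namespace: str
    publisher_id: str
    name: str


def publishers_in(announces: list[tuple[str, str]], namespace: str) -> list[str]:
    """Exact-match SUBSCRIBE_NAMESPACE: publisher IDs announced under `namespace`."""
    return [publisher for ns, publisher in announces if ns == namespace]


def subscribes_for(namespace: str, publishers: list[str], name: str) -> list[FullTrackName]:
    """Full track names to SUBSCRIBE to, one per discovered publisher."""
    return [FullTrackName(namespace, publisher, name) for publisher in publishers]


announces = [
    ("moq-chat/", "afrind"),
    ("moq-chat/", "fluffy"),
    ("moq-chat/other-room", "someone-else"),  # different namespace, not matched
]
members = publishers_in(announces, "moq-chat/")
tracks = subscribes_for("moq-chat/", members, "chat")
```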

kixelated commented 1 month ago

I called it moq-dir because it works like a file system with a / delimiter.

-> SUBSCRIBE namespace="/" track="meeting/abc123"

The moq-dir namespace is "/" because it gets ANNOUNCED like any other publisher. It's kind of like a drive and in theory it could be sharded per-customer or something. It also means that any namespaces that do not start with / will not be publicly indexed by moq-dir (aka private namespaces).

The track indicates the desired directory, returning a list of any namespaces with the prefix <namespace>/<track>/ but only one level deep.

The objects return similar results to ls but with a + or - prefix indicating change:

<- OBJECT payload=+alice
<- OBJECT payload=+bob

-> SUBSCRIBE namespace="/meeting/abc123/alice" track=".catalog"
-> SUBSCRIBE namespace="/meeting/abc123/bob" track=".catalog"

Like @afrind said, you could extend this to support a glob instead via *, but there are performance ramifications.

In my implementation, I basically bucket each ANNOUNCE by the dirname of the namespace. It's deterministic and can be pre-computed. Performing a glob, on the other hand, requires a more complicated data structure that basically has to be done on demand.

And to clarify, this is for NAMESPACES ONLY. QuicR used * but that was for tracks; the same use-case might not apply.
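
The dirname bucketing described above might be sketched like this. It is not the actual moq-dir code: the `Directory` class and its methods are hypothetical, and it only assumes what the comment states, namely /-delimited namespaces, one-level-deep listings, and + or - prefixed payloads.

```python
import posixpath
from collections import defaultdict


class Directory:
    """Buckets each announced namespace by its dirname (hypothetical sketch)."""

    def __init__(self) -> None:
        # directory -> names announced directly beneath it
        self.buckets: dict[str, set[str]] = defaultdict(set)

    def announce(self, namespace: str) -> tuple[str, str]:
        parent, leaf = posixpath.split(namespace)
        self.buckets[parent].add(leaf)
        return parent, "+" + leaf  # object payload for subscribers of `parent`

    def unannounce(self, namespace: str) -> tuple[str, str]:
        parent, leaf = posixpath.split(namespace)
        self.buckets[parent].discard(leaf)
        return parent, "-" + leaf

    def listing(self, directory: str) -> list[str]:
        # Initial objects for a new subscriber: one level deep only.
        return ["+" + leaf for leaf in sorted(self.buckets.get(directory, ()))]


d = Directory()
d.announce("/meeting/abc123/alice")
d.announce("/meeting/abc123/bob")
d.announce("/meeting/abc123/alice/screen")  # deeper level, lands in another bucket
current = d.listing("/meeting/abc123")
update = d.unannounce("/meeting/abc123/bob")
```

Because the bucket key is computed once per ANNOUNCE, a lookup is a single dictionary access, whereas a glob would need to scan or index namespaces on demand.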

suhasHere commented 1 month ago

I am not comfortable with the moq layer enforcing name restrictions, as naming is very specific to the application domain. Where to split and how many splits to have is decided by the application. I feel the original proposal with prefix-bit matching works pretty well. The problems listed above are not really issues; they may simply be what clients expect.

afrind commented 1 month ago

Individual Comment:

These arguments feel the same as the namespace and name tuple split, and we reached consensus on using tuples and exact matching. Why were you comfortable with (namespace, name) being a tuple but not (namespace, id, name)?

suhasHere commented 1 month ago

Individual Comment:

These arguments feel the same as the namespace and name tuple split, and we reached consensus on using tuples and exact matching. Why were you comfortable with (namespace, name) being a tuple but not (namespace, id, name)?

Because it is not solving any new problem. Nothing here stops different clients from using the same publisher ID, so a generic relay has to make exactly the same decision as in your original problem statement. Also, as an application I might want a 4-way split or a different split point, and I don't think we want that to be enforced at the moq layer.

kixelated commented 1 month ago

@suhasHere namespace could be an array of strings instead of using an explicit separator like /
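
One way to read this suggestion: if a namespace is a tuple of components, a prefix match compares whole components, so the application picks its own split points and "abc" no longer matches "abcdefg". A minimal sketch, with `tuple_prefix_match` as a hypothetical helper:

```python
def tuple_prefix_match(namespace: tuple[str, ...], prefix: tuple[str, ...]) -> bool:
    """True if `prefix` equals the leading components of `namespace`."""
    return namespace[: len(prefix)] == prefix


room = ("moq-chat", "id", "participant")
same_room = tuple_prefix_match(("moq-chat", "id", "participant", "alice"), room)
lookalike = tuple_prefix_match(("abcdefg",), ("abc",))  # whole components only
byte_lookalike = b"abcdefg".startswith(b"abc")          # byte prefix for comparison
```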

kixelated commented 1 month ago

But I do think we should eventually apply some restrictions/scheme to namespaces in the same vein as URLs. Binary blobs are nice until you actually try to inspect/share them...

vasilvv commented 1 month ago

I have a bit of a concern as to whether this mechanism scales with the number of subscribers. I guess the relay network has to store all announcements somehow anyways, so this mechanism would essentially provide a way to access that state.

I'm not a fan of prefix matches, nor am I a fan of making full track name even bigger, given that this doesn't show up in 90% of scenarios. I'd propose something like:

1. Add a special announce_tag field in ANNOUNCE.
2. SUBSCRIBE_TAG lets you specify the exact announce_tag you're listening for.
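
A sketch of how a relay could index announcements by tag under this proposal. Only the announce_tag and SUBSCRIBE_TAG names come from the comment; the `TagIndex` class, its methods, and the choice to treat the tag as optional are assumptions for illustration.

```python
from collections import defaultdict
from typing import Optional


class TagIndex:
    """Exact-match index from announce_tag to namespaces and listeners (hypothetical)."""

    def __init__(self) -> None:
        self.namespaces: dict[bytes, set[bytes]] = defaultdict(set)
        self.listeners: dict[bytes, set[str]] = defaultdict(set)

    def subscribe_tag(self, session: str, tag: bytes) -> list[bytes]:
        """Register interest and return namespaces already announced with `tag`."""
        self.listeners[tag].add(session)
        return sorted(self.namespaces.get(tag, ()))

    def announce(self, namespace: bytes, announce_tag: Optional[bytes] = None) -> set[str]:
        """Record an ANNOUNCE and return the sessions it should be forwarded to."""
        if announce_tag is None:
            return set()  # untagged announcements are not discoverable this way
        self.namespaces[announce_tag].add(namespace)
        return set(self.listeners.get(announce_tag, ()))


index = TagIndex()
index.announce(b"room1/alice", announce_tag=b"room1")
existing = index.subscribe_tag("carol", b"room1")
notified = index.announce(b"room1/bob", announce_tag=b"room1")
untagged = index.announce(b"room1/hidden")
```

Lookup is a dictionary access on the exact tag, so no prefix scan is needed and the full track name stays the same size.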

suhasHere commented 1 month ago

2. SUBSCRIBE_TAG lets you specify the exact announce_tag you're listening for.

A couple of questions:

How do subscribers learn the announce_tag?
Is announce_tag the same as the publisher_id (other than being in a different part of the protocol)?

suhasHere commented 1 month ago

But I do think we should eventually apply some restrictions/scheme to namespaces in the same vein as URLs. Binary blobs are nice until you actually try to inspect/share them...

Couple of observations:

The relay forwarding layer (data plane) need not, and should not have to, be aware of the structure.
The relay control plane can provide a function that maps between the binary representation and the application representation (strings, for example), if needed for inspection or even sharing with humans.

ianswett commented 1 month ago

The slides seem quite reasonable to me, can you write up a PR?

fluffy commented 1 month ago

+1 on PR

afrind commented 1 month ago

...nor am I a fan of making full track name even bigger, given that this doesn't show up in 90% of scenarios. I'd propose something like:

Add a special announce_tag field in ANNOUNCE.

SUBSCRIBE_TAG lets you specify the exact announce_tag you're listening for.

I view announce tags as providing similar functionality to making namespace an n-tuple. The difference is whether the tags are part of the identifier, or just metadata for the announcement. If it's metadata, it may be redundant with the name, but that doesn't matter much given the frequency of announcements and subscribe namespace.

I may try to write the PR for both and let the wg decide.

moq-wg / moq-transport, issue #484