livekit / protocol

LiveKit protocol. Protobuf definitions for LiveKit's signaling protocol
https://docs.livekit.io
Apache License 2.0

Add KV metadata for rooms and participants #728

Closed dennwc closed 1 month ago

dennwc commented 1 month ago

As discussed a few times earlier, we need a way to set (immutable?) metadata on rooms and participants that we control.

For SIP specifically, this means any call-related metadata (calling/called number, call id, etc). It definitely doesn't make sense to put this info directly on Participant.

I considered adding a SIPMetadata message as a field of Participant, but it sounds like an overkill. Nothing in our stack will directly read these values. This is why I think a map would be nicer. It allows any LK sub-system (SIP, Agents, etc) to quickly add metadata fields which will be propagated through the whole stack automatically. It's also possible to put experimental metadata earlier and move it to protocol later.

The current plan is to have a map<string,string> kv_metadata field and have a naming convention for the keys (e.g. livekit.sip.callID).

changeset-bot[bot] commented 1 month ago

🦋 Changeset detected

Latest commit: 627bc199a71be88e62e166e4284fb19bdb324fc3

The changes in this PR will be included in the next version bump.


💥 An error occurred when fetching the changed packages and changesets in this PR:

Some errors occurred when validating the changesets config: The package or glob expression "github.com/livekit/protocol" specified in the `fixed` option does not match any package in the project. You may have misspelled the package name or provided an invalid glob expression. Note that glob expressions must be defined according to https://www.npmjs.com/package/micromatch.
paulwe commented 1 month ago

using a strictly typed metadata field would be better.

the ability to rapidly evolve our apis is far less valuable than providing a stable, consistent, intuitive api. apis we expose to customers need to be more stable and ergonomic than apis we consume internally. while we can quickly coordinate internal changes the communication overhead for external changes scales poorly. thrashing and breaking customer products is damaging to our brand.

dennwc commented 1 month ago

After today's call we decided to keep it simple and go with a map<string,string>, also allowing users to put their own metadata there (under a prefix).

dennwc commented 1 month ago

@paulwe although I completely agree with your argument about the strict schema, I believe the use case here is a bit different.

We do not want any critical LiveKit-related data (or even metadata) here. We only want to pass some optional metadata for the end user to read.

A good example would be a provider-specific call ID for SIP. We definitely don't want a twilio_call_id in our schema. And some providers have more than one call-identifying piece of metadata (e.g. account ID, trunk ID, etc), so having a provider_call_id might not be enough.

The second reason is that passing separate metadata fields across the system is brittle. KV metadata acts as a single container and can be propagated as a whole, even if the service in the middle cannot directly unmarshal it.

paulwe commented 1 month ago

We do not want any critical LiveKit-related data (or even metadata) here. We only want to pass some optional metadata for the end user to read.

that's not how apis work... https://www.hyrumslaw.com

We definitely don't want a twilio_call_id in our schema

this seems fine if it's a 3rd party integration? sip metadata > provider oneof > twilio > etc...? are the sip providers supplying this data as string typed k/v pairs? if we're passing through a standard semistructured set of headers in this one context that's fine but it sounds like these are vendor specific structured fields. if we're writing code to convert structured vendor api responses to k/v we may as well use structured data.

The second reason is that passing separate metadata fields across the system is brittle. KV metadata acts as a single container and can be propagated as a whole, even if the service in the middle cannot directly unmarshal it.

no, it's not. by default the protobuf unmarshal/marshal propagates unknown fields. eg.

message TestFoo {
  string a = 1;
  string b = 2;
}

message TestBar {
  string a = 1;
}
foo := &api.TestFoo{
    A: "a",
    B: "b",
}

fooBuf := must.Get(proto.Marshal(foo))

bar := &api.TestBar{}
must.Do(proto.Unmarshal(fooBuf, bar))

barBuf := must.Get(proto.Marshal(bar))

foo2 := &api.TestFoo{}
must.Do(proto.Unmarshal(barBuf, foo2))

fmt.Println(protojson.Format(foo2))
{
  "a":  "a",
  "b":  "b"
}

also allowing users to put their metadata here too (under a prefix).

making a single catch all metadata container means making a k/v store part of our api. we do not want to do this. we especially do not want to mix user values and internally generated values in one place because this will create a mess for synchronization. a single unstructured blob of metadata with last write wins semantics and limited space should be enough for storing a pointer to data in customer's dbs...

biglittlebigben commented 1 month ago


Using semi-anonymous fields here seems fine to me since, as Dennys mentions, these are informational, service-specific fields. I'm afraid that if we specify typed fields, we'll end up reproducing large swaths of other providers' APIs in our protocol.

wrt having our own k/v store: I think the design stems from the belief that we need to provide this functionality (in a limited way) to our customers, since they will likely need synchronization semantics across several metadata producers that change values fairly often to communicate a change in participant behavior. If you believe implementing this reliably is too much engineering or maintenance work and we'd rather have each customer figure out how to do this with the semantics they need, that's something we can discuss, but IMHO there is real value in providing this as part of our API.

paulwe commented 1 month ago

if we specify typed fields, we'll end up reproducing large swaths of other provider APIs in our protocol.

if we're passing this data as k/v pairs we'll be doing this anyway only in a half-baked, unmaintainable way. if you believe implementing strict types is too much engineering or maintenance work and we'd rather have each of our customers figure out what values are available i think that's something we can discuss...

anyway, this seems to be happening... the client should implement a map type api with Set, Get, Delete, and Observe methods but the wire format should be a commit log.

message MetadataLog {
  string key = 1;
  TimedVersion version = 2;
  oneof value {
    bool deleted = 3;
    string string = 4;
  }
}

last write wins. updates are unordered and eventually consistent. total k/v size (keys+values) should be limited.

message RoomMetadataLog {
  repeated MetadataLog log = 1;
}

message ParticipantMetadataLog {
  string participant_id = 1;
  repeated MetadataLog log = 2;
}

room and participant metadata should not be part of either room or participant. resending the entire room or participant every time a key changes is not a good use of signal capacity. RoomMetadataLog and ParticipantMetadataLog should be added to SignalResponse. the initial broadcast will contain all k/v pairs and subsequent messages will contain only deltas.

message KVSyncRequest {
  message ParticipantMetadataLog {
    string participant_identity = 1;
    repeated MetadataLog log = 2;
  }

  repeated MetadataLog room = 1;
  repeated ParticipantMetadataLog participants = 2;
}

message KVSyncResponse {}

service KV {
  rpc Sync(KVSyncRequest) returns (KVSyncResponse);
}

k/v api requests received without versions will be assigned one by the service. server apis can implement batch operations eg SetMany, DeleteMany. a non error response indicates that the value was received. updates are best effort - updates to disconnected participants may be discarded.

dennwc commented 1 month ago

@paulwe

that's not how apis work... https://www.hyrumslaw.com

True. But that law also implies that any change, including deprecation, will break someone. I think you would agree that one cannot move forward without (occasional) breaking changes.

this seems fine if it's a 3rd party integration?

That's the problem - it's not an integration with Twilio. We do expose SIP protocol, but there are thousands of providers that speak it. We do not want to model all of them in our API. All we need is to map a few to this KV metadata and let users add the rest (if they care).

no, it's not. by default the protobuf unmarshal/marshal propagates unknown fields

Indeed, it will preserve unknown fields in that Message if one carries it around. But it won't work with protojson, for example. Or if you want to partially fill a new message (without Clone).

Maps are simpler in this regard, since they can be serialized to JSON (or any other encoding), rebuilt manually, stored to DB natively, etc.

It would also be a problem for livekit-cli: if we make the metadata fully structured, an older CLI won't be able to decode and print unknown fields. With maps it will always work, no matter the CLI version, which I think is important for DX.

making a single catch all metadata container means making a k/v store part of our api. we do not want to do this.

To be clear, we are not proposing a full-featured KV API.

It's merely a replacement for existing opaque metadata. Multiple customers want us to somehow pre-fill it with our own data from the service (e.g. SIP). But we cannot do that with existing metadata, since we do not know the encoding.

a single unstructured blob of metadata with last write wins semantics and limited space should be enough for storing a pointer to data in customer's dbs

I think you specifically referenced it here, right?

we especially do not want to mix user values and internally generated values in one place because this will create a mess for synchronization

I share your concerns here. The initial proposal was to make it immutable, so that the metadata is locked when the participant is first created. Alternatively, we could add a separate permission to the token so that only our internal services can modify it.

But during today's sync an argument was made that maybe we could allow users to store their own data there.

Personally I don't have a strong opinion here, especially since I'm not familiar with the way ParticipantInfo synchronization works currently. It would be great if you help me understand what expectations would be realistic here.

For the short term, we only need a way to attach metadata during room/participant creation. That will cover current customers' needs.

If users end up overwriting keys with our prefix, I think that's their problem. Luckily, the current permissions on metadata only allow participants to modify their own metadata, so the potential damage is very limited.

On our side, we need to enforce (via code review) that we do not read this metadata, or at least don't act on it. The rule seems pretty clear: if a field plays a role in server logic, it must be structured properly.

As for updates, would it be realistic to have only a single guarantee: that a new key can be inserted and will eventually become visible to all peers? That's the main use case: we want to insert metadata once, and we assume the user may also want to insert something (e.g. some identifier).

So maybe we could only allow insertions and no updates? That would very effectively prevent anyone (including us) from storing any important state there. WDYT?

dennwc commented 1 month ago

@keepingitneil Would be great to have your input here too. Do you plan to have any kind of metadata for Agents that is attached to rooms/participants? And if so, what operations do you expect on this data?

keepingitneil commented 1 month ago

@keepingitneil Would be great to have your input here too. Do you plan to have any kind of metadata for Agents that is attached to rooms/participants? And if so, what operations do you expect on this data?

At the moment we don't have any livekit owned metadata planned for agents themselves. User-defined metadata could be interesting for agent devs, but not essential and also applies to all clients (not just agent clients).

I'm not up to speed with the latest agent durability/orchestration design but I trust @biglittlebigben and @paulwe have already advocated appropriately if that design makes use of metadata.

If I were to opine here 😉 - I like the commit log design. An additional, livekit_owned:true/false field could be added to the log message to prevent users (and livekit) from accidentally using metadata they are not supposed to use.

edit: or instead of livekit_owned: boolean, a namespace: string

paulwe commented 1 month ago

It's merely a replacement for existing opaque metadata. Multiple customers want us to somehow pre-fill it with our own data from the service (e.g. SIP). But we cannot do that with existing metadata, since we do not know the encoding.

the existing metadata is a mutable blob store. participant metadata can be set in the token and updated using the signal and server apis. room metadata can be created and updated using the server api. this is table stakes for any new metadata implementation...

As for the updates, would it be realistic to only have a single guarantee: that a new key can be inserted and it will be visible to all peers eventually? That's the main use case: we want to insert metadata once, and we assume user may also want to insert something (e.g. some identifier).

to guarantee first write wins we have to globally serialize writes. implementing last write wins only requires storing the timestamp of the last update and propagating it to all replicas... either way, once we add a k/v store customers will wonder why it doesn't support durability, acid, transactions, atomic ops, large blob storage, whatever. we should focus on our core offerings rather than over engineering a solution for delivering headers

For the short term, we only need a way to attach metadata during room/participant creation. That will cover current customer's needs.

if we want to add immutable values supplied during room/token creation we should call these fields something completely different to avoid confusion. repeating prefixes is wasteful and inelegant and there is no reason sip, agent, and user supplied values should share a single keyspace.

dennwc commented 1 month ago

the existing metadata is a mutable blob store. participant metadata can be set in the token and updated using the signal and server apis. room metadata can be created and updated using the server api. this is table stakes for any new metadata implementation...

This is actually a pretty good argument to not allow user metadata in KV 🤔

either way, once we add a k/v store customers will wonder why it doesn't support durability, acid, transactions, atomic ops, large blob storage, whatever. we should focus on our core offerings rather than over engineering a solution for delivering headers

That's exactly what I want to avoid. Let's find the minimal possible contract that doesn't require a transaction log or ACID guarantees.

to guarantee first write wins we have to globally serialize writes

That's the theory, yes. But we do sync ParticipantInfo without a TX log currently. So I'd like to explore an alternative contract that would let us merge KVs, at least in the most common case (where keys never intersect).

if we want to add immutable values supplied during room/token creation we should call these fields something completely different to avoid confusion

Any alternative name you would suggest? I agree that seeing kv_metadata will make users assume that it's similar to metadata and is mutable too. static_metadata? annotations would be similar to K8S.

repeating prefixes is wasteful and inelegant and there is no reason sip, agent, and user supplied values should share a single keyspace

Definitely wasteful for storage/transmission, but it's a well-established practice for end-user APIs (e.g. K8S annotations, Redis, etcd). We could change the storage details, but I think it's the most convenient way to access it from the user's perspective.

Please take into account that customers are wondering why we cannot deliver this sooner, since it's "only" forwarding metadata. It might not be reasonable to go back to them saying "we want to develop a KV with a TX log for this, so the ETA is over a month". We should try hard to find a practical compromise here, in my opinion.

I think the main assumption here is that our services (e.g. SIP) or their services already have a primary DB of some kind. So we don't need strong guarantees for KV. We need a way to tag our models with identifiers that presumably do not change over time.

I'm really tempted to say that we should just merge these KV updates in the order they arrive. If someone requests stronger guarantees, we should point them in another direction: generate a UID, set it in KV once (no updates!), and store the state in another DB. Later, if we do decide to build a proper KV from this, we could change the wire format to the TX log you proposed. We can deliver this fast enough, and it still leaves room for future improvement if needed.

paulwe commented 1 month ago

to guarantee first write wins we have to globally serialize writes

That's the theory, yes. But we do sync ParticipantInfo without a TX log currently. So I'd like to explore alternative contract that could allow us to merge KVs in the most common case at least (where keys never intersect).

we already globally serialize writes to participant state. the media node hosting the participant serializes writes and propagates those changes to replicas. if we can limit this new feature to participant metadata only the solution is much simpler than if we need to build a consistent solution for rooms and participants.

Any alternative name you would suggest? I agree that seeing kv_metadata will make users assume that it's similar to metadata and is mutable too. static_metadata? annotations would be similar to K8S.

annotations, labels, tags, flags, or description? and i would prefix them like sip_tags or agent_labels. unlike k8s or redis implementations these are 1st party features in our product. we don't need to plan for an unbounded number of components like this.

dennwc commented 1 month ago

if we can limit this new feature to participant metadata only the solution is much simpler than if we need to build a consistent solution for rooms and participants

This can actually be the answer. We don't need room metadata for SIP use case.

dennwc commented 1 month ago

OK then, the consensus appears to be that KV can be SIP specific. I will close this PR in favor of a new one. Thanks everyone for the feedback!