j0sh commented 6 years ago

Broadcaster-Transcoder Networking v2

The current broadcaster-transcoder largely follows a request-reply flow. Use HTTP in order to leverage existing tooling, services, and experience. Allows orchestrators (transcoder managers) to scale without writing too much custom code around our networking protocol.

broadcaster-transcoder network v2 1 Sequence diagram source

Using this proposed protocol, object storage essentially becomes a "drop-in" feature that we can incorporate whenever appropriate. We focus on direct uploads from the broadcaster to the transcoder ('local' object storage). There is an addendum that sketches out how object storage could work within this protocol.

Registry

Needed to publish orchestrator URI. Look up URI based on address assigned by the protocol. Options:

Publish URI on the Blockchain. Does not require Livepeer-run infra or anyone else to store the info.
- Immediate liveness checking/feedback
Libp2p based gossip protocol. Still requires bootnodes.
Global registration HTTP endpoint. Simplest method without smart contract changes.
- Eg, GET livepeer.org/transcoder/<transcoder-eth-address>

Jobs

Request `JobReq`

Submits JobID, StreamID, signature.
Later on we can use this request for capability and price negotiation.

Response

URI to the transcoder that will handle the segments. Could even be a multiaddr.
Allows transcoder farms or pools to direct load wherever is appropriate.
- Orchestrators don't need to explicitly proxy anything themselves.
- Lightens the load on a single ‘blockchain assigned’ node.
Upload credentials with a random token or URI path for the segment
- Preferably time limited, based on a rolling window.
Later on we can add fields for different upload options. S3, etc.
Master playlist manifest
Error response if applicable

Segments

Request `SegmentReq`

Include hash, seq, sig in headers.
- Allows preemptive auth checking before downloading the entire request.
Can contain raw data, eg direct upload
- No intermediate object store. Post and transcode.
Or can contain an URI (eg, S3 URL)
- Allows broadcasters to bring their own object storage without further effort.
- Could also point to orchestrator-owned object storage
- Bypasses latency concerns with notifications, propagation delay. (See "Object Store" section)

Response

Transcoder Response
Error response if applicable
- We have a lot of errors right now that the transcoder simply swallows up. Allows for feedback to broadcaster.

Additional Notes

Semantics

Broadcasters can send a JobReq to an orchestrator at any time. For example, maybe they want to renew their credentials to restart streaming after an offline period, or the existing segment transcoder endpoint seems unresponsive.

The onus is entirely on the broadcaster to confirm the job with the orchestrator, and to send segments to the transcoder. Transcoders are no longer under any obligation to initiate jobs or seek out segments from the broadcaster. This greatly simplifies the amount of guessing the transcoder has to do, compared to the current protocol. See Push vs Pull.

This design intentionally minimizes the amount of state that a transcoder needs to hold for a job. Hence, the broadcaster can stop sending segments anytime without explicitly stopping the stream. Preferably it also minimizes the amount of Livepeer infrastructure that is required (bootnodes etc), and in general minimize the entities that stand in between a broadcaster and a transcoder.

Implementation

Use Protocol Buffers + gRPC for orchestrator messaging. Allows us to spend less time coding the implementation, and more time on specifying behaviors.
Straight HTTP(2?) for posting segments to the transcoder
- Protobufs is not optimized for large payloads, and gRPC seems to require tuning to maintain high throughput. Since gRPC is HTTP2 based anyway, eliminate the overhead.

Security Considerations

Certs might be a problem for TLS. If we don't do TLS from the outset, add an "envelope signature" of the canonical request (headers + body) using the node's Eth key.
TLS wild idea, sign certs using a "CA" derived from the node's Ethereum key. Mutual auth (server/client). More researchy, can do this later.

Relays

We can build a stateless HTTP-based relay protocol using similar ideas as presented here, but the current focus is on the broadcaster-transcoder flow.

Racing

Transcoder races can be supported without any additional modifications to the networking protocol. What happens beyond the segment transcoder is orthogonal to the broadcaster-transcoder flow. Segment transcoders may elect to do the job themselves, assign the work to an internal transcoder pool, allow a swarm of transcoders to race for the segment, etc.

Push vs Pull

Currently, our system is a bit of a hybrid between pull and push: the transcoder sends a TranscodeSub request to the broadcaster each time it begins transcoding (pull), while the broadcaster pushes segments out once they are ready. HLS and DASH are purely pull-based models, where the player polls for new manifests periodically. This proposal is a pure push based model, in which the broadcaster would be responsible for notifying the orchestrator of new segments, without waiting for an explicit request from the transcoder. (Naturally, the orchestrator/transcoder would have to check whether it was actually assigned to the job, but this can be a quick check of the HTTP headers without downloading the body of the request.)

In the context of Livepeer, the question of push vs pull is fundamentally one of state management: who should be responsible for maintaining the state associated with a job, setting it up, and tearing it down? We also want to make it easier for transcoders to make management and scalability decisions, without being hindered by intrinsic characteristics of the networking protocol.

Much of the reasoning for a "push" based system is outlined here livepeer/go-livepeer-basicnet#34 . Although that initial proposal was oriented towards a libp2p-based system, most of the benefits of a broadcaster initiated connection ('push') still hold. The more salient points are reviewed below.

Drawbacks of Pull

Pull has many downsides in the context of Livepeer's broadcaster-transcoder flow, and few benefits. With a pull based model, the onus is on the transcoder to constantly ask: is this job still active? Broadcasters are much more likely to have problems. They are less technical, there is a human in the middle trying to set up the stream and manipulate the capture setup, streams are much less likely to EOF cleanly, and so forth. The transcoder doesn't know the true state of the job; only the broadcaster knows, and we can't rely on the broadcaster to be reliable. Fundamentally, transcoders are persistent, while broadcasters are not.

Long-running jobs impose an expensive externality on the network. For every job that isn't shut down cleanly, the transcoder has to constantly check if new segments are available, for the duration of the job. With thousands of jobs, this adds up. Currently, if a transcoder needs to restart, it will attempt to resume "active" jobs by sending TranscodeSub requests repeatedly. Most of these requests will never get a response. Fortunately, this only takes effect if a transcoder needs to restart in the middle of a non-EOF'd job.

However, the situation gets worse if the transcoder is responsible for fetching from an object store entirely using a pull-based model, eg, acting as a HLS player. Unnecessary requests are made for the lifetime of the job, potentially every 4 seconds. Many services, including S3, charge per GET request. (Note: be careful about putting a CDN in here. Cloudfront's per-request pricing is more than 2x that of S3.) This is objectively worse than our current system, which only polls when the transcoder is trying to initiate a job; thereafter, segments are delivered via push without further transcoder intervention.

Polling works for HLS or DASH because the cost to the requesting client is limited: it only has to request one manifest every so often, and stops as soon as the user is finished watching. Transcoders would need to poll for every job that was active. Polling is also especially undesirable for object stores that are outside the transcoder's control, eg provided by the broadcaster. If we decide to "race" the transcoding among multiple nodes, then the implications of polling go even deeper, and that needs to be considered.

Any pull-based system would require more active management in order to load balance jobs among different transcoding nodes, especially if the load needs to shift mid-job. At the very least, operators need to tear down state in one transcoder and bring up state in another. With push, the per-job state management is minimal, limited only to tracking the segments that have had work done.

Benefits of Push

The proposed split between JobReq and SegmentReq enables feedback and failover, which gives all parties more control of the behavior of the system. If a transcoder appears unresponsive, the broadcaster will get immediate feedback saying so, eg via a failed TCP connection, a HTTP error code, etc. The broadcaster can then send another JobReq to the orchestrator to request a different segment transcoder. The feedback is useful as a diagnostic tool for the broadcasters. Contrast with a pull based system, where one first has to infer that no work is happening when there should be, and trace the problem from there. For broadcasters, this is an especially opaque and frustrating experience, where they have no visibility into the network and cannot help debug.

State management is much easier. In particular, transcoders only need to allocate resources when a segment comes in. They don't need to check for segment availability or take any action when segments stop coming in. They either transcode a segment upon request, or do nothing. Simpler state management has especially large benefits for failover. For example, a segment transcoder may be briefly knocked offline. SegmentReq feedback would make this immediately obvious (if the orchestrator’s internal monitoring doesn't catch it first). The broadcaster can then re-request JobReq from the orchestrator, which can direct the broadcaster to a different node. After the segment transcoder comes back online, there is no uncertainty about whether the transcoder should continue with the original job, because there is no state to tear down.

Load balancing also becomes a matter of distributing the work at the moment of the request. Coupled with HTTP, this falls squarely into the extremely well-trod space of distributing Web requests. That opens up a world of options for transcoders, who can now use mature tools and services to manage their operations.

Delayed jobs are also a better fit for the push model, in order to reduce transcoder uncertainty about when a job will start or resume. With push, there is no uncertainty, because the transcoder doesn't need to be aware of the broadcaster's state, or be constantly seeking it out.

Object Store

This proposal is complementary to any object store system, and in fact resolves a number of issues related to using object stores.

First, a clarification on what the request flow would look like with a transcoder-owned S3. Note that the v2 networking proposal is also accommodating towards broadcaster-owned object stores, object stores that aren't as feature-rich as S3, or plain HTTP endpoints, etc...

network v2_ object store interoperability Sequence diagram source

The key point to note is that the segment transcoder can either receive raw segment data, or an URI pointing to a segment that it can fetch from. This URI is provider-agnostic and completely decouples the networking and transcoding from the mechanics of managing an object store system.

The use of SegmentReq may seem redundant after the upload, but actually serves three important purposes.

Firstly, it keeps the protocol behavior consistent with the rest of this proposal.
Secondly, it allows for feedback after the transcoding operation. Otherwise, uploading to an object store is a fire-and-forget operation; the broadcaster has no idea whether the transcoder picked up the segment, or if there was an error in processing it.
Finally, it solves a major problem with 'passive' object stores and the push model: the transcoder needs to quickly be notified that a new segment is available.

Object Store Notification Options

For the reasons outlined earlier, polling should be avoided. Services such as S3 have several notification mechanisms (SQS, SNS, Lambda) and other cloud vendors likely have their own mechanisms. Transcoders would then be responsible for setting those up themselves. However:

Notifications are an additional moving part for the transcoder to set up. This raises the barrier of entry to being a transcoder, and complicates troubleshooting when things go wrong.
Notification delay and reliability is uncertain. We are operating under rather tight timing constraints. Even one second of delay in delivering the notification takes 25% out of the transcoding time budget. We also don't want queues to fill up and dump a bunch of notifications on a transcoder at once.

Transcoders are still free to set up their own notifications system, and ignore SegmentReq if they can authenticate the upload. But we should not make this mandatory; if we can help bypass the inherent complexity of using an object store system, we should.

Note that if transcoders elect to provide a S3 bucket for broadcast uploads, it will certainly be their responsibility to monitor its usage, ensure that segments are cleaned up, etc. That is not part of the Livepeer networking protocol, although we may elect to implement such management assistance as part of a S3 sub-module.

Sending SegmentReq also enables the broadcaster to bring their own object store without any explicit support from the transcoder. The segment transcoder simply needs a URL to fetch a segment from. Additionally, transcoders can use other object store systems that might not have a notification mechanism baked in.

yondonfu commented 6 years ago

Publish URI on the Blockchain.

The BondingManager contract already implements a form of a registry listing transcoders. Putting aside some necessary naming changes to avoid confusion, the BondingManager contract could maintain a registry listing orchestrators instead without any additional state or logic changes. Since we already have plans to use an ENS Registry + Resolver to map addresses to human readable names and campaign URLs, we could also use that same setup to map addresses to orchestrator URIs. Then, a broadcaster could determine what URI to use for JobReq based on the assigned orchestrator ETH address and a contract lookup for the orchestrator URI.

With the orchestrator filling the role formerly occupied by the transcoder at least in the context of the BondingManager contract, the orchestrator now needs to be the one to submit transcode claims and submit challenged segments for verification. I think this can be accomplished via an orchestrator object store that transcoders push receipt/segment data to (with the assumption that the orchestrator trusts its transcoders and the transcoders trust the orchestrator)

j0sh commented 6 years ago

The BondingManager contract already implements a form of a registry listing transcoders.

Yeah, it was actually here that I was thinking of adding the URI information:

https://github.com/livepeer/protocol/blob/27c2e312da740a8ba852b148766582921ab67819/contracts/bonding/BondingManager.sol#L163-L253

I don't know much about how ENS works but if there's a way we can resolve additional 'record types' (such as a Livepeer URI) then that would indeed be a fairly neat solution. Writing URI entries on the blockchain seems preferable to me. It doesn't add a moving part to the system, and slots nicely into the BondingManager.

With the orchestrator filling the role formerly occupied by the transcoder ... the orchestrator now needs to be the one to submit transcode claims and submit challenged segments for verification.

For the time being, I don't think there will be too many changes to the goclient implementation, as far as claims and verification goes. The network architecture is meant to enable a logical partition between the orchestrator and the transcoder, but for now, those are going to be the same physical entities on the network.

But I agree that, looking at the smart-contract protocol alongside the networking protocol, there is, by default, a large element of trust in the orchestrator. That could be mitigated by allowing the assigned orchestrator to delegate claiming authority to others. Might be a good topic for a LIP.

There are certainly some other things to consider related to the smart-contract protocol. For example, when the orchestrator assigns the broadcaster a bearer token to use for submitting segments to a transcoder, the latest sequence ID probably should be in there somewhere to avoid doing double-work. But from the perspective of the networking protocol and the broadcaster, the token is an opaque blob.

Likewise, the issue of trust is an interesting one. If an orchestrator's transcode farm is entirely internal, then trust is not an issue. If the orchestrator is a public "transcoder pool" then maybe they'll build their own smart-contract layer on top, to ensure an equitable distribution of work or fees.

The design space here is very large, and we want to give transcoders the flexibility to define their own architecture without having to bend too much to accommodate the networking protocol.

dob commented 6 years ago

Thanks @j0sh! I like the idea of starting to build support for the orchestrator/transcoder relationship while still fitting the first pass into our existing on-chain protocol. So for example, orchestrators could essentially be transcoders, and be responsible for verification themselves as far as the chain is concerned, while using the updated networking protocol off chain to get the segments.

Something that we probably don't need to overoptimize for at this stage, but to be aware of is DDoS'ing orchestrators who publish their HTTP endpoints on chain. I guess we can offload this to orchestrators at the moment and have them use existing services to mitigate this if it's an issue.

As for the TLS issues, how about HTTP with encrypted payload params using the orchestrators' public key? Also maybe an overoptimization.

yondonfu commented 6 years ago

For the time being, I don't think there will be too many changes to the goclient implementation, as far as claims and verification goes. The network architecture is meant to enable a logical partition between the orchestrator and the transcoder, but for now, those are going to be the same physical entities on the network.

Yep! Mentioning the smart contract related implications here just for reference when there is a physical separation between the orchestrator and transcoder entities - we can move discussion/design around those topics out of this issue.

jozanza commented 6 years ago

I think shh is actually a really good fit for this advertising of addresses for a broadcaster trying to establish a connection with a transcoder.

j0sh commented 6 years ago

About gRPC/Protobufs, I was reviewing the documentation and saw this unfortunate tidbit:

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

https://developers.google.com/protocol-buffers/docs/techniques#large-data

A four-second segment from OBS clocks in at around 1.4MB here. Of course, this may vary based on encoding settings, but mine looks fairly standard -- 2.5mbit, 720p, 30fps, x264 main profile at CRF, plus 150kbps 44100khz stereo AAC-LC. This could easily be larger.

For now, we can stay with grpc+protobufs for the orchestrator, but raw segment payloads to the transcoder can go via plain HTTP(2?) [1]. Can change this behavior if anyone has strong thoughts either way. Updated the doc.

Using a separate protocol for the transcoder might seem inconsistent, but appears more reasonable when considering we could add other ingest endpoints to the transcoder, especially once gas accounting is in and we aren't limited to segment-based ingest. For example, we could add an RTMP endpoint to the transcoder, or SRT, WebRTC, straight mpegts, etc etc.

[1] Looked at Flat Buffers + gRPC as an alternative, but I'm generally getting leery of the amount of gRPC tuning needed to get optimal performance for large/streaming payloads [2][3]. The Flatbuffers API is also clunkier since it's oriented around zero-copy deserialization -- which is not really a tradeoff I want to make in the general case.

[2] https://github.com/grpc/grpc-go/issues/1043 [3] https://github.com/grpc/grpc.github.io/issues/371

dob commented 6 years ago

I think it's very reasonable to use different protocols for the p2p orchestration messages and the transcoder segment data. Especially since the segment data will most likely be pulled from an object store anyway.

j0sh commented 6 years ago

Merged in https://github.com/livepeer/go-livepeer/commit/2e9b5dacb961d2027d189c7603bdf07220dce121! Thanks everybody for all your hard work on this!

livepeer / go-livepeer

Broadcaster-Transcoder Networking v2 #430

Broadcaster-Transcoder Networking v2

Registry