Basicnet Protocol Documentation

ericxtang commented 6 years ago

Super Rough Version

Message Types:

Sub
Data (video segment)
Finish
Cancel
TranscodeResponse
GetMasterPlaylist
MasterPlaylistData
NodeStatusReq
NodeStatusData
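
As a rough illustration, the message types above could be modeled as a Go enumeration. This is only a sketch: the type name `MsgType` and the numeric values are assumptions, since nothing here specifies the wire opcodes.

```go
package main

import "fmt"

// MsgType identifies a basicnet message. The numeric values are
// illustrative only; the document does not specify wire opcodes.
type MsgType int

const (
	Sub MsgType = iota
	Data
	Finish
	Cancel
	TranscodeResponse
	GetMasterPlaylist
	MasterPlaylistData
	NodeStatusReq
	NodeStatusData
)

var msgNames = [...]string{
	"Sub", "Data", "Finish", "Cancel", "TranscodeResponse",
	"GetMasterPlaylist", "MasterPlaylistData", "NodeStatusReq", "NodeStatusData",
}

// String returns the name used in this document for the message type.
func (m MsgType) String() string {
	if m < 0 || int(m) >= len(msgNames) {
		return "Unknown"
	}
	return msgNames[m]
}

func main() {
	fmt.Println(Data, NodeStatusData)
}
```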

Node Roles:

Broadcaster

Sends video segments to its listeners
If no listeners, drop the segments
Broadcasts exist for each StrmID

Subscriber

Receives video segments and puts them into local video streams
Subscribers exist for each StrmID

Relayer

Receives video segments and calls the callback function.
Relayers exist for each (MessageType, StrmID)

Protocol:

When a node wants to broadcast a stream:

Create a local broadcaster, push Data to the broadcaster's listeners
This can happen even when no one in the network wants the stream

When a node wants to subscribe to a stream:

Create a local subscriber with the streamID
- This registers a callback; we usually put the data into a video stream in the caller
Send Sub req to the network

When a node receives a Sub req:

If there is a local broadcaster, add the requester's ID to its listener array
If there is a local relayer, add the requester's ID to its listener array
- We assume the local relayer has already been created and the Sub req has been passed on.
If there is a local subscriber and no local relayer, create a local relayer, add the requester's ID to its listener array
If there is no local relayer, local subscriber, or broadcaster, create a local relayer, add the requester's ID to its listener array, and forward the request along.
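
The four Sub rules above can be sketched in Go roughly as follows. This is a simplified model, not the actual basicnet code: the `Node` type and its fields are hypothetical, and relayers are keyed by StrmID only (the document keys them by (MessageType, StrmID)).

```go
package main

import "fmt"

// listeners is a set of peer IDs.
type listeners map[string]bool

// Node keeps per-stream state, keyed by StrmID. All names are hypothetical.
type Node struct {
	broadcasters map[string]listeners
	relayers     map[string]listeners
	subscribers  map[string]bool
}

func NewNode() *Node {
	return &Node{
		broadcasters: map[string]listeners{},
		relayers:     map[string]listeners{},
		subscribers:  map[string]bool{},
	}
}

// HandleSub applies the rules for an incoming Sub request and reports
// whether the request still has to be forwarded toward the broadcaster.
func (n *Node) HandleSub(strmID, peer string) (forward bool) {
	if ls, ok := n.broadcasters[strmID]; ok {
		ls[peer] = true
		return false
	}
	if ls, ok := n.relayers[strmID]; ok {
		ls[peer] = true // the Sub was passed on when the relayer was created
		return false
	}
	// No relayer yet: both remaining rules create one.
	n.relayers[strmID] = listeners{peer: true}
	// With a local subscriber our own Sub is already in flight, so only
	// forward when there is no local state at all.
	return !n.subscribers[strmID]
}

func main() {
	n := NewNode()
	fmt.Println(n.HandleSub("strm-D", "A")) // no local state: create relayer, forward
	fmt.Println(n.HandleSub("strm-D", "C")) // relayer exists: only add the listener
}
```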

When a node receives a Data req:

If there is a local subscriber, send the data there
If there is a local relayer, send the data there
Otherwise, report an error
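
A minimal Go sketch of the Data rules above, assuming a node can hold a subscriber and a relayer for the same stream at once (so both are served). The `Node` fields and the `send` hook are hypothetical stand-ins for the real transport.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoReceiver is returned when Data arrives for a stream nobody wants.
var ErrNoReceiver = errors.New("no local subscriber or relayer for stream")

// Node is a simplified model; field names are hypothetical.
type Node struct {
	subscribers map[string]func(seg []byte)   // StrmID -> subscriber callback
	relayers    map[string][]string           // StrmID -> listener peer IDs
	send        func(peer string, seg []byte) // transport hook
}

// HandleData delivers a segment to the local subscriber and/or relays it
// to the relayer's listeners, and reports an error if neither exists.
func (n *Node) HandleData(strmID string, seg []byte) error {
	handled := false
	if cb, ok := n.subscribers[strmID]; ok {
		cb(seg)
		handled = true
	}
	if ls, ok := n.relayers[strmID]; ok {
		for _, peer := range ls {
			n.send(peer, seg)
		}
		handled = true
	}
	if !handled {
		return ErrNoReceiver
	}
	return nil
}

func main() {
	n := &Node{
		subscribers: map[string]func([]byte){},
		relayers:    map[string][]string{"strm-D": {"A"}},
		send: func(peer string, seg []byte) {
			fmt.Printf("relay %d bytes to %s\n", len(seg), peer)
		},
	}
	fmt.Println(n.HandleData("strm-D", []byte("seg0")))
	fmt.Println(n.HandleData("unknown", nil))
}
```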

When a broadcaster finishes a stream:

Send a Finish message to its listeners

When a node receives a Finish message:

If there is a local subscriber, call it with EOF and delete it
If there is a local relayer, forward the Finish message to its listeners, and delete the relayer

When a node finishes subscribing (for example, player shuts down):

Send a Cancel message to its upstream peer and delete the subscriber

When a node receives a Cancel message:

If there is a local broadcaster, remove the peer from its listeners array.
If there is a local relayer, remove the peer from its listeners array. If this makes the listeners array empty, remove the relayer.
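
The Cancel rules above could look roughly like this in Go. It is a sketch with hypothetical field names, covering only what is described here: remove the peer, and drop a relayer once its listener set is empty.

```go
package main

import "fmt"

// Node tracks listener sets per StrmID. Field names are hypothetical.
type Node struct {
	broadcasters map[string]map[string]bool
	relayers     map[string]map[string]bool
}

// HandleCancel removes peer from any local broadcaster or relayer for the
// stream, and deletes a relayer once its last listener is gone.
func (n *Node) HandleCancel(strmID, peer string) {
	if ls, ok := n.broadcasters[strmID]; ok {
		delete(ls, peer)
	}
	if ls, ok := n.relayers[strmID]; ok {
		delete(ls, peer)
		if len(ls) == 0 {
			delete(n.relayers, strmID)
		}
	}
}

func main() {
	n := &Node{
		broadcasters: map[string]map[string]bool{},
		relayers:     map[string]map[string]bool{"strm-D": {"A": true}},
	}
	n.HandleCancel("strm-D", "A")
	fmt.Println(len(n.relayers)) // the emptied relayer has been removed
}
```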

The node sends a TranscodeResponse msg to the broadcasting node when it gets a transcode job from the blockchain.

The broadcasting node should have set up a callback function to be able to receive the TranscodeResponse message (ReceivedTranscodeResponse)

StrmID might be confused with MasterPlaylistID → Each master playlist contains multiple media playlists (each represented by a single StrmID) - https://developer.apple.com/library/content/referencelibrary/GettingStarted/AboutHTTPLiveStreaming/about/about.html

The node populates the masterPlaylist map when it creates a new stream, and updates the masterPlaylist if it receives MasterPlaylistData.

The node sends out a GetMasterPlaylist request when it gets a media server video request

When a node receives a GetMasterPlaylist request:

If we find the masterPlaylist in the local map, return it.
If not
- If the nodeID from the streamID is the current node, we return a NotFound
- Otherwise, we forward it along, and create a local relayer if it doesn't already exist
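
A Go sketch of the GetMasterPlaylist decision above. The `Node` type is hypothetical, and so is the StreamID layout: the sketch assumes a "<nodeID>/<videoID>" form purely so that the origin node can be extracted; the real encoding may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Outcome describes what the node did with the request.
type Outcome int

const (
	Found Outcome = iota
	NotFound
	Forwarded
)

// Node is a simplified model; field names are hypothetical.
type Node struct {
	id        string
	playlists map[string]string // StrmID -> master playlist text
	relayers  map[string]bool   // StrmID -> relayer exists
}

// nodeIDOf extracts the origin node ID from a StreamID.
// ASSUMPTION: StreamIDs look like "<nodeID>/<videoID>".
func nodeIDOf(strmID string) string {
	return strings.SplitN(strmID, "/", 2)[0]
}

// HandleGetMasterPlaylist returns the local playlist, NotFound if this
// node is the origin, or creates a relayer and asks the caller to forward.
func (n *Node) HandleGetMasterPlaylist(strmID string) (Outcome, string) {
	if pl, ok := n.playlists[strmID]; ok {
		return Found, pl
	}
	if nodeIDOf(strmID) == n.id {
		return NotFound, ""
	}
	n.relayers[strmID] = true // no-op if the relayer already exists
	return Forwarded, ""
}

func main() {
	n := &Node{id: "nodeA", playlists: map[string]string{}, relayers: map[string]bool{}}
	outcome, _ := n.HandleGetMasterPlaylist("nodeD/video1")
	fmt.Println(outcome == Forwarded, n.relayers["nodeD/video1"])
}
```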

When a node receives a MasterPlaylistData:

If we are the node that requested it, give it to the requester
If we are not the node that requested it, we should have a local relayer, so relay the message along

NodeStatus works the same way as GetMasterPlaylist/MasterPlaylistData. It's used for getting a remote node's status (planning to use it for debugging).

ericxtang commented 6 years ago

Change Proposal

Goal: We want to prioritize the connection between a broadcaster and a transcoder.

Problem: Currently the routing mechanism works like this (given the network topology of A → B, B → C, B → D):

A wants to subscribe to a stream on D
A sets up a local subscriber, and sends a SUB(StrmID-D) request to B
B is not D, so it tries to find the closest neighbor to D (which is D), creates a relayer locally for (StrmID-D, SubReqID), adds A as a listener, and forwards the SUB request to D
D gets the SUB request and finds the local stream. As every data segment becomes available, it sends DATA back to B
B checks for local relayers, finds the relayer for (StrmID-D, SubReqID), and sends DATA to the relayer listeners (in this case, it's A)
A gets the data, finds the local subscriber, and sends the data to it.

This routing scheme depends on relay nodes having stable connections, and relay nodes often don't.

Proposed solution: We should prioritize the connection between a broadcaster and a transcoder.

The proposal suggests adding a new message type TranscoderSub(T_MultiAddr, StrmID, T_Sig). Consider a network topology of (B → R, R → T)

1. B broadcasts a video, creates a job on-chain with StrmID.
2. T gets assigned the job
3. T relays TranscoderSub to its neighbor closest to B, and waits for a direct connection from B
4. Since T doesn't have a direct connection with B yet, TranscoderSub gets sent to R
5. R is not B, so it looks for the closest neighbor (in this case, it is B), and sends TranscoderSub along
6. B gets TranscoderSub, verifies T_Sig, and tries to create a direct connection with T using T_MultiAddr.
7. T gets the direct connection request from B, checks the NodeID of B is the NodeID in StrmID, and uses the same SUB/DATA mechanism to subscribe to the video stream.
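
The broadcaster's side of this flow (receiving TranscoderSub, verifying T_Sig, dialing T) might be sketched in Go as below. Everything here is illustrative: the struct and hook names are hypothetical, the signature check and the dial are injected as callbacks, and the sketch assumes the StrmID starts with the broadcaster's NodeID.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	ErrNotOurStream = errors.New("StrmID belongs to another node; relay onward")
	ErrBadSig       = errors.New("invalid T_Sig")
)

// TranscoderSub mirrors TranscoderSub(T_MultiAddr, StrmID, T_Sig).
type TranscoderSub struct {
	TMultiAddr string
	StrmID     string
	TSig       []byte
}

// Broadcaster holds the hooks needed to react to a TranscoderSub.
// Verify and Dial are hypothetical callbacks supplied by the caller.
type Broadcaster struct {
	NodeID string
	Verify func(TranscoderSub) bool
	Dial   func(multiAddr string) error
}

// HandleTranscoderSub verifies the message and opens the direct connection.
// ASSUMPTION: a StrmID is prefixed with the broadcaster's NodeID.
func (b *Broadcaster) HandleTranscoderSub(m TranscoderSub) error {
	if !strings.HasPrefix(m.StrmID, b.NodeID) {
		return ErrNotOurStream
	}
	if !b.Verify(m) {
		return ErrBadSig
	}
	return b.Dial(m.TMultiAddr)
}

func main() {
	b := &Broadcaster{
		NodeID: "nodeB",
		Verify: func(TranscoderSub) bool { return true }, // placeholder check
		Dial: func(addr string) error {
			fmt.Println("dialing", addr)
			return nil
		},
	}
	err := b.HandleTranscoderSub(TranscoderSub{TMultiAddr: "/ip4/127.0.0.1/tcp/63617", StrmID: "nodeB-video1"})
	fmt.Println(err)
}
```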

In the future, we can add a step before 3 where B sends a req to check for T's availability, but this needs protocol changes to work well

The proposed solution has the following benefits:

Ensure a direct connection between the broadcaster and the transcoder
T can check incoming connection requests and prioritize connections from broadcasters
If the transcoder's IP is accessible, the connection will be successful - even if a broadcaster is sitting behind a NAT.

yondonfu commented 6 years ago

In the proposed solution, what if T is far away from B? If R is reasonably distanced from B and T, could message propagation ever be faster by relaying from B to R and then R to T?

ericxtang commented 6 years ago

@yondonfu I think "distance" is hard to tell until we have some way of measuring it (maybe the better metric is "latency"). I can see a case where R has a super high speed connection to both T and B, in which case relaying would be a better strategy. Maybe that's an optimization we can add after having the proper measuring/testing tools.

jozanza commented 6 years ago

Great idea! It's definitely good to create a direct connection between a transcoder and a broadcaster if possible. Some sort of DHT is a better fit for stream viewing peers rather than a transcoding peer, especially considering the potential latency involved.

One downside though about the (B → R, R → T) topology:

R is not B, so it looks for the closest neighbor (in this case, it is B), and sends TranscoderSub along

B gets TranscoderSub, verifies Tsig, and tries to create a direct connection with T using Taddr and TnodeID.

It seems flooding the TranscoderSub message would end up with more latency / noise than is required.

imho, it may be worthwhile to have transcoder node peer ids tracked by all broadcaster nodes and pursue a direct (B -> T) topology. This would have two main implications:

Every broadcaster node would have to get a list of transcoder peer ids at some point after they connect -- either through immediate peers, some sort of signal server, or a well-known decentralized persistence layer.
It becomes a broadcaster's job to listen for assignment, connect, and stream their video. But I think that's fairly lightweight since they could start listening only after creating a job and stop once their broadcast ends as well.

But with those changes, we'd be giving broadcasters the fastest method for getting their video out to the world, which I think is a very important goal to strive for at every step.

ericxtang commented 6 years ago

@jozanza that's an interesting idea - keeping a well-known list of transcoders so any broadcaster can access it at any point. It's almost like a community board.

The challenge with this approach will be maintaining that list. Who is doing that? If a special node (like a boot node) does it, it becomes a single point of failure. If different parts are stored in different peers, we are back to the "flooding scenario". We COULD write it into the DHT, but in my experience, it takes tens of seconds for each retrieval request (I'm happy to be proven wrong here, maybe I was doing something incorrectly).

I wonder if this implementation is faster in practice. "flooding the network" with a small message is not as bad (maybe a few KB? way less expensive than a single data message, which can be in the MB range), especially if it's triggered by an on-chain event, which is less frequent than many other node message types.

Feels like the fundamental issue here is "consensus", which the blockchain gives us, but in a very expensive way. Theoretically we could write the Taddr on chain, but personally I'm not ready to make that switch without more thoughts.

jozanza commented 6 years ago

Wild thought: I don't know a lot about the Whisper protocol (shh), but it seems like the kind of thing that could help a lot here. Messages are signed and published with a ttl. Everyone on the network could subscribe to these messages. I'm thinking transcoders could publish messages via shh to signal their node ids and availability.

ericxtang commented 6 years ago

Hmm... I think Whisper is used for synchronous communication. It basically sends the message out to the entire network (and uses a "binary search" style routing to get to the right node). So if we want to use it, the transcoders would have to continuously flood the network with their availability msg. Upside - we'd automatically know if it's offline or unreachable. Downside - continuous flooding of messages.

j0sh commented 6 years ago

The proposed changes seem reasonable. First, some assumptions:

Transcoders should be "publicly available" ; that is, given a host:port pair, there won't be a NAT or firewall complicating inbound connections from broadcasters. Do we want this host:port information to be public to the network? (This segues back to @jozanza 's idea about having a public registry, which would work well enough to start with. But it may not be the best long-term solution, for operational and scalability reasons which I can elaborate on later.)
Should broadcasters also be publicly available? My first instinct is to say no. While a certain level of operational competency should be expected from transcoders (who are expected to be available continuously), I don't think we can say the same for broadcasters.
Connections to/from relays are persistent. (There probably should be a ping/keepalive for this.)

Questions and comments:

How does TranscodeSub routing work for nodes that are several hops apart? Eg, T -> R1 -> R2 -> B. Keep in mind the assumption that broadcaster addresses wouldn't be public to the network (but known to the relays that the broadcaster has an open connection to). Flooding?
How does a TNodeID facilitate a direct connection from broadcaster to transcoder? Is this a DHT lookup with the actual connection info (host/port)? Maybe the TranscoderSub could include the transcoder's host/port so the broadcaster knows where to initiate the direct connection?
The transcoder doesn't have a way to associate an incoming connection to a job, which is problematic if the transcoder has multiple jobs. Perhaps include some nonce (or the JobID) in TranscodeSub to decouple this implicit state. The broadcaster could sign this nonce in order for the transcoder to verify the connection is legit, rather than waiting for the first signed segment to come through. This might be an additional message in the protocol though.
If we go through the effort of sending a nonce/JobID on connection, then perhaps we don't need the subsequent SUB request from the transcoder. This might actually be detrimental to code reuse in the actual implementation though, so the extra chattiness could be a better tradeoff.

ericxtang commented 6 years ago

@j0sh -

Connections to/from relays are persistent, although not stable (relayers might disappear for a number of reasons)
TranscodeSub routes loosely following Kademlia. R1 will rank its neighbors based on the closeness to B, forward the message to N of them, etc.
I think within libp2p's Kademlia, TNodeID is the only thing you need to search for a peer connection. But we can definitely also put T's host/port in the message to skip the search.
You are right that T cannot check the Eth identity of an incoming request. I like the nonce idea. We still have an attack vector for DDOS though (the connection has to be formed first).
- One way to solve this would be for the broadcaster to include its address on-chain during job creation - this way the transcoder can check the address of the connection request and decide to refuse if it's coming from an unknown node. But this requires a protocol change.
- Another way is for the B to relay a BIdentity request to T as soon as the job is assigned, including the nonce, the signature, and its libp2p addr. Instead of T monitoring the chain, it waits for this BIdentity to send TranscodeSub. This way, T will already know B's identity when the connection request comes in, and can decide if it wants to refuse the connection.
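
To make the "rank neighbors by closeness, forward to N of them" step concrete, here is a small Go sketch using Kademlia's XOR distance. Whether basicnet uses exactly this metric is an assumption; the function names are hypothetical, and IDs are assumed to be byte slices of equal length.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// xor returns the bytewise XOR of two equal-length IDs, which Kademlia
// interprets as the distance between them.
func xor(a, b []byte) []byte {
	d := make([]byte, len(a))
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// Closest returns up to n neighbors ordered by XOR distance to dest.
// The input slice is left untouched.
func Closest(neighbors [][]byte, dest []byte, n int) [][]byte {
	ranked := append([][]byte(nil), neighbors...)
	sort.Slice(ranked, func(i, j int) bool {
		return bytes.Compare(xor(ranked[i], dest), xor(ranked[j], dest)) < 0
	})
	if n < 0 {
		n = 0
	}
	if n > len(ranked) {
		n = len(ranked)
	}
	return ranked[:n]
}

func main() {
	neighbors := [][]byte{{0x0F}, {0xF0}, {0x10}}
	for _, id := range Closest(neighbors, []byte{0x11}, 2) {
		fmt.Printf("forward to %x\n", id)
	}
}
```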

yondonfu commented 6 years ago

I think within libp2p's Kademlia, TNodeID is the only thing you need to search for a peer connection. But we can definitely also put T's host/port in the message to skip the search.

Perhaps if exposing node information for connections when the transcoder propagates its node info to the network with the TranscodeSub message is a problem for attack vector concerns, a transcoder could run multiple proxy relay nodes that are connected to the public network, but have private connections to the single node actually running the media server to protect it from a flood of incoming connections.

Another way is for the B to relay a BIdentity request to T as soon as the job is assigned, including the nonce, the signature, and its libp2p addr. Instead of T monitoring the chain, it waits for this BIdentity to send TranscodeSub. This way, T will already know B's identity when the connection request comes in, and can decide if it wants to refuse the connection.

Is this necessary if the transcoder already knows the broadcaster's node ID (derived from the streamID stored with the on-chain job)? After the transcoder sends TranscodeSub and the broadcaster receives it, if the broadcaster makes an outgoing connection with the transcoder, the transcoder can verify that the connection is from the nodeID for an on-chain job that it was assigned to by comparing the nodeID for the incoming connection with the nodeIDs derived from the streamIDs stored with the jobs that it is currently assigned to. The transcoder then knows that this particular connection is associated with one particular job and can mark for itself that it has an open connection for a particular job

ericxtang commented 6 years ago

@yondonfu good points. The transcoder could also encrypt its information with the broadcaster's public key in TranscodeSub, so then only the broadcaster can decrypt. But it's not bulletproof because the broadcaster could be malicious. Proxy relay nodes are probably more secure in the long run.

You are right about T already knowing B's nodeID from the streamID. No need for BIdentity.

j0sh commented 6 years ago

This is where I should read up more about Kademlia/libp2p addressing and routing. That being said, here are a few of the operational and scalability concerns I alluded to earlier:

Perhaps if exposing node information for connections when the transcoder propagates its node info to the network with the TranscodeSub message is a problem for attack vector concerns, a transcoder could run multiple proxy relay nodes that are connected to the public network, but have private connections to the single node actually running the media server to protect it from a flood of incoming connections.

Transcoders, especially large ones, will definitely have to address concerns with methods such as those. But I think we owe it to them to mitigate that problem as much as we can. There are a number of ways a transcoder can reduce the attack surface (and increase scalability) by sending connection info within TranscodeSub :

Can randomize the ports per job to prevent reuse spam, and firewall unused ports
A DNS name can have any number of IP addresses, which gives transcoder operators options for scalability. Not sure if Kademlia offers something like this. (They could also run internal proxies for this purpose, but that'd also work with Kademlia addressing.)
The connection info (address/port?) doesn't even have to be public; sensitive contents within TranscodeSub could be encrypted with the broadcaster's public key, so relays (and the network at large) would not be able to sniff out transcoders.

Another way is for the B to relay a BIdentity request to T as soon as the job is assigned, including the nonce, the signature, and its libp2p addr. Instead of T monitoring the chain, it waits for this BIdentity to send TranscodeSub. This way, T will already know B's identity when the connection request comes in, and can decide if it wants to refuse the connection.

Is this necessary if the transcoder already knows the broadcaster's node ID (derived from the streamID stored with the on-chain job)? After the transcoder sends TranscodeSub and the broadcaster receives it, if the broadcaster makes an outgoing connection with the transcoder, the transcoder can verify that the connection is from the nodeID for an on-chain job that it was assigned to by comparing the nodeID for the incoming connection with the nodeIDs derived from the streamIDs stored with the jobs that it is currently assigned to. The transcoder then knows that this particular connection is associated with one particular job and can mark for itself that it has an open connection for a particular job

Agreed, I'm not sure if a BIdentity is entirely necessary if the same information can be derived from the job (eg, a broadcaster's public key). Additionally, I'm not sure if there is a guaranteed correspondence between information in an ad-hoc request such as BIdentity (coming from a relay) and an incoming direct connection (coming from the broadcaster); the direct connection would still have to be authenticated somehow, unless this is a feature that libp2p offers. Not to mention it'd be an additional message to propagate through the DHT.

That being said, the idea of the broadcaster sending a preliminary request to the transcoder (instead of the transcoder monitoring the chain) is attractive for a couple reasons:

By having the broadcaster send the initial request, this allows the broadcaster to hide its presence on the network, in the general case. Again, not knowing much about how Kademlia routing works, could the broadcaster's libp2p address be randomized for each job and discarded after? This wouldn't stop long-term eavesdropping attacks that vacuum up node IP addresses, but it allows the broadcaster to avoid publishing a fixed record of presence (its libp2p address) when creating the job.
This could be the beginnings of a transcoder availability checking mechanism.

ericxtang commented 6 years ago

@j0sh broadcaster can create a new NodeID anytime, but the IP will stay the same.

Everyone - I think this has been a great discussion. I updated the proposal. Let me know what you think.

rairyx commented 6 years ago

For an arbitrary network topology between B and T, e.g. B->R1->R2...Rn->T, can the current semi-Kademlia routing (a.k.a. the closest-peer method) always find B from T?
I think Transcoder ID can be published in DHT like registry, while its IP and port can be hidden from public and exposed only to publishers.

ericxtang commented 6 years ago

@rairyx I think given a large enough N (for each R to relay to its neighbors), it'll work. We'd have to test in practice to see what N is. Also, this routing scheme is exactly what Kademlia uses. The difference is in the network formation, where Kademlia tries to discover neighbors and put them into buckets based on their addresses.

Transcoder ID can definitely be published in the DHT. Have you had much success playing with the libp2p DHT? My experience hasn't been great, but it was a while ago. I'd be interested in seeing some newer examples.

j0sh commented 6 years ago

We're getting close.

TranscoderSub(T_IP, T_PORT, T_NodeID, T_Sig, B_NodeID)

Finally did some reading into libp2p last night. Might a multiaddress work as well in place of an explicit IP/port? I don't quite see the concrete benefits of multiaddr for our use case, but using it appears somewhat consistent with our other use of IPFS-inspired mechanisms. In any case:

Is the T_Sig done over T_NodeID in order to authenticate the request is coming from the correct transcoder? Does the libp2p overlay inform the receiver of the sender's node ID, or does that need to be added ourselves in the tuple?
What is B_NodeID used for? To mark the destination? Asking for similar reasons as T_NodeID above; not sure if we have to explicitly tell libp2p how to deconstruct the tuple in order to route it.
Probably need the streamID in there; there's a distinct possibility of having a broadcaster create multiple jobs, and those jobs being assigned to the same transcoder. So we need a way to distinguish jobs, given a single TranscodeSub request.

I think Transcoder ID can be published in DHT like registry

Could the transcoder ID be published on-chain alongside the transcoder pool information? Or could an ID be derived from something like the Eth/LPMS address? This would save space on-chain.

ericxtang commented 6 years ago

@j0sh I don't have a strong preference between multiaddr and IP/port, just thought it might be more "generic" to use IP/port, and have the client decide how to make the connection.

For T_NodeID, I have it in there because the libp2p Kademlia network has the ability to query for multiaddr based on NodeID. I'd like the option just in case we want to switch to it later.

I think using StreamID instead of B_NodeID is a better idea. It already contains B_NodeID, and makes the request unique. Will update the doc.

The Transcoder ID is not derivable from the Eth addr, since it's a libp2p/networking concept. We don't have any of the networking information on chain at the moment, I think we need to think about whether that's a good idea. But I don't think the Transcoder ID problem holds us back in any way - I see it as an optimization opportunity in the future.

yondonfu commented 6 years ago

For T_NodeID, I have it in there because the libp2p Kademlia network has the ability to query for multiaddr based on NodeID. I'd like the option just in case we want to switch to it later.

I thought the node ID is only necessary for peer routing i.e. finding the location of the node and getting its multiaddrs such that it can directly dial it? If the transcoder is already providing a multiaddr in the TranscodeSub message, I don't think the T_NodeID is also necessary?

I don't have a strong preference between multiaddr and IP/port, just thought it might be more "generic" to use IP/port, and have the client decide how to make the connection.

Perhaps it makes sense to use the multiaddr since in the libp2p context, it can be used to directly dial to a node. The message would become TranscodeSub(T_Multiaddr, B_NodeID, T_Sig) where T_Sig is a signature over h(T_Multiaddr, B_NodeID) using the transcoder's ETH address.
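
A sketch of how a signature over h(T_Multiaddr, B_NodeID) could be produced and checked. Two loud assumptions: SHA-256 stands in for h, and Ed25519 from the Go standard library stands in for signing with the transcoder's ETH key (which would really be secp256k1). The length-prefixed encoding of the two fields is also a choice made for this example.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// digest computes h(T_Multiaddr, B_NodeID). Each field is length-prefixed
// so that ("ab", "c") and ("a", "bc") produce different hashes.
func digest(tMultiaddr, bNodeID string) []byte {
	h := sha256.New()
	for _, field := range []string{tMultiaddr, bNodeID} {
		var n [8]byte
		binary.BigEndian.PutUint64(n[:], uint64(len(field)))
		h.Write(n[:])
		h.Write([]byte(field))
	}
	return h.Sum(nil)
}

// NewKeyPair generates a stand-in key pair for the transcoder.
func NewKeyPair() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// Sign produces T_Sig over the digest of the two fields.
func Sign(priv ed25519.PrivateKey, tMultiaddr, bNodeID string) []byte {
	return ed25519.Sign(priv, digest(tMultiaddr, bNodeID))
}

// Verify checks T_Sig against the transcoder's public key.
func Verify(pub ed25519.PublicKey, tMultiaddr, bNodeID string, sig []byte) bool {
	return ed25519.Verify(pub, digest(tMultiaddr, bNodeID), sig)
}

func main() {
	pub, priv := NewKeyPair()
	addr := "/ip4/127.0.0.1/tcp/63617"
	sig := Sign(priv, addr, "nodeB")
	fmt.Println(Verify(pub, addr, "nodeB", sig), Verify(pub, addr, "nodeX", sig))
}
```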

If we wanted to include the transcoder's node ID in the message for some purpose, as an additional later optimization perhaps we could add support for an additional protocol in the multiaddr format such that the end multiaddr looks something like: /ip4/127.0.0.1/tcp/63617/livepeer/QmWrFXvZr9S4iDqycyoyc2zDdrT1jg9wpdenUTdd1LTar6

I think using StreamID instead of B_NodeID is a better idea. It already contains B_NodeID, and makes the request unique. Will update the doc.

Could we just reuse the same connection if a broadcaster creates a subsequent job that is again assigned to the same transcoder?

j0sh commented 6 years ago

it might be more "generic" to use IP/port, and have the client decide how to make the connection.

Perhaps it makes sense to use the multiaddr since in the libp2p context, it can be used to directly dial to a node.

Another benefit to multiaddr here is that, if we were to add several ways to initiate a connection, we'd need some way to indicate transcoder capabilities. Eg, we add a UDP protocol in addition to TCP, but older transcoders might not support that.

Could we just reuse the same connection if a broadcaster creates a subsequent job that is again assigned to the same transcoder?

If the jobs are sequential, one after the other (and not interleaved/concurrent) and the connection is still alive... that might be okay? We could multiplex several concurrent jobs into one connection, but the segments we send would need some sort of framing to indicate which job it belongs to. Not sure if we have that yet? Having a separate connection seems to be the simplest approach.
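
For illustration, the framing mentioned above could be as simple as a fixed header in front of each segment. The layout below (8-byte job ID, 4-byte payload length, big-endian) is invented for this sketch, and a uint64 job ID is a simplification.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// WriteFrame writes one segment tagged with the job it belongs to.
// Hypothetical layout: 8-byte job ID, 4-byte payload length, payload.
func WriteFrame(w io.Writer, jobID uint64, seg []byte) error {
	var hdr [12]byte
	binary.BigEndian.PutUint64(hdr[:8], jobID)
	binary.BigEndian.PutUint32(hdr[8:], uint32(len(seg)))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.Write(seg)
	return err
}

// ReadFrame reads the next frame and returns its job ID and payload.
func ReadFrame(r io.Reader) (uint64, []byte, error) {
	var hdr [12]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return 0, nil, err
	}
	seg := make([]byte, binary.BigEndian.Uint32(hdr[8:]))
	if _, err := io.ReadFull(r, seg); err != nil {
		return 0, nil, err
	}
	return binary.BigEndian.Uint64(hdr[:8]), seg, nil
}

// RoundTrip encodes and decodes one frame through an in-memory buffer.
func RoundTrip(jobID uint64, seg []byte) (uint64, []byte, error) {
	var buf bytes.Buffer
	if err := WriteFrame(&buf, jobID, seg); err != nil {
		return 0, nil, err
	}
	return ReadFrame(&buf)
}

func main() {
	id, seg, err := RoundTrip(7, []byte("segment-bytes"))
	fmt.Println(id, string(seg), err)
}
```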

One thing that might be missing in the overall picture: does the broadcaster send anything to the transcoder to indicate the stream it's making the connection for?

The message would become TranscodeSub(T_Multiaddr, B_NodeID, T_Sig) where T_Sig is a signature over h(T_Multiaddr, B_NodeID) using the transcoder's ETH address.

Do you mean streamID rather than B_NodeID here? Other than that, I agree.

ericxtang commented 6 years ago

Updated TranscodeSub.

@j0sh I think you can create multiple streams over the same connection, but may only be able to create a single connection for a single multiaddr.

jozanza commented 6 years ago

Question: I might be missing something, but if each LivepeerNode already has a libp2p multiaddress, what's the purpose of a NodeID?

yondonfu commented 6 years ago

@jozanza I believe the NodeID is used for peer routing to find the actual peer (ex. using Kademlia DHT based routing) and then once the peer is found you can get its additional peer info which can include its associated multiaddresses which you can use to directly dial that peer

j0sh commented 6 years ago

Just double checking my assumptions here -- 'direct connections' are straight TCP, or are we talking about a libp2p-assisted direct connection that uses a libp2p protocol?

B gets TranscoderSub, verifies T_Sig, and tries to create a direct connection with T using T_IP, T_PORT and T_NodeID.

I suspect this line may need to be updated, but in case it does not... what is T_NodeID used for here, given that the multiaddr for direct connections comes from TranscodeSub? Also, given that TranscodeSub is propagated via libp2p, is the broadcaster (TranscodeSub receiver) made aware of the sender's NodeID as part of the underlying protocol for the overlay network?

T gets the direct connection request from B, checks the NodeID of B is the NodeID in StrmID, and uses the same SUB/DATA mechanism to subscribe to the video stream.

How is the B_NodeID check done if the broadcaster is not sending a message other than simply making a TCP connection? If it's a "libp2p direct connection", does libp2p also supply the receiver with the sender's NodeID even if the multiaddr doesn't contain that information?

the NodeID is used for peer routing to find the actual peer ... once the peer is found you can get its additional peer info which can include its associated multiaddresses which you can use to directly dial that peer

This part is entirely optional, right? We don't want to expose information that could be used to attempt direct connections to broadcasters, for example. Transcoders might not want to publicly advertise their connectivity, either (only to their assigned broadcasters).

ericxtang commented 6 years ago

@j0sh updated the document according to your comment about 6

We'd probably have to implement some kind of policy for connection handling for T. For example, it can check the incoming connection's ID in the basic_notifiee.go, and drop the connection if it's not expecting it. Or maybe it prioritizes connections from broadcasters, and starts dropping non-broadcaster connections after a certain threshold.
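
One possible shape for such a policy, as a Go sketch. The `Policy` type and its fields are hypothetical, and wiring it into libp2p's connection notifications is deliberately left out: expected broadcasters are always admitted, and other peers only up to a threshold.

```go
package main

import "fmt"

// Policy decides which incoming connections a transcoder keeps.
// All names are hypothetical.
type Policy struct {
	Expected  map[string]bool // NodeIDs of broadcasters with assigned jobs
	MaxOthers int             // threshold for non-broadcaster connections
	others    int             // non-broadcaster connections admitted so far
}

// Admit reports whether the connection from nodeID should be kept.
func (p *Policy) Admit(nodeID string) bool {
	if p.Expected[nodeID] {
		return true // broadcasters we are expecting always get through
	}
	if p.others >= p.MaxOthers {
		return false // over the threshold: drop non-broadcaster peers
	}
	p.others++
	return true
}

func main() {
	p := &Policy{Expected: map[string]bool{"nodeB": true}, MaxOthers: 1}
	fmt.Println(p.Admit("nodeB"), p.Admit("relay1"), p.Admit("relay2"))
}
```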

For NodeID, we use it to relay messages (relay to the neighbor with the closest NodeID to the destination). We don't use it for node lookup now, but we could (and probably should) in the future. re - transcoders not wanting to publicly advertise connectivity, I think that makes sense, but I think in the current design, you can't really tell if a node is a transcoder from just a NodeID.

j0sh commented 6 years ago

Couple more thoughts on the implementation details.

If the transcoder goes offline, the broadcaster needs to know when and how to re-initiate the direct connection to the transcoder. We can have the broadcaster retry periodically using the original connection information. Once the connection is established, the transcoder can re-subscribe.
How long should the transcoder multiaddr be valid for? The duration of the job unless indicated otherwise? The transcoder could send new TranscodeSub requests as needed, in order to refresh the multiaddr. There are a few reasons to do that: load balancing, failover, security (eg, rotating listeners).

The problem here is if the broadcaster is unavailable to receive new TranscodeSub request. I don't think we want to impose the same uptime requirements on the broadcaster as we do for the transcoder.

The broadcaster just doesn't have any new content at the moment. Should we keep the direct connection alive? Does the network need a special indication of this, eg for the player?
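
The periodic retry from the first point above might be sketched as follows. The function and error names are hypothetical; the dial and the pause between attempts are passed in as callbacks so the retry interval stays a caller decision.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrOffline stands in for whatever error a failed dial returns.
var ErrOffline = errors.New("transcoder unreachable")

// Redial calls dial up to attempts times, invoking pause between failed
// attempts, and returns the last error if every attempt fails.
func Redial(dial func() error, attempts int, pause func()) error {
	err := ErrOffline
	for i := 0; i < attempts; i++ {
		if err = dial(); err == nil {
			return nil
		}
		if i < attempts-1 {
			pause()
		}
	}
	return err
}

func main() {
	tries := 0
	err := Redial(func() error {
		tries++
		if tries < 3 {
			return ErrOffline // transcoder still down
		}
		return nil // original connection info works again
	}, 5, func() { time.Sleep(10 * time.Millisecond) })
	fmt.Println(tries, err)
}
```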

ericxtang commented 6 years ago

How long should the transcoder multiaddr be valid for? The duration of the job unless indicated otherwise? The transcoder could send new TranscodeSub requests as needed, in order to refresh the multiaddr. There are a few reasons to do that: load balancing, failover, security (eg, rotating listeners).

Sending new TranscoderSub is interesting - the transcoder can use it to re-establish connection if it goes offline somehow. We can add it as an improvement.

The broadcaster just doesn't have any new content at the moment. Should we keep the direct connection alive? Does the network need a special indication of this, eg for the player?

This touches on the issue of peer/connection management for the transcoder. I think in the ideal world we keep an order of peer from most to least important. Maybe we can add that as an improvement later.

j0sh commented 6 years ago

There is no way right now to look up a job on the broadcaster given a StreamID. This makes it difficult to check the TranscodeSub signature (eg, look up the transcoder address assigned to a given stream) without building a lot of scaffolding.

Since the ethclient has a GetJob function, might it be better if TranscodeSub passed in the JobID (or some opaque identifier that's cast back to the actual type by a signature-verification callback) rather than the StreamID ? Then the signature-verification callback should be able to look up jobs as-needed on-chain.

The drawback is that using the JobID is a bit at odds with the current implementation of most things in basicnet, which uses StreamIDs exclusively right now. Given other issues (https://github.com/livepeer/protocol/issues/203#issuecomment-372683397) we could be moving more towards using JobID, so this might not be too bad in the long term.

ericxtang commented 6 years ago

@j0sh I think the ability to look up a JobID from StreamID might be necessary anyways. Currently we don't have any "permanent" state in the node. But I think we'll need to add a database to store transcoder state soon. If that's the case, I can see the broadcaster storing a lookup table for JobID/StreamID. Of course, that requires some common infrastructure.

I think the changes in https://github.com/livepeer/protocol/issues/203#issuecomment-372683397 can also be addressed by keeping the lookup table.

Maybe for the sake of making progress, we pencil in the signature verification portion for the next iteration? (For example, we can use a callback mechanism for signature verification, and use an identity function in the go-livepeer code for now)
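Something along these lines is what I mean by the callback mechanism. This is only a sketch: `SigVerifier`, `AcceptAll` and `HandleTranscodeSub` are hypothetical names, not existing basicnet APIs, and the placeholder verifier simply accepts everything until real verification is added.

```go
package main

import "fmt"

// SigVerifier is a hypothetical callback: given the identifier carried by a
// TranscodeSub message and its signature, report whether the request is valid.
type SigVerifier func(id string, sig []byte) bool

// AcceptAll is the placeholder verifier for now: it accepts every request.
func AcceptAll(id string, sig []byte) bool { return true }

// HandleTranscodeSub is a hypothetical handler that delegates the signature
// check to the supplied callback, defaulting to AcceptAll when none is given.
func HandleTranscodeSub(id string, sig []byte, verify SigVerifier) error {
	if verify == nil {
		verify = AcceptAll
	}
	if !verify(id, sig) {
		return fmt.Errorf("transcodesub: signature rejected for %q", id)
	}
	// ... the direct connection / subscription setup would go here ...
	return nil
}

func main() {
	err := HandleTranscodeSub("strm-abc", []byte("sig"), AcceptAll)
	fmt.Println("error:", err)
}
```

The networking code never needs to know whether `id` is a StreamID or a JobID; only the callback supplied by the API consumer interprets it.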

j0sh commented 6 years ago

Sure, we can defer signature checking.

For the future, consider: all the job information is available right now with an on-chain lookup via JobID, including the StreamID. So rather than maintaining a reverse lookup table mapping StreamID <-> JobID, we could convert most fields in basicnet to use bigints / JobIDs instead.

Within basicnet, there is relatively little that depends on knowing the structure of StreamID (mostly to look up the broadcaster nodeID); we should be able to work around those cases, and replace StreamID with an opaque "JobID" bigint that is passed around for the purposes of providing context to the basicnet API consumer (whether Livepeer or some other off-chain system).

yondonfu commented 6 years ago

Although I get that using jobIDs in the networking protocol would make certain implementation related tasks easier, it does feel like it is tightly coupling the networking protocol with the smart contract protocol when the networking protocol should be able to be standalone BUT able to integrate with the smart contract protocol if needed. My immediate thought is that by switching from streamIDs to jobIDs in the networking protocol, you lose the ability to request content with JUST a content identifier since the streamID encodes nodeID info that can be used to route to the node serving the content absent other information while the jobID is not able to support this (unless you were thinking of something else besides just using a regular big integer). Additionally I think semantically there should be a difference between "streams" and "jobs" - you should be able to have a stream, but not necessarily have a job.

j0sh commented 6 years ago

After the direct connection to the transcoder succeeds and the subscribe is established, should the broadcaster stop transmitting segments to the boot/relay node as well? If we don't, this doubles bandwidth (which may be scarce on the broadcaster side).
What should the broadcaster do if the transcoder is actually inaccessible via a direct connection? Maybe we should send a TranscodeSubFailed message via the relay node, and the transcoder can either send the SUB via the old method, or another TranscodeSub with updated connection parameters. Or leave the question open as we re-think networking from the top down.

j0sh commented 6 years ago

@ericxtang Here's a thought on how to handle the situation where a direct connection to the transcoder cannot be made, such as if the transcoder is NAT'd.

Consider Step 7, as currently specified:

T gets the direct connection request from B, checks the NodeID of B is the NodeID in StrmID, and uses the same SUB/DATA mechanism to subscribe to the video stream.

In order for this step to work, we need to NACK the TranscodeSub in case the broadcaster can't initiate a direct connection, so the transcoder knows to send the SUB back through the relay network. This is another message in the basicnet protocol, another set of round-trips, and interacts strangely with broadcaster unavailability on several levels.

In order to simplify the entire setup, have the TranscodeSub message actually set up the subscription along with the direct connection. No need for the second SUB step. If the direct connection can't be established, the broadcaster can fall back to sending segments through the relay network. The transcoder should be able to handle receiving segments from either the relay or broadcaster, and can send TranscodeResponseMsgs back down whichever channel is appropriate.
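Roughly, the broadcaster side of that could look like the sketch below. `Sender`, `Dialer`, `Subscribe` and `ErrUnreachable` are hypothetical names for illustration, not the actual basicnet types: the idea is just that handling TranscodeSub picks the direct connection if dialing works and otherwise keeps using the relay path, with no NACK message needed.

```go
package main

import (
	"errors"
	"fmt"
)

// Sender is a hypothetical abstraction over a channel segments can be sent on.
type Sender interface {
	Send(seg []byte) error
}

// Dialer is a hypothetical function that attempts a direct connection.
type Dialer func(multiaddr string) (Sender, error)

// ErrUnreachable stands in for any dial failure (eg, the transcoder is NAT'd).
var ErrUnreachable = errors.New("transcoder unreachable")

// Subscribe handles a combined TranscodeSub: it sets up the subscription and
// returns the direct connection when it can be made, else the relay path.
func Subscribe(dial Dialer, multiaddr string, relay Sender) (Sender, bool) {
	if d, err := dial(multiaddr); err == nil {
		return d, true
	}
	return relay, false
}

// countingSender is a stub Sender that counts segments, for demonstration.
type countingSender struct{ sent int }

func (c *countingSender) Send(seg []byte) error { c.sent++; return nil }

func main() {
	relay := &countingSender{}
	natted := func(string) (Sender, error) { return nil, ErrUnreachable }
	// The multiaddr below is a documentation-range placeholder.
	s, direct := Subscribe(natted, "/ip4/203.0.113.7/tcp/15000", relay)
	_ = s.Send([]byte("segment"))
	fmt.Println("direct:", direct, "segments via relay:", relay.sent)
}
```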

ericxtang commented 6 years ago

I understand we would use NACK if we wanted to have deterministic behavior - but I view TranscodeSub as an opportunistic optimization. One way to implement it would be as follows:

The transcoder simply sends TranscodeSub, waits for some time, and sends a SUB to the network? If the broadcaster makes the connection within this time frame, SUB will be automatically sent to the broadcaster because it will be in the transcoder's peerstore, and will have the closest distance to the targetPid. Otherwise, SUB will be sent out to the network and relayed.

I think combining TranscodeSub and SUB will probably require more work, but agree it's more efficient. Up to you!

j0sh commented 6 years ago

The transcoder simply sends TranscodeSub, waits for some time, and sends a SUB to the network? If the broadcaster makes the connection within this time frame, SUB will be automatically sent to the broadcaster because it will be in the transcoder's peerstore, and will have the closest distance to the targetPid. Otherwise, SUB will be sent out to the network and relayed.

This is where we run into issues with broadcaster availability. Until the transcoder gets an ack from the broadcaster (either via a direct connection or a data segment), the transcoder actually has to keep re-submitting TranscodeSub (or SUB) periodically to the network. Just doing either once is not enough. If we have to transmit periodic messages, we should incorporate the information for direct connections. Otherwise, we miss an easy opportunity to bundle additional resiliency into the protocol semantics.
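To make the periodic part concrete, here is a minimal sketch of the resubmit loop on the transcoder. `ResubmitUntilAck`, the interval and the retry cap are assumptions for illustration only; in practice `send` would transmit the TranscodeSub (with its connection info) and `ack` would fire on the first direct connection or data segment.

```go
package main

import (
	"fmt"
	"time"
)

// ResubmitUntilAck calls send immediately and then once per interval until
// ack fires or maxTries sends have been made. It reports whether an ack
// arrived. All names and parameters here are hypothetical.
func ResubmitUntilAck(send func(), ack <-chan struct{}, interval time.Duration, maxTries int) bool {
	send()
	tries := 1
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ack:
			return true
		case <-ticker.C:
			if tries >= maxTries {
				return false
			}
			send()
			tries++
		}
	}
}

func main() {
	ack := make(chan struct{})
	sends := 0
	ok := ResubmitUntilAck(func() {
		sends++
		if sends == 3 { // pretend the broadcaster comes online on the 3rd try
			close(ack)
		}
	}, ack, 10*time.Millisecond, 10)
	fmt.Println("acked:", ok, "sends:", sends)
}
```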

I think combining TranscodeSub and SUB will probably require more work

It ended up making the code a bit shorter thanks to increased sharing, but I compensated by adding more tests. Win-win.

livepeer / go-livepeer-basicnet

Basicnet Protocol Documentation #21
