livepeer / go-livepeer

Official Go implementation of the Livepeer protocol
http://livepeer.org
MIT License

Capacity Cap for B #706

Closed: eladmallel closed this issue 5 years ago

eladmallel commented 5 years ago

Is your feature request related to a problem? Please describe.
To improve customer experience, B should respond with a clear error message whenever it's at its max capacity of concurrent streams and a new RTMP stream comes in.

Describe the solution you'd like
The user/customer should receive clear feedback from B that it is now at max capacity and therefore cannot handle new streams.

We probably want to add yet another CLI param, such as maxConcurrentStreams of type int. B should reject any incoming stream once its concurrent stream count reaches maxConcurrentStreams.

Not sure yet how best to achieve this goal.
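
For the flag itself, a minimal sketch of what registration might look like, assuming the standard library flag package (the name maxConcurrentStreams comes from this proposal; the default of 0 meaning "no cap" is an assumption):

package main

import (
    "flag"
    "fmt"
)

func main() {
    // Proposed flag from this issue; the default of 0 (no cap) is an assumption.
    maxConcurrentStreams := flag.Int("maxConcurrentStreams", 0,
        "Maximum number of concurrent RTMP streams B will accept (0 = no cap)")
    flag.Parse()

    // B would compare its current RTMP connection count against this value
    // whenever a new stream comes in, and reject once the cap is reached.
    fmt.Println("maxConcurrentStreams =", *maxConcurrentStreams)
}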

Describe alternatives you've considered
Discussed above.

Additional context
None.

eladmallel commented 5 years ago

@j0sh @ericxtang @darkdarkdragon can you please share your thoughts on how we should approach this?

j0sh commented 5 years ago

Does RTMP support this kind of response?

Not really, the best we can do here is disconnect the stream. We might be able to add a special code to the connect response prior to disconnect, but that requires changes to LPMS and probably joy4 (the golang RTMP implementation), in addition to any specialized handling on the client (customer) side. Most RTMP application hooks do not expose this type of detail, so it's probably more trouble than it's worth.

Should we rely on webhooks for this kind of response?

We should certainly be able to stop the stream based on the webhook response, but I'm inclined to also add capacity caps as a separate node-level feature. Enforcing capacity via webhook means the endpoint(s) need to have a synchronized view of the broadcaster's internal state, which is error-prone.

Any other approaches?

This is pretty easy to implement. However, for M2, I'd say O-level caps are more important since this will allow the transcoding network to self-balance.

From experience with other RTMP providers, most customers will probably take a non-explicit approach to load balancing their B nodes.

Of course, a capacity cap is still useful here, if only to preserve existing QoS at the upper limit and/or alert some monitoring system.

eladmallel commented 5 years ago

I'd say O-level caps are more important

I agree that's important, and we have a separate issue for that as part of M2.

This is pretty easy to implement.

I'd love to understand more specifically how you see this being implemented. What I had in mind was to introduce a new webhook message that lets the user know why the stream was rejected. A bit more in detail:

  • We could name this message something like StreamRejectedDueToMaxCapacity
  • It should contain the host that rejected the stream, and the user-defined ManifestID, so the user has sufficient information to decide how to react
  • B would invoke this webhook immediately after rejecting an RTMP stream
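
To make the payload shape concrete, a minimal sketch of the message as a Go type (the field and JSON names are assumptions based on the bullets above, not an existing go-livepeer type):

// Hypothetical payload for the proposed notification message.
type StreamRejectedDueToMaxCapacity struct {
    Host       string `json:"host"`       // the B node that rejected the stream
    ManifestID string `json:"manifestID"` // user-defined ID of the rejected stream
}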

Does this sound good, or did you have a different approach in mind?

cc @darkdarkdragon

darkdarkdragon commented 5 years ago

@j0sh

Enforcing capacity via webhook means the endpoint(s) need to have a synchronized view of the broadcaster's internal state

Can you elaborate on that?

darkdarkdragon commented 5 years ago

@eladmallel

  • We could name this message something like StreamRejectedDueToMaxCapacity
  • It should contain the host that rejected the stream, and the user-defined ManifestID, so the user has sufficient information to decide how to react
  • B would invoke this webhook immediately after rejecting an RTMP stream

Sounds good to me

j0sh commented 5 years ago

@eladmallel I may have misinterpreted this part in the writeup: "Should we rely on webhooks for this kind of response?"

Specifically, this sounded like we would be outsourcing the logic for tracking capacity and rejecting streams beyond the max to the endpoint receiving the stream auth webhook (running outside the node). This would be problematic.
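
To illustrate the problem: an endpoint that enforces the cap itself has to maintain its own count of B's active streams, and that count inevitably drifts. A sketch of this problematic setup (not anything proposed here; all names are illustrative):

package main

import (
    "net/http"
    "sync/atomic"
)

// The endpoint's own guess at B's concurrent stream count; it has no reliable
// way to decrement this when streams end on B.
var activeStreams int64

const maxStreams = 10 // assumed cap, for illustration only

// authHandler plays the role of the external stream-auth webhook endpoint.
func authHandler(w http.ResponseWriter, r *http.Request) {
    if atomic.LoadInt64(&activeStreams) >= maxStreams {
        http.Error(w, "at capacity", http.StatusForbidden)
        return
    }
    atomic.AddInt64(&activeStreams, 1)
    // Problem: the endpoint is never told when a stream ends or errors out on
    // B, so activeStreams drifts away from the broadcaster's real state.
    w.WriteHeader(http.StatusOK)
}

func main() {
    http.HandleFunc("/auth", authHandler)
    http.ListenAndServe(":8080", nil)
}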

If we're just talking about an event notification that the capacity cap has been reached -- then all this sounds fine. I see that it's also mentioned within the issue for system-wide instrumentation, https://github.com/livepeer/go-livepeer/issues/671, so perhaps the specific approach to notifications (webhooks, existing metrics, collectd, something else) could be decided as part of addressing the instrumentation feature.

It should contain the host that rejected the stream

Note this will probably require changes deeper down, since I don't think connection-level information is exposed to the goclient. But having such info available would generally be useful, so I'm in favor of making this change eventually.

@darkdarkdragon Happy to elaborate on the problematic aspects if it's still relevant, but didn't want to get sidetracked over my own misunderstanding.

I'd love to understand more specifically how you see this being implemented?

// Check this within createRTMPStreamIDHandler and/or gotRTMPStreamIDHandler
func (s *LivepeerServer) isAtCapacity(url *url.URL) bool {
    s.connectionLock.RLock()
    defer s.connectionLock.RUnlock()
    // Reject once the cap is reached (>= rather than >, so no more than
    // MaxBroadcastStreams streams run concurrently).
    if len(s.rtmpConnections) >= MaxBroadcastStreams {
        // Notify asynchronously so the lock isn't held during the HTTP call.
        go sendAtCapacityNotif(url)
        return true
    }
    return false
}

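sendAtCapacityNotif is left undefined in the sketch above; a minimal version might look like the following (the payload shape reuses the StreamRejectedDueToMaxCapacity naming proposed earlier in the thread, and the use of glog is an assumption, though go-livepeer already logs with it):

// Assumed imports: "bytes", "encoding/json", "net/http", "net/url",
// and "github.com/golang/glog".
func sendAtCapacityNotif(url *url.URL) {
    payload, err := json.Marshal(map[string]string{
        "event": "StreamRejectedDueToMaxCapacity",
    })
    if err != nil {
        glog.Errorf("Error marshaling at-capacity notification: %v", err)
        return
    }
    resp, err := http.Post(url.String(), "application/json", bytes.NewReader(payload))
    if err != nil {
        glog.Errorf("Error sending at-capacity notification: %v", err)
        return
    }
    resp.Body.Close()
}
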
eladmallel commented 5 years ago

@j0sh I think you now understand my intention well. When I was thinking of using webhooks, the purpose was to help the user/customer of B be aware of the following:

  1. B rejected a stream just now for a good and specific reason
  2. Exactly which B rejected it, so traffic can potentially be directed to other Bs

I think your pseudocode is along the same line of thinking.

Are we then good with relying on webhooks to provide users of B with this useful information? @j0sh @darkdarkdragon

j0sh commented 5 years ago

Are we then good with relying on webhooks to provide users of B with this useful information?

This feels quite close to https://github.com/livepeer/go-livepeer/issues/704 -- the "B at capacity" notification really is just another event, so let's take the discussion of the actual mechanism there. Also related is https://github.com/livepeer/go-livepeer/issues/671