livepeer / go-livepeer

Official Go implementation of the Livepeer protocol
http://livepeer.org
MIT License

Capability Discovery #1519

Closed j0sh closed 4 years ago

j0sh commented 4 years ago

Abstract

A mechanism for capability discovery is proposed. Capability discovery eases compatibility and discovery concerns related to operating in a heterogeneous network where nodes support a mix of features. The technical mechanism has two parts, a feature bitstring and capability constraints, described under Proposed Solution.

Motivation

When a new feature is added that requires support on both sides of the B-O network boundary [1] (recent examples: MP4 support, non-integer frame rates, durations), compatibility must be considered. A robust compatibility mechanism is important to minimize the impact of changes, to maximize the reach of the network, and to provide a framework for doing both with minimal engineering effort. The criteria for a compatibility system are as follows:

  1. How to identify nodes that have upgraded to support a new feature
  2. How to identify nodes that might not support the new feature, whether due to being non-upgraded or having the feature disabled
  3. How to interoperate with non-upgraded nodes when the new feature is not needed

Capability discovery is proposed as a mechanism for this.

[1] The O-T network boundary also has similar issues around compatibility, since transcoders cannot be expected to be upgraded in tandem with orchestrators. The same capability matching mechanism should be re-used during transcoder selection by orchestrators.

Proposed Solution

The solution has two parts: a feature bitstring and capability constraints.

The capabilities of the orchestrator are advertised as a bitstring. Each bit in the string corresponds to a certain capability (or feature). Capabilities using this bitstring mechanism are binary: either they are enabled, or not.

On discovery, the orchestrator sends down its capability bitstring within the OrchestratorInfo response. The broadcaster constructs its own (reduced) bitstring corresponding to the transcoding requirements for that particular job. Capability checking of a particular orchestrator corresponds to checking that the AND between the two bitstrings is equivalent to the broadcaster's own capability requirements:

isOrchestratorCapable := (broadcaster.bitstring AND orchestrator.bitstring) == broadcaster.bitstring
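A minimal Go sketch of this check, assuming for simplicity that the whole bitstring fits in a single uint64. The bit indices follow the example enumeration given under Bitstring Construction; the `bit` and `isOrchestratorCapable` helpers are illustrative only:

```go
package main

import "fmt"

// Bit indices taken from the example enumeration in this proposal.
const (
	Capability_MPEGTS        = 0
	Capability_MP4           = 1
	Capability_HEVC_Decoding = 20
)

// bit returns a mask with only the given bit index set.
func bit(index uint) uint64 { return 1 << index }

// isOrchestratorCapable reports whether every capability required by the
// broadcaster is also advertised by the orchestrator.
func isOrchestratorCapable(broadcaster, orchestrator uint64) bool {
	return broadcaster&orchestrator == broadcaster
}

func main() {
	// The orchestrator supports MPEG-TS and MP4.
	orch := bit(Capability_MPEGTS) | bit(Capability_MP4)

	fmt.Println(isOrchestratorCapable(bit(Capability_MP4), orch))                               // true
	fmt.Println(isOrchestratorCapable(bit(Capability_MP4)|bit(Capability_HEVC_Decoding), orch)) // false: no HEVC decoding
	fmt.Println(isOrchestratorCapable(0, orch))                                                  // true: no special requirements
}
```

Because this is a subset test, extra bits on the orchestrator never cause a mismatch, and a job with no special requirements matches any orchestrator.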

The orchestrator populates its bitstring at startup. Some capabilities might be explicitly toggled via config, others might be auto-detected (eg, codec support if using particular GPUs), and some others might simply be always-on as compatibility markers. The broadcaster generates its bitstring based on job requirements, perhaps during initialization of the orchestrator discovery procedure.
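The startup population described above might look roughly like the sketch below. The `nodeConfig` type and its fields are hypothetical stand-ins for real configuration and hardware detection, and which features are treated as config toggles, auto-detected, or always-on is assumed purely for illustration:

```go
package main

import "fmt"

// Bit indices taken from the example enumeration in this proposal.
const (
	Capability_MPEGTS               = 0
	Capability_MP4                  = 1
	Capability_FractionalFramerates = 5
	Capability_HEVC_Decoding        = 20
)

// nodeConfig is a hypothetical stand-in for orchestrator configuration
// and hardware detection results.
type nodeConfig struct {
	EnableMP4  bool // explicitly toggled via config
	GPUHasHEVC bool // auto-detected, eg from the installed GPU
}

// orchestratorCapabilities builds the orchestrator's bitstring once, at startup.
func orchestratorCapabilities(cfg nodeConfig) uint64 {
	// Always-on compatibility markers.
	caps := uint64(1)<<Capability_MPEGTS | uint64(1)<<Capability_FractionalFramerates
	if cfg.EnableMP4 {
		caps |= 1 << Capability_MP4
	}
	if cfg.GPUHasHEVC {
		caps |= 1 << Capability_HEVC_Decoding
	}
	return caps
}

func main() {
	caps := orchestratorCapabilities(nodeConfig{EnableMP4: true})
	fmt.Printf("%b\n", caps) // 100011: bits 0, 1 and 5 set
}
```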

Some features are not binary, so they cannot be represented as a bitstring. Limits or ranges might be common. For example, perhaps the orchestrator opts to constrain the video resolution, duration or bitrate it's willing to process. Currently, there is no requirement to handle such constraints, but the need is certain to arise eventually. For such cases, a separate structure can be used, with each constraint added as a field, and a comparison defined for each particular constraint. See Capability Constraints for additional details.

For additional details on why the distinction between feature bitstrings and capability constraints is necessary, see Alternative to Bitstring: Constraint-Only Matching.

Compatibility Criteria

Feature bit-strings address the criteria for compatibility:

  1. How to select nodes that have upgraded to support the new feature

Orchestrators that have not upgraded to support a certain feature will have the corresponding bit-indices marked as zero, so they fail the capability check for any job that requires that feature. Non-upgraded nodes are excluded as a result.

  2. How to identify nodes that might not support the new feature, eg disabling a feature

Orchestrators that have certain features disabled will have the corresponding bit-indices marked as zero, and are excluded in the same way.

  3. How to interoperate with older nodes when the new feature is not needed

When the job only uses older features, newer features are not toggled in the broadcaster's reduced bitstring of capability requirements. Hence, the bitstring for non-upgraded orchestrators will continue to match the broadcaster's requirements.

Implementation Considerations: Bitstring Construction, Matching and Feature Mapping

Each new feature or capability is assigned a permanent bit-index. This bit-index assignment must be shared by all nodes that wish to interoperate. For example, in golang, a fixed set of enumerations could be used:

```go
// Each feature is assigned a permanent bit-index, shared by all nodes.
const (
	Capability_MPEGTS = 0
	Capability_MP4    = 1
	// any other formats can be listed here
	Capability_FractionalFramerates = 5
	// any other features can be listed here
	Capability_H264Profile_Baseline        = 10
	Capability_H264Profile_Main            = 11
	Capability_H264Profile_High            = 12
	Capability_H264Profile_ConstrainedHigh = 13
	// and so forth
	Capability_HEVC_Decoding = 20
)
```

For simplicity of representation and processing, it may be suitable to represent the bitstring as a list of uint64 values. Here is how to construct and match on the bitstring using a list-of-uint64 representation.

```go
// Assume all these features are enabled
var features = []int{Capability_MPEGTS, Capability_MP4, Capability_FractionalFramerates, Capability_H264Profile_Baseline, Capability_HEVC_Decoding}

// Bitstring construction. The slice is sized by the highest bit index
// (not by the number of features) so that every index fits.
func NewBitstring(features []int) []uint64 {
	maxIndex := 0
	for _, bitIndex := range features {
		if bitIndex > maxIndex {
			maxIndex = bitIndex
		}
	}
	bitstring := make([]uint64, maxIndex/64+1)
	for _, bitIndex := range features {
		bitstring[bitIndex/64] |= 1 << uint(bitIndex%64)
	}
	return bitstring
}

// Broadcaster / orchestrator bitstring matching.
// bcastBitstring is the "reduced bitstring" holding the job requirements
// (see the job-mapping example below). Every required bit must also be set
// by the orchestrator; words missing from orchBitstring count as zero.
func IsCapable(bcastBitstring, orchBitstring []uint64) bool {
	for i, word := range bcastBitstring {
		var orchWord uint64
		if i < len(orchBitstring) {
			orchWord = orchBitstring[i]
		}
		if word&orchWord != word {
			return false
		}
	}
	return true
}
```

Here is an example of how a broadcaster's requirements bitstring could be constructed based on the job parameters. This may seem tedious, but the construction is straightforward. One day we could build better abstractions for the construction, if they present themselves. For now, something like the below will likely suffice:

```go
// Broadcaster mapping of feature requirements to bitstring indices
hasMPEGTS := false
hasMP4 := false
hasH264Baseline := false
// ... and so on for the rest of the features.
for _, output := range job.outputs {
	hasMPEGTS = hasMPEGTS || output.Format == Format_MPEGTS
	hasMP4 = hasMP4 || output.Format == Format_MP4
	hasH264Baseline = hasH264Baseline || output.H264Profile == Profile_H264Baseline
}
var features []int
if hasMPEGTS {
	features = append(features, Capability_MPEGTS)
}
if hasMP4 {
	features = append(features, Capability_MP4)
}
if hasH264Baseline {
	features = append(features, Capability_H264Profile_Baseline)
}
// and so forth
// Now invoke the bitstring construction routine described earlier, given `features`.
```

Implementation Considerations: Capability Constraints

Some features are not binary, so they cannot be as easily represented in a simple bitstring. Limits or ranges might be common. For example, perhaps the orchestrator opts to constrain the video resolution, duration or bitrate it's willing to process. Currently, there is no requirement to handle such constraints, but one is likely to arise eventually. For completeness, here is a sketch of how such constraint-matching could work.

```proto
message CapabilityConstraints {
  uint32 max_width = 1;     // eg 1920
  uint32 max_height = 2;    // eg 1080
  uint32 max_duration = 3;  // eg 600000
}
```

Additional constraints will have to be explicitly checked for each field. It might be enough to also represent the job's constraints within the same type of structure, and compare the two, eg:

```go
orchConstraints := orchestratorInfo.CapabilityConstraints
bcastConstraints := ComputeBroadcasterConstraints(segTranscodingMetadata)
useOrchestrator := orchConstraints.IsAcceptable(bcastConstraints)

func ComputeBroadcasterConstraints(md *SegTranscodingMetadata) *CapabilityConstraints {
	c := &CapabilityConstraints{}

	// Populate source-based fields as needed
	c.MaxDuration = md.Duration
	// ... other fields

	// Populate output-based fields as needed
	for i, profile := range md.Profiles {
		c.MaxWidth = max(profile.Width, c.MaxWidth)
		// For the sake of example, assume there is also a "minimum" constraint.
		// Seed it from the first profile so the zero value doesn't win every comparison.
		if i == 0 || profile.Width < c.MinWidth {
			c.MinWidth = profile.Width
		}
		// ... other fields
	}
	return c
}

func (orchConstraints *CapabilityConstraints) IsAcceptable(bcastConstraints *CapabilityConstraints) bool {
	if orchConstraints.MaxWidth < bcastConstraints.MaxWidth {
		return false
	}
	// For the sake of example, assume there is also a minimum-type constraint
	if orchConstraints.MinWidth > bcastConstraints.MinWidth {
		return false
	}
	// ... check other fields ...
	return true
}
```

We may want to also signal support for each additional constraint within the bitstring itself. This would primarily be useful for orchestrator matching on broadcasters, in order to detect whether the broadcaster was expecting compatibility with a certain field the orchestrator is unaware of. It is unclear whether such signaling would be "always-on" or enabled as needed.
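A hedged sketch of how such signaling could gate constraint checks on the orchestrator side. The bit names, their indices (30 and 31), the simplified `CapabilityConstraints` struct and `isAcceptable` are all hypothetical. The orchestrator first rejects any constraint bit it does not know, then evaluates only the constraint fields the broadcaster has flagged:

```go
package main

import "fmt"

// Hypothetical bit indices that signal support for individual constraint fields.
const (
	Capability_ResolutionConstraint = 30
	Capability_DurationConstraint   = 31
)

// CapabilityConstraints is a simplified, hypothetical constraints structure.
type CapabilityConstraints struct {
	MaxWidth    uint32
	MaxDuration uint32
}

func bit(index uint) uint64 { return 1 << index }

// isAcceptable runs on the orchestrator. It rejects broadcasters that rely on
// a constraint field the orchestrator does not know about, then checks only
// the constraint fields whose support bit the broadcaster has set.
func isAcceptable(orchBits, bcastBits uint64, orch, bcast CapabilityConstraints) bool {
	if bcastBits&orchBits != bcastBits {
		return false
	}
	if bcastBits&bit(Capability_ResolutionConstraint) != 0 && orch.MaxWidth < bcast.MaxWidth {
		return false
	}
	if bcastBits&bit(Capability_DurationConstraint) != 0 && orch.MaxDuration < bcast.MaxDuration {
		return false
	}
	return true
}

func main() {
	// This orchestrator only knows about the resolution constraint.
	orchBits := bit(Capability_ResolutionConstraint)
	orch := CapabilityConstraints{MaxWidth: 1920}
	job := CapabilityConstraints{MaxWidth: 1280, MaxDuration: 600000}

	fmt.Println(isAcceptable(orchBits, bit(Capability_ResolutionConstraint), orch, job)) // true
	both := bit(Capability_ResolutionConstraint) | bit(Capability_DurationConstraint)
	fmt.Println(isAcceptable(orchBits, both, orch, job)) // false: duration constraint unknown here
}
```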

For the initial implementation, stub capability constraints structs and functions can be defined. Subsequent work on additional features can fill these out as needed.

Implementation Considerations: Discovery

The signature for the GetOrchestrators function will need to be updated to take a set of requirements against which to filter orchestrator capabilities.

Currently, capability discovery will be non-interactive: the broadcaster will not transmit its own requirements during the GetOrchestrators call.

The existing predicate check may be folded into this new mechanism.
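A rough sketch of the filtering step. The `orchestratorInfo` struct, its `URI` and `Capabilities` fields, and the `capable`/`filterByCapabilities` helpers are hypothetical, not the actual GetOrchestrators signature. Since discovery is non-interactive, the filter runs locally over responses the broadcaster has already received:

```go
package main

import "fmt"

// orchestratorInfo is a simplified, hypothetical stand-in for an
// OrchestratorInfo response.
type orchestratorInfo struct {
	URI          string
	Capabilities []uint64 // capability bitstring as a list of uint64 words
}

// capable reports whether every bit in req is also set in caps.
// Words missing from caps are treated as zero.
func capable(req, caps []uint64) bool {
	for i, word := range req {
		var have uint64
		if i < len(caps) {
			have = caps[i]
		}
		if word&have != word {
			return false
		}
	}
	return true
}

// filterByCapabilities keeps only the orchestrators that satisfy the job's requirements.
func filterByCapabilities(infos []orchestratorInfo, req []uint64) []orchestratorInfo {
	var matched []orchestratorInfo
	for _, info := range infos {
		if capable(req, info.Capabilities) {
			matched = append(matched, info)
		}
	}
	return matched
}

func main() {
	infos := []orchestratorInfo{
		{URI: "https://o1.example", Capabilities: []uint64{0b11}}, // MPEG-TS (bit 0) and MP4 (bit 1)
		{URI: "https://o2.example", Capabilities: []uint64{0b01}}, // MPEG-TS only
	}
	req := []uint64{0b10} // the job requires MP4
	for _, info := range filterByCapabilities(infos, req) {
		fmt.Println(info.URI) // only https://o1.example is printed
	}
}
```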

Implementation Considerations: Compatibility with Non-Capability Enabled Orchestrators

Nodes will not be upgraded simultaneously to support capability discovery, so there still needs to be some consideration put towards compatibility with those older nodes.

When the broadcaster's bitstring is generated for a given job, it can be checked whether it is exclusively comprised of "legacy" features. If the job fits within the legacy feature-set and capability information for the orchestrator is missing, then the orchestrator will still be used, provided it passes the other discovery-stage filters.
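A hedged sketch of this fallback. It assumes that an orchestrator which advertised no capabilities is represented by a nil pointer, and that MPEG-TS is the only legacy feature; both are assumptions for illustration. Only the capability decision is shown; the other discovery-stage filters still apply:

```go
package main

import "fmt"

const (
	Capability_MPEGTS = 0
	Capability_MP4    = 1
)

// legacyFeatures is assumed to be the feature set every pre-capability
// orchestrator supports.
const legacyFeatures uint64 = 1 << Capability_MPEGTS

// useOrchestrator decides whether an orchestrator passes the capability check.
// orchCaps is nil when the orchestrator did not advertise any capabilities.
func useOrchestrator(jobCaps uint64, orchCaps *uint64) bool {
	if orchCaps == nil {
		// Non-upgraded orchestrator: usable only if the job needs nothing beyond legacy features.
		return jobCaps&^legacyFeatures == 0
	}
	have := *orchCaps
	return jobCaps&have == jobCaps
}

func main() {
	mpegtsJob := uint64(1) << Capability_MPEGTS
	mp4Job := uint64(1) << Capability_MP4
	upgraded := mpegtsJob | mp4Job

	fmt.Println(useOrchestrator(mpegtsJob, nil))    // true: legacy job, legacy orchestrator
	fmt.Println(useOrchestrator(mp4Job, nil))       // false: MP4 is not a legacy feature
	fmt.Println(useOrchestrator(mp4Job, &upgraded)) // true
}
```

The pointer distinguishes "capabilities missing" from "capabilities advertised as all zero", which matters because only the former should fall back to the legacy feature set.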

The existing mechanisms for backwards compatibility will continue to work as needed, such as attempting an MP4 transcode with an mpegts-only orchestrator.

Additional Context

Attacks

An orchestrator could set an arbitrarily long bitstring to all 1's in order to obtain as much work as possible.

We will not attempt to handle this right now, and take the orchestrator's word for what they do support. Capability discovery is not a substitute for proper verification. There may be cryptographic mechanisms to guard against this type of attack, but that is outside the scope of this initial spec.

On a related note, the lack of message constraints within Protocol Buffers is a bit of a concern. For example, we cannot reject a list more than 100 elements long during deserialization, or excessively large byte buffers. This may lead to undesirable memory or bandwidth usage. Not a new problem with capability discovery, but these types of attacks are worth noting.

The orchestrator is also incentivized to be honest about its supported constraints to ensure the best possible quality-of-service for its users. Being careless about constraints will only lead to poor results, and less work for the orchestrator over the long term. Additionally, the non-interactive aspect of discovery allows for some social pressure around obvious outliers.

Pricing Menu

Capability discovery has often been discussed alongside a "price menu". While there is some overlap with the ideas behind capability signaling, this proposal does not attempt to specify a granular pricing mechanism.

Orchestrator Rejection of Broadcasters

This approach addresses compatibility concerns from the broadcaster side, eg, how to select matching orchestrators. It does not address compatibility from the orchestrator side, eg, around using (or rejecting) broadcasters based on their capabilities. This may become necessary eventually, but is not currently in scope.

In general, an orchestrator should still reject work for features it does not support to the extent possible [1]. However, this rejection might come at a later point, such as LPMS erroring out. If applicable, the broadcaster can also take steps to trigger early failures, such as setting a deprecated field to an invalid value. However, with capability discovery, the onus is on the broadcaster to select appropriate orchestrators.

[1] If the broadcaster also includes its capabilities during segment submission, the orchestrator can perform the same capability check that the broadcaster does during discovery.

Orchestrator Mandatory Capabilities

At some point, the orchestrator may need certain "mandatory capabilities" present on the broadcaster. The absence of a mandatory capability would indicate the broadcaster isn't sufficiently up-to-date for the orchestrator. An example of a mandatory capability might be a change in PM handling that needs to be mirrored on both B / O.

An orchestrator signaling a mandatory capability is essentially a hard break with older B versions, which would be good to avoid as much as possible. Although we don't need mandatory capabilities right now, it might be good to incorporate them sooner rather than later in order to ensure forward-compatibility: older broadcasters can then error out during discovery if they receive an unsupported mandatory capability from a newer orchestrator, rather than deferring the failure to segment submission time.

For robustness, the orchestrator should also check the broadcaster's own capabilities if there is anything mandatory, unless there is a way to fail out early at segment submission time.

The check for mandatory capabilities could resemble this:

(broadcaster.bitstring AND orchestrator.bitstring) == (broadcaster.bitstring OR orchestrator.mandatories)
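The expression holds exactly when every bit the broadcaster requires is set by the orchestrator and every mandatory bit is set by the broadcaster: since (B AND O) is always a subset of B, equality with (B OR M) forces M into B and B into O. A minimal single-uint64 sketch, where `Capability_NewPMHandling` and its index are hypothetical:

```go
package main

import "fmt"

const (
	Capability_MPEGTS = 0
	Capability_MP4    = 1
	// Hypothetical mandatory capability, eg a change in PM handling.
	Capability_NewPMHandling = 30
)

func bit(index uint) uint64 { return 1 << index }

// capableWithMandatories evaluates
// (broadcaster AND orchestrator) == (broadcaster OR mandatories).
func capableWithMandatories(broadcaster, orchestrator, mandatories uint64) bool {
	return (broadcaster & orchestrator) == (broadcaster | mandatories)
}

func main() {
	orch := bit(Capability_MPEGTS) | bit(Capability_MP4) | bit(Capability_NewPMHandling)
	mandatories := bit(Capability_NewPMHandling)

	oldBcast := bit(Capability_MP4) // older broadcaster, unaware of the new PM handling
	newBcast := bit(Capability_MP4) | bit(Capability_NewPMHandling)

	fmt.Println(capableWithMandatories(oldBcast, orch, mandatories)) // false
	fmt.Println(capableWithMandatories(newBcast, orch, mandatories)) // true
}
```

With no mandatories (M = 0), the expression reduces to the ordinary capability check.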

DB Discovery

Would be good to add this in at some point as a shortcut during the discovery process. Maybe not in the initial implementation.

Standalone T

This capability discovery mechanism should be ported to standalone T, where the non-interactivity works well for orchestrators to select a T for the job. However, standalone T might not be in the initial implementation. The transcoder's capabilities can be advertised within its RegisterRequest.

Alternative Approaches

Alternative to Bitstring: Constraint-Only Matching

Rather than have a "two-step" matching process (one matching the bitstring, and another running through the constraints) we could simply only run through the constraints, and have each binary capability as a named field in the constraint message. This is a bit problematic for a few reasons:

Alternative to Bitstring + Constraints: Version Ratchet

One simple way to upgrade is via a version ratchet, where the orchestrator advertises a network version number (O.N), and the broadcaster only works with orchestrators whose network version is greater than or equal to its own (B.N; O.N >= B.N). See https://github.com/livepeer/go-livepeer/pull/1433

However, the coarseness of version ratchets is a problem. Criterion #3 is not met: as soon as a node upgrades its network version, it is cut off from all older nodes (hence, 'ratchet'), shrinking the effective size of the network. Likewise, criterion #2 is not met either, since version ratchets do not allow for signaling whether a certain feature may be available.

Alternative to Bitstring: Bloom Filter

Alternative: Rather than a bitstring, a mechanism such as Bloom filters can be used. This might only be necessary if the bitstring becomes extremely large. Many of the same constraints would remain, such as agreeing on the indices that a certain feature hashes into. Feature matching also becomes linear in the number of features in use, since each feature needs to be looked up individually. Non-binary features would still have to be handled separately.

Alternative: Interactive Capability Discovery Protocol

The broadcaster can transmit its capability requirements, and the orchestrator can acknowledge whether it is able to satisfy the request. This may be useful one day for orchestrator routing, eg to direct work towards specific nodes that have hardware qualified for a certain job. However, for now, we'll stay with a non-interactive protocol for the following reasons:

yondonfu commented 4 years ago

This is great!

We may want to also signal support for each additional constraint within the bitstring itself.

Makes sense. So we could have Capability_ResolutionConstraint and Capability_DurationConstraint feature bits and if they are set to 1 nodes would lookup the corresponding fields in the CapabilityConstraints message?

It may be difficult for non-upgraded orchestrators to distinguish between implicit behaviors, and "new" capabilities it is unaware of. One example is the mpegts-to-mp4 transition; non-upgraded orchestrators would process everything as mpegts, in spite of there being a new field indicating an MP4 preference, due to the O being unaware of the field.

Could this be solved by Os rejecting jobs if there are any unknown required features in B's bitstring? In the mpegts-to-mp4 transition case, B would flip the Capability_MP4 bit in its bitstring. When O receives the bitstring, it can first check whether any features are required that it is not aware of: these feature bits would be set to 1, but O would not be aware of a corresponding feature in its list of bit indices. If there are any unknown required features, it can reject the job. If all required features are known, it can then compare B's bitstring against its own.

darkdarkdragon commented 4 years ago

Looks solid!

j0sh commented 4 years ago

So we could have Capability_ResolutionConstraint and Capability_DurationConstraint feature bits and if they are set to 1 nodes would lookup the corresponding fields in the CapabilityConstraints message?

Yep, that's the idea.

Could this be solved by Os rejecting jobs if there are any unknown required features in B's bitstring?

Indeed - and I think the same capability check should be able to handle this, if it's also run on the orchestrator as the session starts. That is, the following should suffice:

(broadcaster.bitstring AND orchestrator.bitstring) == broadcaster.bitstring
j0sh commented 4 years ago

Another potential issue to consider on the O side is the presence of certain "mandatory capabilities". Added a section to the writeup for this above under "Orchestrator Mandatory Capabilities."

Also added more notes about backwards compatibility with non-upgraded orchestrators, under "Implementation Considerations: Compatibility with Non-Capability Enabled Orchestrators".