Requirements

Below is a list of possible requirements for namespaces. Those that are checked off have been fully accepted. Unchecked requirements are those that are considered.

[X] All system resources belong to a namespace
- [X] removing a namespace removes all member resources
[X] DNS compatible naming across the board.
[ ] Applications can use namespace for portability. For example, run two web services in different namespaces.
[ ] Namespace should be able to reference each other across namespaces.
- [ ] ACL hooks can intercept references before they are connected
- [ ] A simple system for resolving cross-namespace references to support portability.
  Considerations

We'll need to evaluate this proposal with consideration from other teams. The following quorum is proposed before proceeding:

[ ] Works well with compose plans (cc @aanand)
[ ] Works well with libnetwork (cc @mrjana)
Resources

While we have opened up the discussion of namespaces, we need to discuss naming in general. Providing an effect of feel, the following description is going to pretend that namespaces already exist. To open this discussion up, we must understand the Kinds resources:

Cluster: A cluster is controlled by a set of managers. Most resources will be scoped in a cluster. Under the current model, we define this as a quorum set.
Namespace: A cluster is divided into several namespaces.
Node: A node resides in a cluster. From a user perspective, there isn't much access other than reporting their existence. We may want to route a user to a node for certain requests. We may want to hook the node into the DNS system.
Job: A job belongs to a namespace within the cluster. A job may have multiple tasks. The job itself may have a service endpoint associated with it, accessible over DNS, such as with a service job.
Task: A task belongs to a job and a node, when assigned.
Network: A network belongs to a namespace.
Volume: A volume belongs to a namespace.
Rules about Naming

All resources in the cluster system use the same naming conventions.

All names should be compatible DNS subdomains, compliant with RFC1035. This allows any resource to be expressed over DNS. It also ensures that we have a well-known, restricted and reliable character space, compatible with existing tools.

For reference, names must comply with the following grammar:

<domain> ::= <subdomain> | " "

<subdomain> ::= <label> | <subdomain> "." <label>

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case

<digit> ::= any one of the ten digits 0 through 9

Each label must be less than 64 characters and the total length must be less than 256.

Names are case-insensitive, but stored and reported in lowercase, by convention.

Tools interacting with names should support conversions too and from punycode. This can be supported via golang.org/x/net/idna.

Structure

For each kind of resource, the name must be unique in the name space. This has the excellent property that all names are unique within the cluster. This means that by default, we have a way to reference every other thing.

Resource	Component	Structure	Examples
Cluster	`<cluster>`	`<cluster>`	`local`, `cluster0`
Namespace	`<namespace>`	`<namespace>.<cluster>`	`production.cluster0`, `development.local`, `xn--7o8h` (🐳), `system`
Node	`<node>`	`<node>.<cluster>`	`node0.local`
Job	`<job>`	`<job>.<namespace>.<cluster>`	`job0.production.cluster0`
Task	`<task>`	`<task>.<job>.<namespace>.<cluster>`	`task0.job0.production.cluster0`
Volume	`<volume>`	`<volume>.<namespace>.<cluster>`	`postgrs.production.cluster0`
Network	`<network>`	`<network>.<namespace>.<cluster>`	`frontend.production.cluster0`

At the base, we have the <cluster>. The cluster should refer to a specific cluster and can be named by configuration. Users should all share a common configuration but it is not necessary to interoperate.

While names are generated from structure, a resource name may have one or more labels, so they cannot be parsed to infer the source structure. For example, a node may be named a.b. When qualified, it may be a.b.default.local. If we don't know this is a node name, we may try to infer that based on structure. It is impossible to tell whether this is a resource named a on node b or a node named a.b.

Namespaces

A namespace is an area where resources can reference each other without qualification.

Every operation has a default namespace from which it is conducted. Any objects created in that context become a member of that namespace.

By default, we will have the following namespaces:

Namespace	Description
`default`	Default namespace for all resources
`system`	System namespace for cluster jobs

By default, all resources are created under default, unless the user modifies their configuration. The system namespace is used to run cluster tasks, such as plugins and data distribution plains. Resources in the system namespace are only shown in a special mode.

References

For most service declarations, we reference resources by a name. Typically, this name is evaluated within a namespace, as described above. To allow access to objects in disparate namespaces, we define a searchspace as part of an operation context. When referencing another object, the reference only needs to be long enough to resolve in the common parent. Two objects in the same cluster but different namespace only need to include the namespace in the reference but not the cluster name.

A searchspace consists of one or more namespaces, in precedence order. If a resource is not resolved with an unqualified name, each available namespace is tried until a match is found.

This can extend to involve resource sharing between two users. Let's say two developers are developing an application in their own namespaces, lucy and steve.

Let's say we have an identical service definition myapp which can be run independently:

service:
  myapp:
    instances: 4
    requires:
      redis # leave this syntax for another discussion!
    container:
      #...

For Lucy, the fully qualifed service name is myapp.lucy.local and Steve has myapp.steve.local. However, when running the service, the requirement of redis is not fulfilled. It is absent from the definition. Running the service fails. Fortunately, the operations team has made a development instance available at redis.development.cluster0. By default, neither Lucy or Steve cannot see this resource.

A few things can happen here to resolve the issue. They can both edit the configuration file to add .development to the redis reference. While this does work, it now makes the definition non-portable.

A better resolution is to have both developers add development to the searchspace for the operation context. For steve, the unqualified name would be expanded to the following fully qualified names:

redis.steve.local
redis.development.cluster0

Lucy does the same gets the following qualified names:

redis.lucy.local
redis.development.cluster0

Note that both developers did the same thing and got the same result but have different application environments.

With this, we get a very clear order in which resources are resolved. Each user can set their default namespace and searchspace and control the order in which resources are resolved. Once this is setup correctly, only unqualified names will have to be used in practice for most API operations.

The main complexity here is that all names from user input need to resolved at API request time, associating the resolution with an operation context. Subsequintly, names get written out from user input during the API call, to capture the current searchspace.

Clusters

We slightly glossed over a point above. Where did cluster0 come from? This is simply the domain name of the cluster. In the example above, both developers have a cluster on their machines, known as local. This just has to be one or more endpoints that are available for cluster submission.

Just as in searchspace, we can define a set of clusters that one might want to use from an environment. These clusters combine with the search space to create names. Let's say we have the following list of cluster domains:

local
cluster0
cluster1

We can combine this using a cross product with our searchspace ([local, development]) to get all of the possible references for a resource redis from the point of view of the user:

redis.steve.local
redis.development.local
redis.development.cluster0
redis.development.cluster1

We Let's say that Steve needs help with his application. Lucy tries to reference it with myapp.steve.local but that won't work, since .local is different between the two machines. To deal with this, we can define clusters with names. A possible configuration on Lucy's machine might be the following:

<steves ip> steves-mbp

Now, she can reference his app with myapp.steve.steves-mbp or just myapp.steve if she adds "steve" to the search space.

Access Control

Namespaces provide a tool for access control. To build this framework, we say that every operation has a context with a namespace. Under normal operation, all creations, updates and deletions happen within the context's namespace.

Access control operations simply use this framework to operate within. We can define which namespaces can access other namespaces.

TODO: Work out some examples here. This actually works well, but we need examples.

Alternative Models

Some other possible models under consideration:

Similar to the above but resources cannot reference between namespaces. Slightly inflexible in large teams that want to partition a cluster arbitrarily.
Slash-based model. Not DNS compatible, but somewhat useful against current docker projects.
Vanity

Naming is typically done out of vanity. While this specification is fairly restrictive in naming, since we intend to use naming as an organizational tool, we may find it necessary to introduce the concept of a vanity name.

Put whatever you like in this name.

Road Map

[ ] Define first-class Namespace object
- Provides definition of the namespace
[ ] Lock down naming
[ ] Define context object for all API requests
- namespace for owning namespace of operation
- searchspace for reference context

@mikegoezler @aluzzardi @amitshukla @icecrime

<task>.<job>.<namespace>.<cluster>

Since tasks are given unique IDs, is this hierarchy necessary for them?

xn--7o8h (🐳) Default namespace for all resources xn--7q8h (👹) System namespace for cluster jobs

Can we please not do this? I'm terrified that I would have to type one of these at some point, or explain to a user how to. And I could imagine all kinds of issues with fonts that don't have these glyphs, broken terminals, text to speech for blind users, and so on.

We can get away with emojis for these namespaces since they won't often by typed out by users.

That doesn't inspire confidence.

References

I like the ideas on references in general. One thing I'm wondering about is the UX aspect of setting search paths. It seems users would have to maintain the equivalent of a /etc/hosts file, at least for any sufficiently advanced cluster setup where they don't want to specify fully qualified references.

Another potential gotcha is that by using search paths, something like redis can change its meaning unexpectedly. For example, if redis.steve.local is deleted, redis might start to refer to redis.development.local. This could happen without warning and it might be frustrating to debug why redis now refers to something else. A similar example would be taking a service from development and deploying it into production, and having things break because a relative reference is interpreted differently due to different search paths.

I don't know the right tradeoff for either of these. At one extreme, we could force fully qualified references everywhere. This would be less convenient, but avoid ambiguity and setup overhead. The other approach is to give users a lot of local control over how references are interpreted.

I think we can fine tune this quite a bit by coming up with best practices. One example that comes to mind is that DNS resolvers support search paths, but in practice they are only rarely used - and this is good, because it means people can exchange URLs of cat videos and they will work on any system. By analogy with this, we might support searchspaces but treat them as a power user feature that doesn't often need to be used, and encourage people to use fully qualified references unless there's a good reason not to.

To deal with this, we can define clusters with names. A possible configuration on Lucy's machine might be the following:

Earlier I made the analogy with /etc/hosts files because users would be defining their own search paths for local resolution. Here users would also be defining local aliases. So it really is exactly like a /etc/hosts file.

Since this is heavily inspired by DNS, I wonder if we can build on top of DNS in some way. Let me throw out the following (probably flawed) approach just for the sake of an example:

Given a reference in the format a.b.c.d...:

Look up the full reference as a DNS name. If a result is found, this a cluster reference.
Otherwise, look up b.c.d.... If this exists, then b.c.d... is the cluster, and a is a node or namespace.
Otherwise, look up c.d... If this exists, then c.d... is the cluster, b is the namespace, and a is some resource within that namespace.

I don't know if this particular approach makes any sense, but the nice thing about it is that normal DNS tools and configuration applies. If you want to add an alias for steves-mbp, you add it to your hosts file like any other DNS alias. And if you want to control search paths, you do it the same way you would for DNS in general. Admins tend to be pretty familiar with DNS, so I feel like they would be more comfortable with this than custom tooling that mimics DNS. So I wonder if we can find a way to build this on top of DNS (or maybe that's what you were already suggesting when you said "This is simply the domain name of the cluster").

Access Control

This isn't really well-defined enough to provide much comment on, but intuitively, a namespace seems like a reasonable access control boundary.

Vanity

I don't really understand what a vanity name would be in this context.

Cluster: A cluster is controlled by a set of managers. Most resources will be scoped in a cluster.

What's the relationship between a cluster, a datacenter and a region? Are clusters part of the same quorum?

Why are clusters explicitly part of the naming?

How are those namespaces intended to be used?

I see in the examples development (which is how namespaces are typically used), but what about compose's projects (app-like)? Will multiple compose projects fit into the same namespace or are we going to have a namespace per project? (/cc @aanand @bfirsh)

A better resolution is to have both developers add development to the searchspace for the operation context.

How does this practically work, from the operator's point of view? Will she need to add a --search-space to every swarmctl command? What happens when another operator tries to re-deploy the same file? Does he need to know the search space the previous operator used?

Since tasks are given unique IDs, is this hierarchy necessary for them?

Tasks need names.

Another potential gotcha is that by using search paths, something like redis can change its meaning unexpectedly. For example, if redis.steve.local is deleted, redis might start to refer to redis.development.local. This could happen without warning and it might be frustrating to debug why redis now refers to something else. A similar example would be taking a service from development and deploying it into production, and having things break because a relative reference is interpreted differently due to different search paths.

I don't know the right tradeoff for either of these. At one extreme, we could force fully qualified references everywhere. This would be less convenient, but avoid ambiguity and setup overhead. The other approach is to give users a lot of local control over how references are interpreted.

This is a feature. The searchspace is a tool to support portability of references. By ordering the search space, users can control interpretation, all the way from fallback to strict resolution. For example, if you want your users to fallback to redis.development, place that in their searchspace. If you don't want them to, don't place that in their searchspace. What we don't want is a mess of ways to combine these. We do this by defining how references are resolved.

I think we can fine tune this quite a bit by coming up with best practices. One example that comes to mind is that DNS resolvers support search paths, but in practice they are only rarely used - and this is good, because it means people can exchange URLs of cat videos and they will work on any system. By analogy with this, we might support searchspaces but treat them as a power user feature that doesn't often need to be used, and encourage people to use fully qualified references unless there's a good reason not to.

Typically, we don't use DNS search space when interacting with the Internet. Whenever there is a large common space, the search doesn't make much sense. In the before times, DNS was used for site local references. For example, mail in a search with foo.com would be the mail.foo.com host. This allowed one to take a program built for foo.com and move it to bar.com. If you really always do want mail to always be mail.foo.com, then so be it, use a fully qualified reference, but you sacrifice portability.

For our use case, there is value is having references that having contextual targets. We really do have a different problem in that we will be running in a number of different contexts. Without a cluster, redis alone doesn't mean anything. It will be the projection of redis within a context that will make sense. When that name is evaluated, at request time, we can actually write out the full values into the target records. This ensures that resolution is associated with a particular lifecycle and ambiguity is bounded.

A lot of the ambiguity that can arise with this approach will come from tooling. For a given context, a single term has a single, unambiguous resolution target. If we expose this behavior, we make it easy to detect issues:

First, we have the search space:

$ namespace search
development
default

This can be set locally or globally. Let's resolve a name without taking a cluster into account:

$ namespace resolve redis
redis.development

We err on development here, since we don't actually know if that exists. No communication with a cluster has happened. We can do resolution when connecting a specific cluster:

$ namespace resolve redis .local
redis.default.local

The namespace development doesn't exist locally, so we pickup default instead. We can see why with the candidates command:

$ namespace candidates redis .local
redis.development.local     no `development` namespace in `.local`
redis.default                       selected

Earlier I made the analogy with /etc/hosts files because users would be defining their own search paths for local resolution. Here users would also be defining local aliases. So it really is exactly like a /etc/hosts file.

Yes and no. In the default case, you just need to the cluster address. More complex configurations, shown above, are just to demonstrate the approach. There are lots of ways we can seed a searchspace but it can also be a tool developers can use to control their operations.

Vanity I don't really understand what a vanity name would be in this context.

The names proposed here have meaning. This is a departure from the approach thus far. We really need a name that can be bike-shedded. Certain users may want to have spaces and emdashes. Perhaps, this could be a "description" field.

I see in the examples development (which is how namespaces are typically used), but what about compose's projects (app-like)? Will multiple compose projects fit into the same namespace or are we going to have a namespace per project? (/cc @aanand @bfirsh)

Namespaces are projected based on context. If you want too projects to be compatible, you can run them in the same namespace, or provide aliased references into complementary namespaces. It allows one to write portable applications without having to fixup references within the file itself. For example, let's say I had external references to a redis and www. I could rely on the context to resolve those or have a map file that says they should be affixed to particular reference.

What's the relationship between a cluster, a datacenter and a region? Are clusters part of the same quorum?

For the purposes of this discussion, the name of a cluster is an opaque DNS name to a cluster of managers, in the same quorum, within the same region. That opaque name could include datacenter and region. That is compatible with this model but is out of scope here. By allowing DNS subdomains in cluster names, we leave this problem open but leave room for future adjustments.

Why are clusters explicitly part of the naming?

This allows one to reference resources between clusters. We don't actually need to expand the cluster in the name.

How does this practically work, from the operator's point of view? Will she need to add a --search-space to every swarmctl command? What happens when another operator tries to re-deploy the same file? Does he need to know the search space the previous operator used?

It doesn't really matter. There are a number of options that we can take here, from local configuration to shared setup. The idea is that the operator can decide where the application is deployed. As a result, it is an operators job to ensure that the deployment context can be consistent from deployment to deployment. For example, it would make sense to only deploy from a CI with the correctly configured context to avoid one accidentally deploying to production and vice versa.

It doesn't really matter. There are a number of options that we can take here, from local configuration to shared setup.

It does matter - this is how operators will interact with the system.

The idea is that the operator can decide where the application is deployed. As a result, it is an operators job to ensure that the deployment context can be consistent from deployment to deployment.

This concretely translates in operators creating a bunch of shell scripts or something to correctly set the search options otherwise deployments are not reproducible.

That opaque name could include datacenter and region. That is compatible with this model but is out of scope here.

Regions are not opaque. I agree we don't want to worry about multi-region at this point, but I don't understand why we're tackling multi cluster right now.

There are two kinds of locality segmentation:

Datacenter (AZ): This is partitioning within a region. Datacenters have the property of being redundant and have low-latency between each other. For our users this is a logical partition they need not to worry about: a job can span multiple datacenters. Users can interact by limiting the datacenters their job will be deployed to or using it as a scheduling constraint go guarantee a job spans N datacenters.
Regions: Regions are entirely different targets - they must be considered as different swarm deployments with a different set of managers (operating on a separate quorum) and nodes. The quick way to implement them is not to implement them - let the operator deploy different swarms. Eventually, we will get built-in federation which means an operator can submit a Job to any region and the leader of that region will forward to the leader of the target region.

I don't see exactly where we're going at with clusters - is that a region, a datacenter, both?

It does matter - this is how operators will interact with the system.

The naming format does not affect how we distribute configuration.

This concretely translates in operators creating a bunch of shell scripts or something to correctly set the search options otherwise deployments are not reproducible.

Not really. We can distribute a set of defaults and most users will never be the wiser. If we want the required behavior, we must have a system like this.

Regions are not opaque. I agree we don't want to worry about multi-region at this point, but I don't understand why we're tackling multi cluster right now.

You're missing wide on the point here. We need a way to reference a particular set of objects associated with a quorum set. We are saying that this is <cluster>. <cluster> can be made up of <datacenter>.<region> but it is way out of scope for this discussion. The fact is, with this approach, we have a tool of hierarchy which can be extended to give meaning in the future. We are not "tackling" multi-cluster. We are creating a namespace model that defers it explicitly.

You could omit the entire mention of clusters and this proposal is just as valid. Extra clusters just become and extension of the search space.

@aluzzardi This really isn't that complex. Check out https://github.com/docker/swarm-v2/pull/197 for a small implementation and example of cluster naming resolution.

ping @wfarner for some namespacing input.

I see Aurora has a very opinionated view of names (cluster, environment, ...). If you could go back in time would you do things differently?

If i could go back in time i would use the same namespacing, but i would give more thought to how it relates to service discovery (in your case, DNS) and service management.

What do you mean specifically? What aspects of this proposal would you change?

I agree there are some aspects of service injection into DNS for a particular network, but these are all set operations. Let's say we have the following with an ops, steve and bill namespace:

network = frontend in ops
service = monitor in ops joins frontend
service =  a in bill joins frontend
service = a in steve joins frontend

Within the DNS space of network frontend, we have a.bill and a.steve. The service monitor can be accessed as simply monitor, since it shares a namespace, or monitor.ops. Optionally, if the searchspace of network includes bill or steve, the DNS name a refers to the first matching entry (searchspace = [bill, steve], a = a.bill, searchspace = [steve, bill], a = a.steve). If one of a.bill or a.steve fails, it will fallback to the other.

Let's take a load balancer example (willy nilly syntax, i hope you follow):

LoadBalancerService{name: mybalancer.default, match: "myservice.v*.default"}
ServiceJob{name: myservice.v1.default}
ServiceJob{name: myservice.v2.default}

With such a setup, the load balancer could forward to anything in the match (I have a glob, but we could use labels or anything). We could even have a list of match criteria, each with a weight to direct traffic accordingly.

To tell you the truth, I actually want LoadBalancerJob above to be the default behavior of a ServiceJob. This way, we can look at each ServiceJob as a literal service.

I think the big difference from Aurora's model is that we don't tie role to namespace. I think this is a side-effect of not having a strong user model within docker, which may be somewhat of an advantage. However, I see a role implementation leveraging the namespace model.

To tell you the truth, I actually want LoadBalancerJob above to be the default behavior of a ServiceJob. This way, we can look at each ServiceJob as a literal service.

+1 on this. The ServiceJob provides us all the context that we need to implement a load balancer(based on a LoadBalancerStrategy configuration) for the Job which is typically backed by more that one Task instance. Otherwise we would have to hierarchically load balance at multiple levels in the data path which is for most cases totally unnecessary.

The canary use-case @wfarner brings up is important given that it's very common. In our current architecture, how would an operator deploy a new version of a job and route 10% of the traffic to it for some period of time?

@mrjana @stevvooe thoughts?

@mgoelzer One way to achieve is by providing an image selector configuration in JobSpec and add any number of image SHAs as eligible for deployment and may be also provide some control on roughly what percentage of each should be deployed. When the JobSpec is updated to include a canary image SHA then orchestrator can reconcile to this new configuration to update certain tasks. If this had to be done non-distruptively(i.e without bringing down existing tasks) the number of instances can be increased in the JobSpec so orchestrator does not bring down tasks unless it needs to.

Conversely one can deploy an entirely new Job to test out the canary in a different namespace so it is completely isolated from production. Just that the clients have to know how to reach the canary version of the service.

I think the big difference from Aurora's model is that we don't tie role to namespace. I think this is a side-effect of not having a strong user model within docker, which may be somewhat of an advantage. However, I see a role implementation leveraging the namespace model.

At some point, Core (us and Engine specifically) is going to become responsible for implementing some kind of unified "Docker identity." What that means is not well defined right now -- it could be a distributed identity, backed by a blockchain or P2P system, or it could be something more conventional.

Either way, though, would it make sense to include role in the namespace now as in Aurora's model? Otherwise, we will have to bolt it on later, which could make things uglier.

What do you mean specifically? What aspects of this proposal would you change?

I could have been more clear - i was answering @aluzzardi's question about my choices in Aurora, no commentary on what's been discussed here. Specifically i was referring to the canary case as being sub-par.

I'm probably missing context, but i do wonder what network management looks like for services that span team boundaries. i.e. Who 'owns' a network? Can i elect to operate an 'open' service that any container may access? If i operate a service that has a high fan-in, am i doing a lot of network access shuffling? (Feel free to point me elsewhere if this is laid out, or tell me to swing by to chat.)

Conversely one can deploy an entirely new Job to test out the canary in a different namespace so it is completely isolated from production. Just that the clients have to know how to reach the canary version of the service.

This was the unappealing approach i described. Virtually nobody used it because they needed to go cat herding for all their clients to update. To make matters worse, the upstream clients didn't want to send traffic to their canaries because they associated it with being less stable!

Another tie-in that makes the canary case more complicated is that users often want the ability to monitor them separately. There's no shortage of ways to plumb metadata to make that possible, but just make sure that it jives with the other stories.

I'm probably missing context, but i do wonder what network management looks like for services that span team boundaries. i.e. Who 'owns' a network? Can i elect to operate an 'open' service that any container may access? If i operate a service that has a high fan-in, am i doing a lot of network access shuffling? (Feel free to point me elsewhere if this is laid out, or tell me to swing by to chat.)

If you want some service which is needed across team boundaries (i.e namespace boundaries) that service will be attached to a network (defined in a common namespace) and all containers who want access also participate in that network. This way there is no need for a lot of network access shuffling at all.

PS: Our definition of what a network is a little different from what anyone thinks about it outside but we can chat about it if you like.

@mgoelzer This was the approach to balancing I was referring to:

services:
  myservice.v1:
     # ...
  myservice.v2:
     # ...
  alt:
    labels:
      backup=myservice
  myservice:
    balance:
      strategy: roundrobin
      targets:
        - match: myservice.v1
          weight: 9
        - match: myservice.v2
          weight: 1
        - match: label{backup=myservice}
          weight: 0 # weight zero used for fallback

Above we have three services. Two versions and a backup. The two services are matched by name, with traffic sent via weighted round robin. The weight=0 service is only used when others are all down and also demonstrates label matching.

The interesting thing here is that we can also add a parameter to inject this data into the container running a load balancer such as haproxy, such that one can use the same configuration. Whether they are using default VIP-based load balancing or would like to have a custom solution, such as ELB, the only configuration that changes is the "backend" of the load balancer. For custom solutions, the container target would be the balancer implementation.

It is natural to have a set of service tasks to be load balanced. In almost every case, this makes sense, even if you are talking about some sort of failover based load balancing (imagine leader and follower).
A LoadBalancerJob would still have a container if using an external load balancer, such as haproxy or ELB.
ServiceJob and LoadBalancerJob would nearly always have a similar lifecycle. Both need to be "up" and long-lived.

With those behaviors, we have two options: create a ServiceJob and LoadBalancerJob that are the same except for the addition of a load balancing field or make it a property of ServiceJob that they describe an endpoint and have a balancing configuration. I would favor the latter from a simplicity perspective.

The following are use cases for load balancing services:

Load balance evenly to all tasks.
Load balance to other services, injecting the service set into the environment.
Load balance to one instance of a service until it is down, then direct to the other.
Load balance to service instances by role. Imagine a leader (a single service) and a set of read followers.
Load balance a percentage of traffic to a set of services randomly, per connection.
Load balance a percentage of traffic to a set of services based on properties of the connection (stick balancing).

The other use case we need to consider is injection of external services. Basically, you would have an api, that you want to control the resolution of based on environment. The name of the service gets injected into the DNS and will switch to the correct endpoint based on the service configuration. The running service task can be used to inject the current value or check health of the remote service.

I propose we add an "endpoint" configuration to ServiceJob to control these aspects. The endpoint decides how the underlying service should be presented within DNS, be it direct balancing to the tasks or using the underlying container(s) to make load balancing decisions.

moby / swarmkit

Naming and Namespace in the cluster #192

Requirements

Considerations

Resources

Rules about Naming

Structure

Namespaces

References

Clusters

Access Control

Alternative Models

Vanity

Road Map