moby / libnetwork

networking for containers
Apache License 2.0

Remote drivers are (wrongly) assumed to be global #486

Closed. thockin closed this issue 9 years ago.

thockin commented 9 years ago

https://github.com/docker/libnetwork/blob/master/drivers/remote/driver.go#L32

It should be possible to write local-only drivers.

mavenugo commented 9 years ago

ping @squaremo @tomdee @shettyg. Since you worked on the remote driver implementation more closely, can you please help define a proper registration mechanism to determine whether a remote driver wants to be a locally scoped or a globally scoped driver?

squaremo commented 9 years ago

It may be difficult to do this without modifications to the plugin subsystem. My best guess at doing it without that is to have two kinds of driver: "Implements": ["NetworkDriver"] means a globally scoped driver, and "Implements": ["LocalNetworkDriver"] means a locally scoped driver. (Immediately obvious problem: what does implementing both mean?)

I agree that this is a gap; but I wonder if it is exactly what e.g., kubernetes needs. What assumptions does libnetwork make about locally scoped drivers?
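For illustration, under this proposal a driver would signal its scope purely through the plugin activation handshake. A minimal sketch, assuming the existing "Implements" manifest; "LocalNetworkDriver" is the hypothetical entry discussed above and does not exist today:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// activateResponse mirrors the shape of a /Plugin.Activate reply.
type activateResponse struct {
	Implements []string `json:"Implements"`
}

func main() {
	globalDriver := activateResponse{Implements: []string{"NetworkDriver"}}     // globally scoped
	localDriver := activateResponse{Implements: []string{"LocalNetworkDriver"}} // locally scoped (hypothetical)

	for _, resp := range []activateResponse{globalDriver, localDriver} {
		out, _ := json.Marshal(resp)
		fmt.Println(string(out)) // what the activation handshake would return in each case
	}
}
```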

mavenugo commented 9 years ago

@squaremo I don't think we would need to change the plugin subsystem for this. We could add an explicit call for capability negotiation before the plugin is registered with libnetwork.

If it is a locally scoped driver, then libnetwork will not distribute the network or endpoint information, nor require a KV store to back these guarantees.

thockin commented 9 years ago

For example, Kubernetes endpoints are not going to be used across hosts. If we end up implementing our own driver, IPAM decisions are local decisions. We assign a CIDR per host. Those IP addresses are not usable on any other host.

mavenugo commented 9 years ago

@thockin as we discussed, this is a good reason to be a local-scoped driver.

shettyg commented 9 years ago

@mavenugo I am likely missing a piece of the puzzle here. A locally scoped remote driver can't work if docker provides UUIDs instead of names. For example, on host-1:

docker network create -d openvswitch foo

My driver currently receives just a UUID. On a different host, if I run:

docker service publish my-service.foo

I will likely get a "foo" network not found error.

What do you have in mind for a locally scoped driver? Can we get names instead of UUIDs?
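To make the gap concrete, this is roughly what the remote driver sees on network creation under the current API: only a generated ID, never the name "foo". The field names below follow the remote driver JSON payload as commonly described; treat them as illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Approximate shape of the /NetworkDriver.CreateNetwork request body the
// driver receives; note there is no field carrying the user-supplied name.
type createNetworkRequest struct {
	NetworkID string                 `json:"NetworkID"`
	Options   map[string]interface{} `json:"Options"`
}

func main() {
	body := []byte(`{"NetworkID":"<uuid-generated-by-libnetwork>","Options":{}}`)

	var req createNetworkRequest
	if err := json.Unmarshal(body, &req); err != nil {
		panic(err)
	}
	// The driver cannot recover the name "foo" from this request alone.
	fmt.Println("network id:", req.NetworkID)
}
```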

shettyg commented 9 years ago

Also, I think a locally scoped driver will work if commands like "docker network ls" ask the local driver to list networks instead of libnetwork trying to list UUIDs on its own. The local drivers can provide UUIDs and names back, which docker then lists. So in theory the local drivers do the job of libkv.
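A purely hypothetical sketch of that idea, where the driver rather than libnetwork's store answers "docker network ls". No such endpoint exists in the remote driver API; all names here are invented:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Hypothetical response entry for a driver-backed network listing.
type networkSummary struct {
	ID   string `json:"Id"`
	Name string `json:"Name"`
}

func main() {
	// Invented endpoint: docker would ask the driver for its networks instead
	// of listing the UUIDs it tracks itself.
	http.HandleFunc("/NetworkDriver.ListNetworks", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode([]networkSummary{
			{ID: "<uuid>", Name: "foo"},
		})
	})
	log.Fatal(http.ListenAndServe("127.0.0.1:9999", nil)) // illustrative address
}
```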

tomdee commented 9 years ago

The suggestion from @squaremo sounds sensible to me. The plugin already has to do a handshake to establish its capabilities, so adding a different capability for "local" plugins sounds like a good idea.

thockin commented 9 years ago

Is "On a different host" compatible with a locally scoped driver?

shettyg commented 9 years ago

@thockin So when you say "local" drivers, are you in effect saying that commands like "docker network ls" etc. will simply not return information provided to a locally scoped driver?

Maybe if you give an end-to-end workflow of the locally scoped driver you have in mind, I will understand better.

tomdee commented 9 years ago

@shettyg That's an interesting thought. You're treating locally scoped drivers as a way to sidestep the multi-host parts of libnetwork, which, as you point out, only works if libnetwork defers more control to the driver.

mavenugo commented 9 years ago

@shettyg a locally scoped driver will make sure libnetwork doesn't synchronize the networks and endpoints, and hence yes, it is up to the orchestration system to determine how this is handled across multiple hosts. In the case of k8s it will work, because k8s requires its own subnet space per host and doesn't require L2 mobility. Hence I suggested opening this PR. Each driver can determine which scope it prefers.

shettyg commented 9 years ago

@mavenugo Got it. In that case, @squaremo's suggestion is a nice starting point.

shettyg commented 9 years ago

Another thought: isn't a locally scoped driver the same as starting the docker daemon with a libkv store that is local only?

mavenugo commented 9 years ago

@shettyg @squaremo @tomdee I don't think having another plugin endpoint is a good idea. This is a property of the network driver and must be honored as such. Introducing another plugin type would call for more changes to the libnetwork core to look for more plugin types, when the functionality provided by the driver is the same.

Hence my suggestion is to add capability negotiation to the Plugin API and exchange this information there.
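One possible shape for that exchange, sketched as a remote driver handler. The endpoint name and payload are illustrative of the proposal, not a settled API:

```go
package main

import (
	"encoding/json"
	"log"
	"net"
	"net/http"
)

// The driver declares its scope when asked for capabilities; libnetwork would
// then skip KV-store replication for "local" drivers.
type capabilitiesResponse struct {
	Scope string `json:"Scope"` // "local" or "global"
}

func main() {
	http.HandleFunc("/NetworkDriver.GetCapabilities", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(capabilitiesResponse{Scope: "local"})
	})

	// Illustrative plugin socket path.
	l, err := net.Listen("unix", "/run/docker/plugins/mydriver.sock")
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(http.Serve(l, nil))
}
```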

squaremo commented 9 years ago

Is "On a different host" compatible with a locally scoped driver?

Not in general. Because each host will assume it is acting only locally, it will mint a new UUID for a network it hasn't seen on that host. If "LocalScope" is being used to mean "let me do my own co-ordination", this is going to fail to do the expected thing.

mavenugo commented 9 years ago

@squaremo correct. The bridge driver today is a locally scoped driver and it doesn't depend on the distributed states. The same, I think, will work for k8s and for other drivers such as the macvlan and ipvlan plugins.

squaremo commented 9 years ago

@mavenugo Fair point about plugin types; it's a minor shame to have another handshake exchange, but I agree it is better overall.

squaremo commented 9 years ago

The bridge driver today is a locally scoped driver and it doesn't depend on the distributed states

Right; this leaves systems that do their own co-ordination high and dry, unfortunately.

mrjana commented 9 years ago

@squaremo Are you looking for a notion of cluster to be provided to the drivers by libnetwork?

squaremo commented 9 years ago

Are you looking for a notion of cluster to be provided to the drivers by libnetwork?

This wouldn't help, since it would still be the case that drivers only see UUIDs, and these are constructed on the assumption that Docker's is the only shared state. So I would lean towards giving the drivers the information they need to do their own co-ordination, which pretty much means the user-supplied names.

thockin commented 9 years ago

Yeah, this would be better. I can then have kubernetes orchestrate each individual docker node to create a network "kubernetes" and use that in all my docker run calls, without having that try to synchronize across nodes if I happen to have a libkv driver installed for some other reason.

mavenugo commented 9 years ago

@thockin k8s can still do that. If you are planning on using a local-scope driver and the network is created by k8s, then k8s has the mapping between the name <-> network-id across all the hosts.

thockin commented 9 years ago

If I use a local-scope driver, won't every node have a different network ID for the "kubernetes" network?

mavenugo commented 9 years ago

@thockin yes, it will be, just like the docker0 bridge today, which is different on each host. If you want the "kubernetes" network to have the exact same ID across all the hosts, then you are essentially looking for a globally scoped driver.

thockin commented 9 years ago

I don't want Docker to try to manage it globally because we have our own API. I don't want to implement a generic KV store on top of our structured API. I'm forced to use local drivers, but I need the name so I can find info in my own API. Your surrogate key is not useful to me.

mavenugo commented 9 years ago

@thockin can you please explain what you mean by "Your surrogate key is not useful to me"? AFAIK, I don't own any surrogate key ;). Jokes apart, I would really like to understand your concern here so that we can find a balance between docker users and kubernetes users. Please note that docker addresses more use-cases than kubernetes.

jainvipin commented 9 years ago

@tomdee, @shettyg

@shettyg That's an interesting thought. You're treating locally scoped drivers as a way to sidestep the multi-host parts of libnetwork, which, as you point out, only works if libnetwork defers more control to the driver.

If this works, then I would not have to think of two different ways to implement drivers: one running as a plugin in Kubernetes and one natively as a remote driver on libnetwork. Assume that I have a KV store available.

I imagine by "more control" you mean providing the network name in the API.

thockin commented 9 years ago

Sorry, that came off snippier than I meant it.

My primary key is the network name. That's the key my API knows (or will know). That key exists before I ever call "docker network create".

I don't want docker to try to manage my driver globally because I already have a global control plane. So I have to use a local driver.

Because I have a local driver, every node is going to "docker network create" and get a different UUID.

When we join a container to a namespace, you are only telling me the UUID. My driver cannot use that UUID to look anything up in my own API.

When I start offering multiple networks, this just gets worse. I have to make my node agent keep MORE side-band state that maps the UUID (returned from 'docker network create', right?) to the network name and then publish that state to my driver. I can probably do that, but surely you see how this is a terrible hack just to work around docker.
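For concreteness, the side-band bookkeeping described above might look like this in a node agent. All names here are illustrative, not part of Kubernetes or libnetwork:

```go
package main

import (
	"fmt"
	"sync"
)

// networkNameCache is the extra state the node agent would have to carry:
// the UUID returned by "docker network create" mapped back to the name the
// orchestrator already knows.
type networkNameCache struct {
	mu     sync.Mutex
	byUUID map[string]string
}

func newNetworkNameCache() *networkNameCache {
	return &networkNameCache{byUUID: make(map[string]string)}
}

// Record is called by the agent right after "docker network create <name>"
// returns the generated UUID on this host.
func (c *networkNameCache) Record(uuid, name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byUUID[uuid] = name
}

// Lookup is what the driver needs when libnetwork hands it only a UUID.
func (c *networkNameCache) Lookup(uuid string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	name, ok := c.byUUID[uuid]
	return name, ok
}

func main() {
	cache := newNetworkNameCache()
	cache.Record("<uuid-from-docker-network-create>", "kubernetes")
	if name, ok := cache.Lookup("<uuid-from-docker-network-create>"); ok {
		fmt.Println("resolved network name:", name)
	}
}
```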

I know other people have asked for the network name - why is docker stonewalling the community on this seemingly tiny thing?

dcbw commented 9 years ago

The current side-band hack to retrieve the network name is made worse by the fact that the docker+libnetwork remote driver API is synchronous, so the driver cannot query docker during a driver operation or docker will deadlock. Instead the remote driver must cache the UUID and then, somehow, right after the CreateNetwork() hook, request the network name from the docker API. That's obviously racy, since there's no guarantee that docker will receive the ListNetworks() request from the driver before libnetwork calls the driver again with CreateEndpoint() or some other call.
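A sketch of that racy workaround, with the docker API lookup stubbed out (the helper name is invented for illustration):

```go
package main

import (
	"log"
	"time"
)

// lookupNameFromDockerAPI stands in for a query against docker's own API
// (e.g. listing networks and matching the ID); invented helper, not a real
// client call.
func lookupNameFromDockerAPI(networkID string) (string, error) {
	return "kubernetes", nil
}

// handleCreateNetwork is the driver's CreateNetwork hook. It cannot query
// docker inline: libnetwork is blocked waiting on this response, so an inline
// query would deadlock.
func handleCreateNetwork(networkID string) {
	go func() {
		name, err := lookupNameFromDockerAPI(networkID)
		if err != nil {
			log.Printf("name lookup for %s failed: %v", networkID, err)
			return
		}
		// By the time this runs, libnetwork may already have issued
		// CreateEndpoint for networkID -- the race described above.
		log.Printf("network %s is named %q", networkID, name)
	}()
	// Respond to /NetworkDriver.CreateNetwork immediately.
}

func main() {
	handleCreateNetwork("<uuid>")
	time.Sleep(100 * time.Millisecond) // only to let the sketch's goroutine finish
}
```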

mavenugo commented 9 years ago

@thockin @dcbw @squaremo I think we went way off on a tangent from the intent of the PR.

@thockin do you still see a need for this issue to be resolved? @squaremo can you please share your thoughts on the remote API implementation for the request raised in this issue?

thockin commented 9 years ago

Yes! Without the ability to have local-scope drivers, I can't use them at all, I think. But once we have local drivers, I need the name.

lxpollitt commented 9 years ago

The key from my point of view is that (almost all) existing SDN solutions have their own control plane. So @thockin's comments around control plane, state and KV stores are not Kubernetes specific. If introducing the idea of a "local" remote driver can solve that while maintaining a consistent UX for Docker users then that is a huge win for everyone.

To maintain a consistent UX for Docker users though, things like docker network ls need to work across all hosts networked by the underlying SDN, without the user having to run the same docker network create on every host. (If that's not the case then we have not maintained a consistent UX, which I believe is one of @mavenugo & @mrjana's main focuses for libnetwork.) That in turn means that Docker libnetwork needs to defer the state ownership of network creation (including the network name) to the "local" remote driver. e.g. When a user runs docker network ls, libnetwork will need to ask the driver for the list of networks.

What do people think?

squaremo commented 9 years ago

I think we went way off on a tangent from the intent of the PR.

If one is very literal about the description, maybe. I think the intention was to make it possible to use libnetwork in a distributed setting without involving all of its kv-store machinery, and from that point of view, the whole discussion was pertinent.

Can you please share your thoughts on the remote API implementation for the request raised in this issue?

I don't mind doing that (at some point), but I think it is necessary to go further and address the other things that came up in discussion.

mavenugo commented 9 years ago

If one is very literal about the description, maybe. I think the intention was to make it possible to use libnetwork in a distributed setting without involving all of its kv-store machinery,

@squaremo agreed, and the reason this issue came about was to support exactly that requirement. We need a PR to back it. We can absolutely continue the other discussions without delaying getting this issue resolved.

dcbw commented 9 years ago

That in turn means that Docker libnetwork needs to defer the state ownership of network creation (including the network name) to the "local" remote driver. e.g. When a user runs docker network ls, libnetwork will need to ask the driver for the list of networks.

That would be a more perfect world, I suppose, but how about a simpler first-step approach of (a) building libnetwork with the simple built-in local KV store (e.g. https://github.com/docker/libnetwork/pull/466) to ensure a docker restart keeps networks around, (b) having the control plane/Kubernetes add/remove networks via the docker API when it wants to, and (c) telling users who do 'docker network add type=' to just Not Do That?

If people would rather take longer and do it right from the start, that's fine too of course...

thockin commented 9 years ago

I'm a big fan of incrementalism these days.

mavenugo commented 9 years ago

@dcbw sounds good & this issue is one of the first steps. Would be great if someone can back it with a PR.

Can you please elaborate on this

(c) telling users who do 'docker network add type=' to just Not Do That?

squaremo commented 9 years ago

(b) having the control plane/Kubernetes add/remove networks via the docker API when it wants to

Doesn't that run into either the "network only exists locally" problem or the "libnetwork thinks each host has a different network" problem?

mavenugo commented 9 years ago

@squaremo

Doesn't that run into either the "network only exists locally" problem or the "libnetwork thinks each host has a different network" problem?

For locally managed networks, it is up to the orchestration entity to create the network on the "required" hosts and up to the corresponding driver (which wants to be locally scoped) to manage the forwarding. Examples of such drivers are the Mac/IPVlan drivers, where the orchestration manages each individual host's network with the appropriate configuration (subnet range to use, etc.).

dcbw commented 9 years ago

@dcbw sounds good & this issue is one of the first steps. Would be great if someone can back it with a PR.

I can work on this next week if this sounds like a good first step to @thockin while we hash out the more complete solution of having libnetwork ask the drivers for the network list. I'm only back on Wed though so maybe somebody will get it before me.

Can you please elaborate on this

There was a concern expressed in the k8s networking google group that if the orchestration layer was responsible for managing the networks, it could be a problem (and confuse things) if the user was also able to modify/manage those networks through 'docker network' or the docker API without the orchestration layer being involved. For example, if the orchestration layer creates network 'blue' with driver "kubernetes", nothing stops a user from modifying blue underneath kubernetes, or even creating their own network 'red' with driver 'kubernetes' that Kubernetes knows nothing about. The first-pass shortcut I proposed would have no protections against that, so for now we could simply say "don't do that".

But in the future if libnetwork delegates network listing to the drivers, perhaps those driver-defined networks should be immutable by the user from the docker API, since they would be controlled by the orchestration layer and any modifications should happen through it and not via 'docker network' directly.

mrjana commented 9 years ago

@dcbw Ultimately the driver gets a chance to say yes or no to network creations and deletions. So in the kubernetes case you can implement something like an auth token, pass it as a label to the driver, and disallow any out-of-band network creation or deletion that does not carry the auth token. I am not suggesting this as a security solution but as a way to block unintentional out-of-band configuration.
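A sketch of that guard on the driver side; the label key and option layout are illustrative, not an established convention:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical label the orchestrator stamps on networks it creates itself.
const authTokenLabel = "com.example.kubernetes.auth-token"

// expectedToken would be handed to the driver out of band by the orchestrator.
var expectedToken = "s3cret"

// checkCreateNetwork rejects out-of-band "docker network create" calls that
// lack the token. Not a security boundary, just a guard against unintentional
// out-of-band configuration, as noted above.
func checkCreateNetwork(labels map[string]string) error {
	if labels[authTokenLabel] != expectedToken {
		return errors.New("network not created via the orchestrator; refusing")
	}
	return nil
}

func main() {
	fmt.Println(checkCreateNetwork(map[string]string{authTokenLabel: "s3cret"})) // <nil>
	fmt.Println(checkCreateNetwork(map[string]string{}))                         // refused
}
```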

squaremo commented 9 years ago

@squaremo agreed, and the reason this issue came about was to support exactly that requirement. We need a PR to back it. We can absolutely continue the other discussions without delaying getting this issue resolved.

I'm not sure what you're agreeing with there. I meant that fixing the problem described in the title will not meet the requirement. It might not even be a move in the right direction. It seems to me that the division of the model into "GlobalScope", which forces co-ordination, and "LocalScope", which prevents co-ordination, needs reevaluation.

lxpollitt commented 9 years ago

@dcbw

I can work on this next week if this sounds like a good first step to @thockin while we hash out the more complete solution of having libnetwork ask the drivers for the network list. I'm only back on Wed though so maybe somebody will get it before me.

I discussed this longer-term desire with @mavenugo on Friday. He is passionately against libnetwork querying the driver for networks (or other state), due to fundamental architectural beliefs and a philosophy that I think is unlikely to shift. So while this proposed PR could be useful for some people, I wouldn't think of it as a step towards libnetwork deferring state ownership to drivers.

@mavenugo - please say if I am misrepresenting our discussion.

Kubernetes could still potentially use this "local-scoped" remote driver to get networking going using libnetwork. Kubernetes would be responsible for calling network create on every host, and then stitching together the network create calls within the driver to map to a single underlying network. For this to work the driver needs to get the network name from somewhere. In the past the libnetwork team have been firmly against passing the network name via the driver API. I don't know if this thinking has moved on since then, or whether they still want the drivers to use an OOB mechanism to map UUIDs to network names. @mavenugo - can you comment on your latest thinking here?

In either case, it's still pretty kludgy. I haven't fully thought it through, but I think most likely the driver will need to know it is operating in this mode to do the network stitching, so you won't be able to take a standard off-the-shelf libnetwork driver and expect it to work with k8s without any changes.

thockin commented 9 years ago

Network creation/manipulation should be akin to containers. Kubernetes users CAN go around kubernetes and touch them directly, but if they break it, they get to keep both halves.

As for good first steps, we're very much figuring out which end is up and what we would have to do vs want to do to defer network management to docker. I think we're stuck without this fix, but I am not sure it is sufficient.

thockin commented 9 years ago

In either case, it's still pretty kludgy. I haven't fully thought it through, but I think most likely the driver will need to know it is operating in this mode to do the network stitching, so you won't be able to take a standard off-the-shelf libnetwork driver and expect it to work with k8s without any changes.

I agree that it's pretty hack-tacular for things that require some global coordination but DON'T want libkv. It might be passable for truly local-scoped drivers (ipvlan etc.). It's no better or worse than what we do today with the assumed bridge driver.

mavenugo commented 9 years ago

@thockin @lxpollitt I really don't understand what you mean by a kludge and a hack. If you mean that libnetwork must make use of driver states to build its own view of the network, then I would say yes, I agree, that is as hack-tacular as it gets. So I would request folks to give some context behind any such disparaging comments.

The definitions of local scope and global scope are very clear and simple. Local-scoped objects have local significance and will not make use of any of the distributed state services provided by libnetwork, while global-scoped objects automatically get that service. We absolutely need that in order to provide the user experience and guarantees that users expect of Docker. We have two very good examples in the bridge and overlay drivers. We will add more drivers, such as macvlan/ipvlan, etc., which will fall under the local-scoped drivers and will honor these philosophies.

The point is NOT about libkv and the use of it. Even if libnetwork decides not to make use of libkv or consistent states, there must be a single source of truth. Every layer in the system trying to have its own view of the cluster is a bad design to begin with. So, a question to @thockin: does k8s try to understand the state of the cluster from the driver directly, or is it okay for k8s to have its view of the cluster while the existing SDN solutions (as @lxpollitt put it) have their own, and the two are not aligned?

Again, I think I am digressing... my 2c is to get this issue resolved first, as suggested by @dcbw. If we delay this further, we might miss the boat for the 1.9 code freeze, if you care about that.

squaremo commented 9 years ago

what we would have to do vs want to do to defer network management to docker.

Just to check, @thockin: is deferring the network management to docker an inescapable requirement?

mrjana commented 9 years ago

I'm not sure what you're agreeing with there. I meant that fixing the problem described in the title will not meet the requirement. It might not even be a move in the right direction. It seems to me that the division of the model into "GlobalScope", which forces co-ordination, and "LocalScope", which prevents co-ordination, needs reevaluation.

@squaremo Forgetting for a moment all the k8s context that is driving this requirement, why does this scoping model need reevaluation? We already do this for in-tree drivers, i.e. bridge is local-scoped while overlay is global-scoped. There is a bug in the code wherein remote drivers don't get to have a say in which scope they belong to while in-tree drivers do, and that is wrong. We are merely trying to fix that. Instead of fixing that mismatch independently, why are we trying to drag everything along with it?

squaremo commented 9 years ago

Forgetting for a moment all the k8s context [...] Instead of fixing that mismatch independently, why are we trying to drag everything along with it?

Perhaps it would be better to change the title and description of this issue, so that they reflect the requirements that motivated it, and the discussion here. Otherwise those will just have to be reiterated.