Containers have access to Management assets

chrismarget commented 7 years ago

The documentation for VCH network types asserts that the VCH Management network is a sensitive resource requiring some degree of protection:

always use a secure network

use separate networks for the management network and the container networks

It comes as some surprise then that containers attached to "bridge" are allowed to NAT through the VCH and have full access to assets on the Management and/or Client networks, or reachable via the gateway on those networks.

Any container can speak to vSphere assets. When it does, it speaks with the IP address of the VCH, making the (unwanted?) traffic impossible to filter.

I understand this is a non-trivial problem: The VCH needs to be able to reach all of these subnets and since the VCH also performs IP transit (routing) duty for containers, this is a non-trivial problem.

I'd submit that the best solution to the issue is separating the various VCH functions (docker API listener, vSphere client, container gateway) into independent network namespaces so that their routing tables aren't co-mingled.

So... Is the masquerading of containers onto these other networks by design? Is it reasonable to rely on (containerize some software which acts as a vSphere or docker-engine client) this feature?

mlh78750 commented 7 years ago

This is not how it should work. We should prevent containers on the bridge network from reaching the management network.

Thank you for your report. We'll get on this right away.

mlh78750 commented 7 years ago

cc @hmahmood @andrewtchin

chrismarget commented 7 years ago

Thanks, Mike.

What's the word on accessing "client" assets from "bridge"?

Client and Management appear to me to be pretty much the same from a routing/iptables/NAT perspective. Perhaps they are not the same from a security concern perspective.

Is access to "client" from "bridge" (currently allowed) a problem that needs to be fixed?

Is consuming the docker-engine API from a container a supported function?

mlh78750 commented 7 years ago

@chrismarget That's a good question. Management is clearly a VIC only network and should not be allowed. The split between client and public is different. The concept of a client network in regular docker doesn't exist. We added it since some ops teams like to have the C&C traffic on dedicated networks and separate from the workload traffic. This is not easily supported on regular docker today, but it was easy for us to add into VIC. That being said, we currently have a limit on the number of interfaces for the endpoint VM so one of management, client, and public have to be shared.

It is clear that if a container wants to route out via NAT that it should be allowed to do that via public. So if client and public are shared, then that implies client networks would be reachable as well.

My default would be to not allow routing to client if it was separate from public. Is that what you would expect?

hmahmood commented 7 years ago

This may be fixed by #3875 since we changed the default forwarding policy to DROP. @rajanashok is going to write a test to validate.

mlh78750 commented 7 years ago

@hmahmood good point. Let's test (and write some tests) and confirm.

chrismarget commented 7 years ago

@mlh78750 Re: what would I expect?

I approach the issue as more of a router jockey than a container jockey. Possibly because of that mindset it strikes me that the outbound NAT for containers service provided by the VCH should be entirely separate from the VCH's admin functions, particularly those functions which are unlike vanilla docker.

In saying that, I do not mean that I expect VCH to block containers from reaching the Management or Client LANs (or prefixes reachable via those gateways). Maybe I've got a container which is a legitimate vSphere client!

Rather, when a container attempts to reach those prefixes, the container traffic should egress the Public interface and its packets should have their src-ip NATted to the IP of the Public interface. The packets should then traverse external network infrastructure on their way to the Management and Client LANs (which also happen to be directly connected to the VCH). Container generated traffic will then be subject to policy enforcement by devices in the physical network.

Consider a scenario where something on (or reachable via) the management or client network wants to consume a service published (docker run -p) by a container. TCP SYN packet will traverse the routed infrastructure from the special-to-the-VCH network to the service listening on the Public LAN.

What will be the path for the SYN/ACK generated by the container?
Will the SYN/ACK be allowed back to the client?
Will it NAT correctly?
If this flow traversed a firewall (likely scenario - we're talking about homing the VCH onto different security zones, after all) along the way, will it be dropped due to state table miss?

I understand this is not straightforward. Router jockeys would be talking about VRFs right now. In this Linux case, we might be talking about namespaces or named routing tables (maybe use iptables --set-mark to select traffic pre-routing as it ingresses "bridge", then match the mark with a mangle rule to divert traffic to a named table which always uses the "pubic" gateway).

Another option: Separate the VCH functions into containers (one for IP transit / name service functions, another for docker-engine server / vSphere client functions) with separate network namespaces. Heck, separate the VCH into separate VMs even!

Either way, allowing either of the following behaviors for container traffic feels wrong to me:

Routes to certain destinations cause containers speak with an unexpected IP (the current situation)
Black-holing certain destinations simply because the VCH uses those prefixes for some administrative purpose (the change mentioned by @hmahmood)

I'm sorry for typing a novel here. It feels like I might have just poured a can of worms on your desk. I'm sorry about that too. Thank you for engaging with me.

hmahmood commented 7 years ago

I don't believe the changes I mentioned black-hole specific destinations. Traffic from containers destined for the public interface is properly NAT'ed and does have the src ip set as that of the public interface. If the assets on the client/management networks are reachable through the public network, then the use case you describe should work with the changes I mentioned above.

chrismarget commented 7 years ago

@hmahmood:

If the assets on the client/management networks are reachable through the public network

Those destinations are not reachable through public. ip route list on the VCH shows that those destinations are reachable via the client/management interface, either because they're directly connected (on this subnet) or because they're routed via a gateway on the client/management subnets.

hmahmood commented 7 years ago

@chrismarget is that behavior not correct? Sorry if I am misunderstanding the problem here.

chrismarget commented 7 years ago

@hmahmood establishing correct behavior is certainly something to prioritize.

I created the report not because of my expectations, but because the current behavior seemed contrary to the tone of the documentation.

The behavior I expect is:

Containers should speak as the Public IP regardless of destination
All networks should be able to consume services published by containers

We don't seem to have that now.

hmahmood commented 7 years ago

Understood about (1.). The iptables rules currently would drop traffic destined for management from bridge, but you instead want it go out over the public network (with the public ip).

For (2.) you can connect containers to the networks that want to access services directly to that network, which is the preferred method to consume services instead of using NAT. See https://github.com/vmware/vic/blob/master/doc/user_doc/vic_installation/vch_installer_options.md#container-network

chrismarget commented 7 years ago

Thank you @hmahmood.

I'm familiar with the VCH "container network" feature, plan to use it for containers which require IP multicast.

I did not know that this was the preferred method of delivering container-provided services in the VIC universe. In fact I'd thought (assumed?) it was the other way around. Can you steer me toward something that expands on the philosophy behind using one vs. the other? I'd like to make an informed choice here if there are well explained best practices...

hmahmood commented 7 years ago

The NAT'ing is mostly supported by VIC for compatibility with docker. Connecting to the services exposed by the container directly is preferable due to:

no NAT'ing overhead
no single point of failure
you have the freedom to expose services to the networks you want, and are not constrained to using the networks that the VCH is connected to

chrismarget commented 7 years ago

Thank you @hmahmood. Each of your arguments makes sense. It seems I'd been viewing everything through a docker lens where only one of those arguments (no NAT) really holds, and dropping containers onto physical LANs is the exception, rather than the rule. I'll try to stop that way of thinking about VIC.

rajanashok commented 7 years ago

Just merged the integration test to verify the network connectivity between bridge and management. #4009

rajanashok commented 7 years ago

Assigned it back to @hmahmood

stuclem commented 7 years ago

Proposed text for the 0.9 release notes:

Containers have access to vSphere management assets. #3970
Containers that are attached to the bridge network can use NAT through the VCH and so have full access to assets on the management and client networks, or they can be reached via the gateway on those networks. As a consequence, any container can access to vSphere assets.

@mlh78750 and @hmahmood does this cover it?

stuclem commented 7 years ago

RN text approved by @hmahmood via email. Removing kind/note tag.

anchal-agrawal commented 7 years ago

Reprioritizing to medium after chatting with @hmahmood. The original issue has been addressed; we now drop traffic from the bridge network to the management network. What remains is routing that traffic via the public network. Until that's addressed, the user can deploy a VCH with --container-network to give a container direct access to management resources if needed.

hickeng commented 7 years ago

@rajanashok Please can we have an additional test that assets correct behaviour of the original issue - that you cannot access the management network from a container on the bridge network.

rajanashok commented 7 years ago

@hickeng The test you requested is already running in CI. Connectivity Bridge to Management #4009 , is this the test you meant?

hmahmood commented 7 years ago

@rajanashok the test in #4009 for bridge to management traffic is not correct (I don't know how I missed that in my review). I will rewrite it as part of #4816 .

vmware / vic

Containers have access to Management assets #3970