Closed malikkal closed 7 years ago
This is resolved. Probably a bug?
Enable mode 2 for rp_filter in the VCH host and make it permanent:
echo 2 > /proc/sys/net/ipv4/conf/default/rp_filter
sysctl -w "net.ipv4.conf.all.rp_filter=2"
For details, refer to:
http://www.slashroot.in/linux-kernel-rpfilter-settings-reverse-path-filtering
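To confirm which mode actually applies to an interface, the two values can be read back. This is a minimal sketch assuming a standard Linux /proc/sys layout. The function name and its optional second argument (an alternate sysctl root, handy for dry runs) are illustrative and not part of VIC. The kernel applies the higher of conf/all and the per-interface value, while conf/default only seeds interfaces created later.

```shell
# Print the rp_filter mode the kernel would apply to an interface.
# The kernel uses the maximum of conf/all and conf/<iface>.
# $1: interface name, e.g. eth0
# $2: sysctl root (assumption: defaults to /proc/sys; override for dry runs)
effective_rp_filter() (
    iface=$1
    root=${2:-/proc/sys}
    all=$(cat "$root/net/ipv4/conf/all/rp_filter") || exit 1
    per_if=$(cat "$root/net/ipv4/conf/$iface/rp_filter") || exit 1
    if [ "$all" -gt "$per_if" ]; then
        echo "$all"
    else
        echo "$per_if"
    fi
)
```

Running `effective_rp_filter eth0` on the endpointVM should print 2 once loose mode is in effect. Note that neither `echo` nor `sysctl -w` survives a reboot by itself. On most distributions a sysctl.d entry does, but whether that holds for the VCH appliance is an assumption to verify.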
@malikkal given the symptoms, I assume this is related to setup of the network serial port connection between the ESX host and the endpointVM (protocol:tcp src:esxhost dest:endpoint.management.net:2377).
Does your comment mean that:
esxhost is not routable from the endpointVM's management interface (I presume eth0, as we do not rename the management interface), or
client in this case)
Altering the reverse packet filter opens up the possibility of spoofing, with someone pretending to be an ESX host creating a backchannel. While this should be handled at multiple levels (#2849), it's a weakening of the default configuration. I'd like to understand the specific issue you're seeing, as it's unexpected if there's a separate management network.
If it's the first option, I'm hoping this can be addressed by adding more permissive routes on the management interface, with rp_filter=2 being a fallback approach.
To implement a global rp_filter change, update: https://github.com/vmware/vic/blob/master/isos/appliance/nat-setup#L30
To address more permissive routes we would need to:
a. require that the management network config have a netmask that's broad enough to encompass the ESX hosts as well as the vSphere endpoint, or
b. inspect the hosts in the cluster and add specific routes, either for a combined subnet or for the hosts themselves, or
c. determine if the hosts are routable on the management interface and, if not, set rp_filter=2, hoping that the ESX-originated traffic has a route to the endpoint at all.
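The check in option c could be probed by hand from the endpointVM along these lines. This is a sketch only: it assumes `ip` from iproute2 is available in the endpointVM and that eth0 is the management interface, as presumed earlier in the thread. The function names and the host address in the usage note are hypothetical.

```shell
# Extract the outgoing interface from `ip route get` output read on stdin.
route_dev() {
    sed -n 's/.* dev \([^ ]*\).*/\1/p' | head -n 1
}

# Succeed if the route towards host $1 leaves via interface $2.
# Requires iproute2; intended to be run on the endpointVM.
routed_via() {
    [ "$(ip -4 route get "$1" 2>/dev/null | route_dev)" = "$2" ]
}
```

For example, `routed_via 10.2.0.5 eth0 || echo "asymmetric: consider rp_filter=2"` would flag an ESX host whose return path does not use the management interface.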
@corrieb @kreamyx this is another vote for the communication to be injected via vmci and routed at the infrastructure layer where the hosts are assumed to have connectivity for vmotion and friends to function.
This is also a reminder to re-introduce the post-install functional correctness checks:
@malikkal If you're using vSAN, would you mind trying docker logs to stream logs from a running container? With the container live we need to go direct to the owning host in order to access the output.log file instead of relaying through vCenter. This would also be impacted by an inability to route to the host on the management network, meaning this would only work if the public network (default route) has a route back to the hosts.
@hickeng Yes, it's related to setup of the network serial port connection between the ESX host and the endpointVM. Many thanks for the explanation during the Webex, which prompted me to look further. Appreciate it.
Sorry, we don't use vSAN yet. BTW, can I redirect selected logs to loginsight?
@malikkal My pleasure :) Wasn't sure if you were the same person, but seemed likely given the scenario.
If you add additional routing targets via --management-network-gateway (help text: "Gateway for the VCH on the management network, including one or more routing destinations in a comma separated list, e.g. 10.1.0.0/16,10.2.0.0/16:10.0.0.1"), does that remove the need for the rp_filter change?
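One way to sanity-check a candidate value before redeploying is to test whether each ESX host address falls inside one of the listed routing destinations. The sketch below assumes the "dest[,dest...]:gateway" form shown in the help text. All function names are hypothetical helpers and not part of vic-machine, and only IPv4 is handled.

```shell
# Hypothetical helpers; not part of vic-machine.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if address $1 falls inside CIDR block $2 (e.g. 10.1.0.0/16).
in_cidr() (
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "${2%/*}") & mask )) ]
)

# Succeed if host address $1 is covered by a routing destination in $2,
# where $2 has the "dest[,dest...]:gateway" form from the help text.
host_covered() (
    host=$1
    case $2 in
        *:*) dests=${2%%:*} ;;
        *)   exit 1 ;;          # bare gateway: no routing destinations listed
    esac
    IFS=,
    for d in $dests; do
        in_cidr "$host" "$d" && exit 0
    done
    exit 1
)
```

With the example value from the help text, `host_covered 10.1.5.20 "10.1.0.0/16,10.2.0.0/16:10.0.0.1"` succeeds, while a host at 192.168.1.5 is not covered and would still depend on another route or on loosening rp_filter.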
We don't currently support direct loginsight integration for container logs. Given the logs are currently being persisted on the datastore in the containerVM folder our available approaches at this time are:
Basically there are ways to do it that are trivial to add, but ways to do it without adding complexity or burden to container usage involve work - I'm biased against anything that mandates a specific network requirement for the containerVMs. I do not know enough about operational requirements to know if a scheduled batch collection is feasible or whether live log data is required so I err towards thinking about the latter.
@hickeng Can you estimate and prioritize this for me since you have been working on this? And I guess rename this issue to capture the desire to support direct loginsight integration?
@mhagen-vmware I'm leaving this issue open for the correct configuration of rp_filter and management network routing, as that was the original problem. I've split out #4771 to record the loginsight request.
We could do with a response from @malikkal about whether adding additional routing destinations for the hosts via --management-network-gateway addresses the problem (as per this comment); however, the need for a way to weaken the filtering for asymmetric routing setups remains regardless.
Sizing will be for an option to allow weakening the packet filtering (vic-machine and doc primarily). This should not be a change to the core ISO configuration but a per-deployment choice. I've added this to the 1.2 project as we should get this into that release for @malikkal
Pardon me for the delay here; juggling priorities. I will test this and update here by Tue 18th. Thank you for all the support.
@stuclem #4816 adds an option --asymmetric-routes to deal with the case where we have genuinely asymmetric routes and adding destinations to the --management-network-gateway option will not suffice.
The result of this being true is to set rp_filter to "loose" mode in the endpointVM; see https://en.wikipedia.org/wiki/Reverse_path_forwarding#Loose_mode
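As a rough illustration of why loose mode helps here, the three rp_filter modes can be modelled as a small decision function. This is a simplified model of the kernel's behaviour, not kernel or VIC code, and the interface names in the usage note are assumptions.

```shell
# Illustrative model of the rp_filter decision; not kernel code.
# $1: mode (0, 1 or 2)
# $2: interface the packet arrived on
# $3: interface the route back to the packet's source would use
#     (empty string if there is no route to the source at all)
rp_filter_verdict() {
    case $1 in
        0)  echo accept ;;   # no source validation
        1)  # strict: the return path must use the arrival interface
            if [ "$2" = "$3" ]; then echo accept; else echo drop; fi ;;
        2)  # loose: any route back to the source is sufficient
            if [ -n "$3" ]; then echo accept; else echo drop; fi ;;
        *)  echo "unknown mode: $1" >&2; return 2 ;;
    esac
}
```

If an ESX host connects on the management interface (say eth0) but the only route back to it is via the public interface (say eth1), `rp_filter_verdict 1 eth0 eth1` prints drop while `rp_filter_verdict 2 eth0 eth1` prints accept.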
Likely symptoms needing this option are:
In more detail, when starting a container without -d, we will never see the log entries for incoming connections from the newly started container in the portlayer log. The line of interest is the first one in the following snippet, which logs vSPC handling a new connection:
Mar 20 2017 04:11:59.010Z INFO connection received
Mar 20 2017 04:11:59.010Z INFO sending WILL 0
Mar 20 2017 04:11:59.010Z DEBUG entered write loop
Mar 20 2017 04:11:59.025Z DEBUG Sending command: 251 0
Mar 20 2017 04:11:59.025Z INFO sending WILL 3
Mar 20 2017 04:11:59.025Z DEBUG Sending command: 251 3
Mar 20 2017 04:11:59.025Z INFO sending WILL 1
Mar 20 2017 04:11:59.025Z DEBUG Sending command: 251 1
Mar 20 2017 04:11:59.025Z INFO sending DO 0
Mar 20 2017 04:11:59.025Z DEBUG Sending command: 253 0
Mar 20 2017 04:11:59.025Z INFO sending DO 3
Mar 20 2017 04:11:59.026Z DEBUG Sending command: 253 3
Mar 20 2017 04:11:59.026Z INFO sending DO 232
Mar 20 2017 04:11:59.026Z DEBUG Sending command: 253 232
Mar 20 2017 04:11:59.026Z DEBUG Sending command: 254 37
Mar 20 2017 04:11:59.026Z DEBUG Sending WILL command
Mar 20 2017 04:11:59.026Z DEBUG Sending command: 251 0
Mar 20 2017 04:11:59.026Z DEBUG Sending WILL command
Mar 20 2017 04:11:59.026Z DEBUG Sending command: 251 3
Mar 20 2017 04:11:59.026Z DEBUG Sending WILL command
Mar 20 2017 04:11:59.026Z DEBUG Sending command: 251 1
Mar 20 2017 04:11:59.026Z DEBUG Sending command: 253 0
Mar 20 2017 04:11:59.027Z DEBUG Sending command: 253 3
Mar 20 2017 04:11:59.027Z DEBUG Sending command: 253 232
Mar 20 2017 04:11:59.027Z INFO vspc received KNOWN-SUBOPTIONS command
Mar 20 2017 04:11:59.027Z DEBUG [BEGIN] [github.com/vmware/vic/lib/vspc.(*handler).knownSuboptions:114] handling KNOWN-SUBOPTIONS
Mar 20 2017 04:11:59.027Z DEBUG response to KNOWN-SUBOPTIONS: [255 250 232 1 0 1 2 3 40 41 43 44 45 46 48 70 71 73 80 81 84 85 86 87 82 83 255 240]
Mar 20 2017 04:11:59.027Z DEBUG [ END ] [github.com/vmware/vic/lib/vspc.(*handler).knownSuboptions:114] [592.003µs] handling KNOWN-SUBOPTIONS
Mar 20 2017 04:11:59.027Z INFO vspc received DO-PROXY command
Mar 20 2017 04:11:59.027Z DEBUG [BEGIN] [github.com/vmware/vic/lib/vspc.(*handler).doProxy:131] handling DO-PROXY
Mar 20 2017 04:11:59.027Z DEBUG response to DO-PROXY: [255 250 232 71 255 240]
Mar 20 2017 04:11:59.028Z DEBUG [ END ] [github.com/vmware/vic/lib/vspc.(*handler).doProxy:131] [42.942µs] handling DO-PROXY
Mar 20 2017 04:11:59.028Z INFO vspc received VMUUID command
Mar 20 2017 04:11:59.028Z DEBUG [BEGIN] [github.com/vmware/vic/lib/vspc.(*handler).cVMUUID:84] handling VMUUID
Mar 20 2017 04:11:59.028Z INFO vmuuid of the connected containerVM: 5281ea49d163c912-226e06e2ed4763e2
Mar 20 2017 04:11:59.028Z INFO attempting to connect to the attach server
Mar 20 2017 04:11:59.028Z DEBUG [ END ] [github.com/vmware/vic/lib/vspc.(*handler).cVMUUID:84] [114.95µs] handling VMUUID
Mar 20 2017 04:11:59.028Z INFO attach connector: Received incoming connection
Mar 20 2017 04:11:59.038Z DEBUG HandshakeClient: Sending syn.
Mar 20 2017 04:11:59.038Z DEBUG Sending command: 254 44
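A quick way to check for the symptom is to count the "connection received" entries in a copy of the portlayer log. This is a minimal sketch: the function name is hypothetical, the log path is whatever file you have collected, and the pattern is taken from the snippet above (allowing for variable spacing after the level).

```shell
# Count vSPC "connection received" entries in a portlayer log file.
# $1: path to a copy of the portlayer log
# Prints 0 when no entry is present, which matches the symptom described.
vspc_connection_count() {
    grep -cE 'INFO +connection received' "$1"
}
```

A count that stays at 0 after running a container without -d suggests the connection from the ESX host never reached vSPC.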
Users should first check that the management-network-gateway has route entries for the subnets containing both the target vCenter and the corresponding ESX hosts, assuming ESX hosts are accessible via the management network (this is a more secure deployment option). If not, then asymmetric routing is required to permit incoming connections from the hosts via one of the other endpointVM networks.
lgtm
closing now, please reopen if you are still having issues
docker run without -d doesn't work. Engine version: v0.8.0-7315-c8ac999
Steps to reproduce.
Deploy a VCH with separate management and public networks, with or without additional container networks.
Run a container out of the registry with -it. In this case, a Redhat 6.8 base image.
Generic error: Error response from daemon: Server error from portlayer: unable to wait for process launch status
Container will be stuck at 'Starting' when issuing a ps -a
However, the container will run without issues when deployed via Admiral or with -itd.
In summary, no interactive session with the container is possible.
Also refer to #4212, #3315 and #4223 (similar issues).
VCH routes for reference.