andrewtchin opened 6 years ago
As reported in the vic-machine output, the endpointVM was waiting for an IP address:
Nov 30 2017 09:03:41.157-06:00 ERROR op=4336.2: Failed to obtain IP address for client interface
Nov 30 2017 09:03:41.158-06:00 INFO op=4336.2: Use vic-machine inspect to see if VCH has received an IP address at a later time
Nov 30 2017 09:03:41.158-06:00 INFO op=4336.2: State of all interfaces:
Nov 30 2017 09:03:41.158-06:00 INFO op=4336.2: "bridge" IP: "waiting for IP"
Nov 30 2017 09:03:41.158-06:00 INFO op=4336.2: "public" IP: "waiting for IP"
Nov 30 2017 09:03:41.158-06:00 INFO op=4336.2: "client" IP: "waiting for IP"
Nov 30 2017 09:03:41.158-06:00 INFO op=4336.2: "management" IP: "waiting for IP"
Nov 30 2017 09:03:41.158-06:00 INFO op=4336.2: State of components:
Nov 30 2017 09:03:41.158-06:00 INFO op=4336.2: "vicadmin": "<nil>"
Nov 30 2017 09:03:41.158-06:00 INFO op=4336.2: "docker-personality": "<nil>"
Nov 30 2017 09:03:41.158-06:00 INFO op=4336.2: "port-layer": "<nil>"
Without tether.debug for the endpointVM, there's no further diagnosis that can be done.
We can hypothesize the following: given that the toolbox started after boot and there are no heartbeat timeouts reported in vmware.log, it's unlikely that the endpointVM was so resource starved that it couldn't obtain or set an address.
Marked as a ship stopper until triaged. @cgtexmex @hickeng feel free to move it out if needed after discussion with @mlh78750
As @hickeng stated there's no further diagnosis that can be done for this issue as we don't have any logs and the environment that experienced the failure was torn down. #6894 has been opened to improve our response to a failure by improving log collection and available diagnostic information.
cc @mhagen-vmware @mlh78750
Since we have no logs and it hasn't recurred in the past 5 nightly executions, I'm closing this issue.
Reopened as a placeholder / reminder that we need to document / design what logs, etc. need to be provided to troubleshoot this or similar issues in the future.
@cgtexmex is https://github.com/vmware/vic/issues/6894 the issue that can act as the placeholder? or were you thinking we needed something else?
@mlh78750 yeah, #6894 is the placeholder / epic -- leaving this open more to remind me that I have a specific task to complete: document what would have been helpful for a similar failure...that might result in additional issues being opened to help track / decompose...anyhoo, I feel like I'm rambling..
VIC version:
v1.4.0-dev-17605-6144b31
Deployment details:
Nightly 6.0 5-21-Datastore-Path Datastore - Space in Path with Scheme
Logs:
time="2018-03-19T03:41:12Z" level=info msg="Waiting for IP information"
time="2018-03-19T03:44:12Z" level=error msg="vic/pkg/trace.(*Operation).Err: Create error: context deadline exceeded\nvic/cmd/vic-machine/create.(*Create).Run:744 Create\nvic/cmd/vic-machine/common.NewOperation:27 vic-machine-linux"
time="2018-03-19T03:44:12Z" level=error msg="vic/lib/install/management.(*Dispatcher).ensureApplianceInitializes: Create error: context deadline exceeded\nvic/cmd/vic-machine/create.(*Create).Run:744 Create\nvic/cmd/vic-machine/common.NewOperation:27 vic-machine-linux"
time="2018-03-19T03:44:12Z" level=debug msg="properties: [config.extraConfig summary.runtime.connectionState]"
time="2018-03-19T03:44:12Z" level=error msg="vic/pkg/trace.(*Operation).Err: Create error: context deadline exceeded\nvic/cmd/vic-machine/create.(*Create).Run:744 Create\nvic/cmd/vic-machine/common.NewOperation:27 vic-machine-linux"
time="2018-03-19T03:44:12Z" level=error msg="Unable to get vm config: Post https://10.161.9.152/sdk: context deadline exceeded"
https://storage.cloud.google.com/vic-ci-logs/vic_nightly_logs_2018-03-16-20-32-56.zip?authuser=1 (@hickeng edited for correct link)
The most recent failure timed out waiting for IP because it took 205s to get a DHCP response. We eventually got the following error, then succeeded on the retry:
Mar 17 2018 03:13:41.917Z DEBUG retrying "DHCP request": errno=11, error=resource temporarily unavailable
While looking for others with "resource temporarily unavailable", I found I've already diagnosed this same log bundle in #7541. I'm setting the labels back to what @cgtexmex had when this was a placeholder for logging improvements for easier diagnosis of the original failures.
Date: 2017 Nov 30
Engine build: 15140
vSphere version: 6.0
Test(s): 5-21-Datastore-Path
5-21-Datastore-Path.zip
Context deadline exceeded (CDE) during create after uploading ISOs