Closed chengwang86 closed 6 years ago
I think the error occurred here https://github.com/vmware/govmomi/blob/master/vim25/methods/methods.go#L2134 when the govc client tried to ask the vcenter server to create a cluster.
Here is a list of the possible reasons for "License not available to perform the operation":
The previous command, govc datacenter.create ha-datacenter (run right before the failed command), succeeded.
Discussed with @emlin and @mhagen-vmware. We can use a govc command to update the license on the vCenter server after the testbed is deployed in that particular nightly test.
I deployed a testbed on nimbus using the same cmd as the one used in this test. Here are the licenses on one of the vcenter servers:
govc license.ls
Key: Edition: Used: Total:
00000-00000-00000-00000-00000 eval 0 0
govc license.assigned.ls
Id: Scope: Name: License:
id1 name1 00000-00000-00000-00000-00000
id2 name2 00000-00000-00000-00000-00000
Reopening. Seen in Nightly Build 13338.tar.gz on 8/22 5-3-Enhanced-Linked-Mode.zip
The reason for reverting my previous fix for this issue is that we saw intermittent test failures due to:
Running command 'govc license.add .......... 2>&1'.
${out} = govc: ServerFaultCode: Access to perform the operation was denied
The original failure, "License not available to perform the operation", has only occurred twice so far, which makes me wonder whether we really need more govc commands to fix this license issue at the risk of introducing more potential govc bugs. Of course we could Run Keyword And Ignore Error govc license.add ..., but that doesn't seem to be a good fix.
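For reference, the Run Keyword And Ignore Error idea boils down to running the command, recording whether it failed, and carrying on regardless. Below is a minimal shell sketch of that behavior. The helper name run_ignore_error and the stand-in fake_license_add are hypothetical (neither is part of govc or our test libraries); the error text is the one from the failing run.

```shell
#!/bin/sh
# Hypothetical helper: run a command, report PASS/FAIL plus its output,
# and always return 0 so the calling script keeps going.
run_ignore_error() {
    # Capture stdout and stderr together, like the "2>&1" in the test log.
    if out=$("$@" 2>&1); then
        printf 'PASS\n%s\n' "$out"
    else
        printf 'FAIL\n%s\n' "$out"
    fi
    return 0
}

# Stand-in for the failing call; in the nightly this would be the
# govc license.add invocation.
fake_license_add() {
    echo "govc: ServerFaultCode: Access to perform the operation was denied"
    return 1
}

run_ignore_error fake_license_add
```

This only records the failure and moves on; it does nothing about the missing license itself, which is why it does not look like a real fix.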
@mhagen-vmware Any thoughts?
Chatted with @mhagen-vmware about this. We need to create a bugzilla ticket for the nimbus team about this issue.
Here is the response from the VPX/licensing-infrastructure team:
As there are no logs and the issue happened once, quite a long time ago, there is not much we can do. If it reproduces again, feel free to reopen this bug, adding the relevant logs and reproduction steps.
I'm asking them about what specific logs they need for debugging.
@mhagen-vmware @rogeliosanchez The VPX/licensing-infrastructure team wants us to provide vc-support bundle logs the next time this failure occurs. They have closed my bugzilla ticket. Do you think it is worth modifying our test script to provide these support bundles just for such a rare failure?
This PR https://github.com/vmware/vic/pull/6208 has added more log information to the test so that we would be able to know the actual license that is used when the failure occurs.
Similar failure on latest nightly as well:
Sep 3 2017 12:04:58.195Z ERROR License check FAILED on hosts:
Sep 3 2017 12:04:58.195Z ERROR "\"/ha-datacenter/host/cls/10.192.45.42\" - license missing feature \"serialuri\""
Sep 3 2017 12:04:58.195Z ERROR "\"/ha-datacenter/host/cls/10.192.45.42\" - license missing feature \"dvs\""
Sep 3 2017 12:04:58.195Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).CheckDrs:525]
Sep 3 2017 12:04:58.245Z INFO DRS check OK on:
Sep 3 2017 12:04:58.245Z INFO "/ha-datacenter/host/cls"
Sep 3 2017 12:04:58.245Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).CheckDrs:525] [50.221858ms]
Sep 3 2017 12:04:58.246Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).certificate:476]
Sep 3 2017 12:04:58.247Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).certificate:476] [1.39402ms]
Sep 3 2017 12:04:58.247Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).certificateAuthorities:495]
Sep 3 2017 12:04:58.247Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).certificateAuthorities:495] [266.884µs]
Sep 3 2017 12:04:58.247Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).registries:520]
Sep 3 2017 12:04:58.247Z DEBUG URL: https://harbor.ci.drone.local/v2/
Sep 3 2017 12:04:58.248Z DEBUG [BEGIN] [github.com/vmware/vic/pkg/fetcher.(*URLFetcher).Head:380] https://harbor.ci.drone.local/v2/
Sep 3 2017 12:05:08.248Z DEBUG [ END ] [github.com/vmware/vic/pkg/fetcher.(*URLFetcher).Head:380] [10.000395605s] https://harbor.ci.drone.local/v2/
Sep 3 2017 12:05:08.248Z DEBUG URL: http://harbor.ci.drone.local/v2/
Sep 3 2017 12:05:08.248Z DEBUG [BEGIN] [github.com/vmware/vic/pkg/fetcher.(*URLFetcher).Head:380] http://harbor.ci.drone.local/v2/
Sep 3 2017 12:05:08.249Z DEBUG [ END ] [github.com/vmware/vic/pkg/fetcher.(*URLFetcher).Head:380] [664.656µs] http://harbor.ci.drone.local/v2/
Sep 3 2017 12:05:08.249Z WARN Unable to confirm insecure registry harbor.ci.drone.local is a valid registry at this time.
Sep 3 2017 12:05:08.249Z INFO Insecure registries = harbor.ci.drone.local
Sep 3 2017 12:05:08.249Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).registries:520] [10.001737688s]
Sep 3 2017 12:05:08.249Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).compatibility:673]
Sep 3 2017 12:05:08.445Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).checkDatastoresAreWriteable:728]
Sep 3 2017 12:05:08.888Z WARN Only one host can access all of the image/container/volume datastores. This may be a point of contention/performance degradation and HA/DRS may not work as intended.
Sep 3 2017 12:05:08.888Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).checkDatastoresAreWriteable:728] [442.794289ms]
Sep 3 2017 12:05:08.888Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).compatibility:673] [639.153322ms]
Sep 3 2017 12:05:08.888Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).syslog:834]
Sep 3 2017 12:05:08.888Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).syslog:834] [49.123µs]
Sep 3 2017 12:05:08.888Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).ListIssues:266]
Sep 3 2017 12:05:08.888Z ERROR --------------------
Sep 3 2017 12:05:08.889Z ERROR License does not meet minimum requirements to use VIC
Sep 3 2017 12:05:08.889Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).ListIssues:266] [141.927µs]
Sep 3 2017 12:05:08.889Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).Validate:291] [13.293048919s]
Sep 3 2017 12:05:08.889Z ERROR Create cannot continue: configuration validation failed
Date: 09/03 Build: 13555 vsphere 6.0 Test: 5-3-ELM
Seen in 5-3-Enhanced-Linked-Mode.zip
The license features missing in the most recent failures, after the added logging, were serialuri and dvs. Neither has been missing for the last 3 weeks.
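To make it easier to compare failures across nightlies, the missing feature names can be pulled out of a vic-machine log mechanically. A rough sketch, assuming the log lines keep the "license missing feature" wording seen above; the function name missing_features is made up for illustration, and the sample lines are copied from the 09/03 failure.

```shell
#!/bin/sh
# Print the unique feature names from "license missing feature" lines
# read on stdin. The optional backslashes and quotes around the name
# match the escaping seen in the vic-machine log excerpts.
missing_features() {
    sed -n 's/.*license missing feature [\\"]*\([A-Za-z0-9_-]*\).*/\1/p' | sort -u
}

# Sample input copied from the 09/03 failure.
missing_features <<'EOF'
Sep 3 2017 12:04:58.195Z ERROR License check FAILED on hosts:
Sep 3 2017 12:04:58.195Z ERROR "\"/ha-datacenter/host/cls/10.192.45.42\" - license missing feature \"serialuri\""
Sep 3 2017 12:04:58.195Z ERROR "\"/ha-datacenter/host/cls/10.192.45.42\" - license missing feature \"dvs\""
Sep 3 2017 12:04:58.245Z INFO DRS check OK on:
EOF
```

On the sample this prints dvs and serialuri, one per line. Lines that do not mention a missing feature are ignored.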
Seen in 6.0 nightly: 5-3-Enhanced-Linked-Mode.zip
Will follow this up with a bugzilla ticket.
We may need to reach out to the nimbus folks regarding the licensing of individual features.
I have reopened the previous issue with bugzilla and attached the latest failure logs.
In the meantime, will add a retry in case of failure.
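A retry of this kind could look roughly like the sketch below. This is not the actual change to the test, only an illustration: the helper name retry, its argument order, and the flaky stand-in command are all assumptions, and which step gets retried in the real suite is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Demo with a stand-in command that fails twice and then succeeds.
counter_file=$(mktemp)
echo 0 > "$counter_file"
flaky() {
    n=$(($(cat "$counter_file") + 1))
    echo "$n" > "$counter_file"
    [ "$n" -ge 3 ]
}

retry 5 0 flaky && echo "succeeded after $(cat "$counter_file") attempts"
rm -f "$counter_file"
```

A retry hides the intermittent failure without explaining it, so the logs from the failed attempts are still worth keeping for the bugzilla ticket.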
Seen in 6.0 run on 11/19 report: 5-3-Enhanced-Linked-Mode.zip
Seen again in 6.0 run on 12/08, test group 5-3-Enhanced-Linked-Mode:
Dec 8 2017 12:54:33.545-06:00 ERROR op=18200.1: License check FAILED on hosts:
Dec 8 2017 12:54:33.546-06:00 ERROR op=18200.1: "\"/ha-datacenter/host/cls/10.160.110.241\" - license missing feature \"serialuri\""
Dec 8 2017 12:54:33.546-06:00 ERROR op=18200.1: "\"/ha-datacenter/host/cls/10.160.110.241\" - license missing feature \"dvs\""
......
Dec 8 2017 12:54:44.233-06:00 ERROR op=18200.1: --------------------
Dec 8 2017 12:54:44.233-06:00 ERROR op=18200.1: License does not meet minimum requirements to use VIC
Dec 8 2017 12:54:44.233-06:00 DEBUG [ END ] op=18200.1 [vic/lib/install/validate.(*Validator).ListIssues:271] [96.98µs]
Dec 8 2017 12:54:44.233-06:00 DEBUG [ END ] op=18200.1 [vic/lib/install/validate.(*Validator).Validate:297] [13.822934637s]
Dec 8 2017 12:54:44.233-06:00 ERROR op=18200.1: Create cannot continue: configuration validation failed
Dec 8 2017 12:54:44.288-06:00 ERROR op=18200.1: --------------------
Dec 8 2017 12:54:44.288-06:00 ERROR op=18200.1: vic-machine-linux create failed: validation of configuration failed
Log bundle: 5-3-Enhanced-Linked-Mode.zip
We need further analysis on this. I now have a check in the setup that uses govc to check specifically for these license features, and it passes without error, so somehow vic-machine is doing the check differently and coming up with different results.
Specifically in the case of the host that vic-machine reported didn't have the licenses:
Run govc object.collect -json $(govc object.collect -s - content.licenseManager) licenses | jq '.[].Val.LicenseManagerLicenseInfo[].Properties[] | select(.Key == "feature") | .Value'
BuiltIn . Should Contain ${out}, serialuri <--- PASS
BuiltIn . Should Contain ${out}, dvs <--- PASS
We need to better understand how vic-machine is detecting the license compared to the test for it earlier in the robot file. After that we need a way to prevent this from surfacing again. Removing from list of ship stoppers.
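For anyone reproducing the setup check outside of robot, the two Should Contain assertions above amount to a substring check over the feature listing. A small sketch, with a hypothetical helper name require_features and a stand-in listing; the exact shape of the real govc/jq output is an assumption here.

```shell
#!/bin/sh
# Hypothetical helper: read a feature listing on stdin and check that every
# feature named as an argument appears in it (substring match, like
# Robot Framework's Should Contain). Prints the ones that are missing.
require_features() {
    listing=$(cat)
    status=0
    for feature in "$@"; do
        case $listing in
            *"$feature"*) ;;
            *) echo "missing feature: $feature"; status=1 ;;
        esac
    done
    return "$status"
}

# Stand-in listing; in the test this text would come from the
# govc object.collect | jq pipeline. Its exact format is assumed.
if printf '%s\n' '"serialuri"' '"dvs"' | require_features serialuri dvs; then
    echo "all required features present"
fi
```

Because this is a plain substring match, it passes whenever the names appear anywhere in the listing. It says nothing about which host the license is assigned to, which may be part of why it can disagree with vic-machine's per-host check.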
Closing this as it has not reared its head in a while.
Seen again nightly 02/18/18 VC version: 6.0 Test suite: 5-3-Enhanced-Linked-Mode
During vic-machine create:
Feb 19 2018 06:20:02.596Z ERROR op=15740.1: License check FAILED on hosts:
Feb 19 2018 06:20:02.597Z ERROR op=15740.1: "\"/ha-datacenter/host/cls/10.160.30.137\" - license missing feature \"serialuri\""
Feb 19 2018 06:20:02.597Z ERROR op=15740.1: "\"/ha-datacenter/host/cls/10.160.30.137\" - license missing feature \"dvs\""
In the last 4 failing runs that hit the license missing feature error, the govc license.ls command targeted at the vCenter shows:
${license} = govc: SecurityError
A normal run shows:
${license} = Key: Edition: Used: Total:
00000-00000-00000-00000-00000 eval 0 0
Also, in lib/install/validate/config.go, vic-machine uses the same API as govc to check for license features.
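Since the SecurityError shows up in the license.ls output before vic-machine create ever runs, the setup could flag it early. A minimal sketch of telling the two outputs apart, assuming the captured text looks like the two samples above; the helper name license_ls_status and its return codes are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: decide whether captured `govc license.ls` output is a
# license table or an error message. Both sample outputs below are copied
# from the runs described above.
license_ls_status() {
    case $1 in
        "govc: "*) echo "error: ${1#govc: }"; return 1 ;;
        "Key:"*)   echo "ok"; return 0 ;;
        *)         echo "unrecognized output"; return 2 ;;
    esac
}

normal_run='Key: Edition: Used: Total:
00000-00000-00000-00000-00000 eval 0 0'
failing_run='govc: SecurityError'

license_ls_status "$normal_run"
license_ls_status "$failing_run" || echo "license.ls did not return a license table"
```

Failing fast on this would at least separate "could not query licenses" from "license really lacks serialuri and dvs" in the nightly reports.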
Seen in nightly test vsphere 6.0
5-3-ELM:
5-3-Enhanced-Linked-Mode.zip