vmware / vic

vSphere Integrated Containers Engine is a container runtime for vSphere.
http://vmware.github.io/vic

nightly test failed to create cluster - License not available to perform the operation #5438

Closed chengwang86 closed 6 years ago

chengwang86 commented 7 years ago

Seen in nightly test vsphere 6.0

5-3-ELM:

Running command 'govc cluster.create cls 2>&1'.
${out} = govc: ServerFaultCode: License not available to perform the operation.

5-3-Enhanced-Linked-Mode.zip

chengwang86 commented 7 years ago

I think the error occurred here https://github.com/vmware/govmomi/blob/master/vim25/methods/methods.go#L2134 when the govc client tried to ask the vcenter server to create a cluster.

Here is a list of the possible reasons for License not available to perform the operation:

chengwang86 commented 7 years ago

The previous cmd govc datacenter.create ha-datacenter (right before the failed cmd) succeeded.

chengwang86 commented 7 years ago

Discussed with @emlin and @mhagen-vmware. We can use a govc command to update the license of the vCenter server after the testbed is deployed in that particular nightly test.

chengwang86 commented 7 years ago

I deployed a testbed on nimbus using the same cmd as the one used in this test. Here are the licenses on one of the vcenter servers:

govc license.ls
Key:                           Edition:  Used:  Total:
00000-00000-00000-00000-00000  eval      0      0   

govc license.assigned.ls
Id:  Scope:  Name:  License:
id1          name1  00000-00000-00000-00000-00000
id2          name2  00000-00000-00000-00000-00000

rajanashok commented 7 years ago

Reopening. Seen in Nightly Build 13338.tar.gz on 8/22 5-3-Enhanced-Linked-Mode.zip

chengwang86 commented 7 years ago

I reverted my previous fix for this issue because we saw intermittent test failures due to:

Running command 'govc license.add .......... 2>&1'.
${out} = govc: ServerFaultCode: Access to perform the operation was denied

The original failure, License not available to perform the operation, has occurred only twice so far, which makes me wonder whether we really need more govc commands to fix this license issue when they introduce more potential govc failures. Of course, we could use Run Keyword And Ignore Error govc license.add ..., but that does not seem like a good fix.

@mhagen-vmware Any thoughts?

chengwang86 commented 7 years ago

Chatted with @mhagen-vmware about this. We need to create a bugzilla ticket for the nimbus team about this issue.

chengwang86 commented 7 years ago

Here is the response from the VPX/licensing-infrastructure team: as there are no logs and the issue happened once, quite a long time ago, there is not much we can do. If it reproduces again, feel free to reopen this bug, adding the relevant logs and reproduction steps. I'm asking them which specific logs they need for debugging.

chengwang86 commented 7 years ago

@mhagen-vmware @rogeliosanchez The VPX/licensing-infrastructure team wants us to provide vc-support bundle logs the next time this failure occurs. They have closed my bugzilla ticket. Do you think it is worth modifying our test script to provide these support bundles just for such a rare failure?

chengwang86 commented 7 years ago

This PR https://github.com/vmware/vic/pull/6208 adds more logging to the test so that we can see which license is actually in use when the failure occurs.

mhagen-vmware commented 7 years ago

Similar failure on latest nightly as well:

Sep  3 2017 12:04:58.195Z ERROR License check FAILED on hosts:
Sep  3 2017 12:04:58.195Z ERROR   "\"/ha-datacenter/host/cls/10.192.45.42\" - license missing feature \"serialuri\""
Sep  3 2017 12:04:58.195Z ERROR   "\"/ha-datacenter/host/cls/10.192.45.42\" - license missing feature \"dvs\""
Sep  3 2017 12:04:58.195Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).CheckDrs:525]
Sep  3 2017 12:04:58.245Z INFO  DRS check OK on:
Sep  3 2017 12:04:58.245Z INFO    "/ha-datacenter/host/cls"
Sep  3 2017 12:04:58.245Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).CheckDrs:525] [50.221858ms] 
Sep  3 2017 12:04:58.246Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).certificate:476]
Sep  3 2017 12:04:58.247Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).certificate:476] [1.39402ms] 
Sep  3 2017 12:04:58.247Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).certificateAuthorities:495]
Sep  3 2017 12:04:58.247Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).certificateAuthorities:495] [266.884µs] 
Sep  3 2017 12:04:58.247Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).registries:520]
Sep  3 2017 12:04:58.247Z DEBUG URL: https://harbor.ci.drone.local/v2/
Sep  3 2017 12:04:58.248Z DEBUG [BEGIN] [github.com/vmware/vic/pkg/fetcher.(*URLFetcher).Head:380] https://harbor.ci.drone.local/v2/
Sep  3 2017 12:05:08.248Z DEBUG [ END ] [github.com/vmware/vic/pkg/fetcher.(*URLFetcher).Head:380] [10.000395605s] https://harbor.ci.drone.local/v2/
Sep  3 2017 12:05:08.248Z DEBUG URL: http://harbor.ci.drone.local/v2/
Sep  3 2017 12:05:08.248Z DEBUG [BEGIN] [github.com/vmware/vic/pkg/fetcher.(*URLFetcher).Head:380] http://harbor.ci.drone.local/v2/
Sep  3 2017 12:05:08.249Z DEBUG [ END ] [github.com/vmware/vic/pkg/fetcher.(*URLFetcher).Head:380] [664.656µs] http://harbor.ci.drone.local/v2/
Sep  3 2017 12:05:08.249Z WARN  Unable to confirm insecure registry harbor.ci.drone.local is a valid registry at this time.
Sep  3 2017 12:05:08.249Z INFO  Insecure registries = harbor.ci.drone.local
Sep  3 2017 12:05:08.249Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).registries:520] [10.001737688s] 
Sep  3 2017 12:05:08.249Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).compatibility:673]
Sep  3 2017 12:05:08.445Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).checkDatastoresAreWriteable:728]
Sep  3 2017 12:05:08.888Z WARN  Only one host can access all of the image/container/volume datastores. This may be a point of contention/performance degradation and HA/DRS may not work as intended.
Sep  3 2017 12:05:08.888Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).checkDatastoresAreWriteable:728] [442.794289ms] 
Sep  3 2017 12:05:08.888Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).compatibility:673] [639.153322ms] 
Sep  3 2017 12:05:08.888Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).syslog:834]
Sep  3 2017 12:05:08.888Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).syslog:834] [49.123µs] 
Sep  3 2017 12:05:08.888Z DEBUG [BEGIN] [github.com/vmware/vic/lib/install/validate.(*Validator).ListIssues:266]
Sep  3 2017 12:05:08.888Z ERROR --------------------
Sep  3 2017 12:05:08.889Z ERROR License does not meet minimum requirements to use VIC
Sep  3 2017 12:05:08.889Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).ListIssues:266] [141.927µs] 
Sep  3 2017 12:05:08.889Z DEBUG [ END ] [github.com/vmware/vic/lib/install/validate.(*Validator).Validate:291] [13.293048919s] 
Sep  3 2017 12:05:08.889Z ERROR Create cannot continue: configuration validation failed

Date: 09/03 Build: 13555 vsphere 6.0 Test: 5-3-ELM

jzt commented 7 years ago

Seen in 5-3-Enhanced-Linked-Mode.zip

sgairo commented 7 years ago

The missing license features in the most recent failures, captured by the added logging, were serialuri and dvs. Neither has been missing in the last 3 weeks.
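
The feature names can be pulled out of a vic-machine log mechanically rather than by eye. A minimal sketch, assuming the log lines keep the `license missing feature \"name\"` shape seen in the excerpts in this thread; `missing_features` is a hypothetical helper name, not part of vic or govc:

```shell
# Hypothetical helper: read a vic-machine log on stdin and print each
# distinct feature named in a "license missing feature" line.
# Assumes the feature name is wrapped in backslash-escaped quotes,
# as in the log excerpts in this thread.
missing_features() {
  sed -n 's/.*license missing feature \\"\([^\\"]*\)\\".*/\1/p' | sort -u
}
```

It could be run as `missing_features < vic-machine.log`, printing one feature name per line.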

sgairo commented 7 years ago

Seen in 6.0 nightly: 5-3-Enhanced-Linked-Mode.zip

Will follow this up with a bugzilla ticket.

matthewavery commented 7 years ago

We may need to reach out to the nimbus folks regarding the licensing of individual features.

sgairo commented 7 years ago

I have reopened the previous issue with bugzilla and attached the latest failure logs.

sgairo commented 7 years ago

In the meantime, will add a retry in case of failure.
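
A minimal sketch of such a retry, written as a plain shell wrapper rather than the Robot Framework keyword the suite may actually use. The `retry` function, its argument order, and the attempt and delay values are all assumptions for illustration:

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping
# DELAY seconds between tries. Returns 0 on the first success and 1 if
# every attempt fails.
# Usage: retry ATTEMPTS DELAY command [args...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while :; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For instance, `retry 3 10 govc cluster.create cls` would try the failing command from this issue up to three times, ten seconds apart. A retry only masks the intermittent failure; it does not explain why the license features go missing.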

sgairo commented 6 years ago

Seen in 6.0 run on 11/19 report: 5-3-Enhanced-Linked-Mode.zip

AngieCris commented 6 years ago

Seen again in 6.0 run on 12/08, test group 5-3-Enhanced-Linked-Mode:

Dec  8 2017 12:54:33.545-06:00 ERROR op=18200.1: License check FAILED on hosts:
Dec  8 2017 12:54:33.546-06:00 ERROR op=18200.1:   "\"/ha-datacenter/host/cls/10.160.110.241\" - license missing feature \"serialuri\""
Dec  8 2017 12:54:33.546-06:00 ERROR op=18200.1:   "\"/ha-datacenter/host/cls/10.160.110.241\" - license missing feature \"dvs\""
......
Dec  8 2017 12:54:44.233-06:00 ERROR op=18200.1: --------------------
Dec  8 2017 12:54:44.233-06:00 ERROR op=18200.1: License does not meet minimum requirements to use VIC
Dec  8 2017 12:54:44.233-06:00 DEBUG [ END ] op=18200.1 [vic/lib/install/validate.(*Validator).ListIssues:271] [96.98µs] 
Dec  8 2017 12:54:44.233-06:00 DEBUG [ END ] op=18200.1 [vic/lib/install/validate.(*Validator).Validate:297] [13.822934637s] 
Dec  8 2017 12:54:44.233-06:00 ERROR op=18200.1: Create cannot continue: configuration validation failed
Dec  8 2017 12:54:44.288-06:00 ERROR op=18200.1: --------------------
Dec  8 2017 12:54:44.288-06:00 ERROR op=18200.1: vic-machine-linux create failed: validation of configuration failed

Log bundle: 5-3-Enhanced-Linked-Mode.zip

mhagen-vmware commented 6 years ago

We need further analysis on this. I now have a check in the setup that uses govc to specifically check for these license features, and it passes without error, so somehow vic-machine is doing the check differently and coming up with different results.

mhagen-vmware commented 6 years ago

Specifically, for the host that vic-machine reported as missing the licenses:

Run govc object.collect -json $(govc object.collect -s - content.licenseManager) licenses | jq '.[].Val.LicenseManagerLicenseInfo[].Properties[] | select(.Key == "feature") | .Value'
BuiltIn . Should Contain ${out}, serialuri  <--- PASS
BuiltIn . Should Contain ${out}, dvs  <--- PASS

mdubya66 commented 6 years ago

We need to better understand how vic-machine is detecting the license compared to the test for it earlier in the robot file. After that we need a way to prevent this from surfacing again. Removing from list of ship stoppers.
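
For comparison, the govc-side check earlier in the robot file amounts to a substring test on the feature list. A minimal shell sketch of the same idea, assuming the feature list arrives on stdin (for example from the `govc object.collect ... | jq ...` pipeline quoted earlier in this thread); `require_features` is a hypothetical name:

```shell
# Hypothetical helper: read a license feature listing on stdin and
# verify that every feature named as an argument appears somewhere in
# it. This is a substring match, mirroring the Should Contain checks
# in the robot file. Returns non-zero if any feature is absent.
require_features() {
  features=$(cat)
  status=0
  for want in "$@"; do
    case "$features" in
      *"$want"*) ;;
      *)
        echo "license missing feature: $want" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}
```

Because this is only a substring match on text, it can pass where a stricter per-host check fails, which is one place the two results could diverge.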

mdubya66 commented 6 years ago

Closing this as it has not reared its head in a while.

AngieCris commented 6 years ago

Seen again in nightly 02/18/18. VC version: 6.0. Test suite: 5-3-Enhanced-Linked-Mode

During vic-machine create:

Feb 19 2018 06:20:02.596Z ERROR op=15740.1: License check FAILED on hosts:
Feb 19 2018 06:20:02.597Z ERROR op=15740.1:   "\"/ha-datacenter/host/cls/10.160.30.137\" - license missing feature \"serialuri\""
Feb 19 2018 06:20:02.597Z ERROR op=15740.1:   "\"/ha-datacenter/host/cls/10.160.30.137\" - license missing feature \"dvs\""

vic-machine.log

andrewtchin commented 6 years ago

In the last 4 failing runs that have the license missing feature error, the govc license.ls command targeted at the vCenter shows:

${license} = govc: SecurityError

A normal run shows:

 ${license} = Key:                           Edition:  Used:  Total: 
00000-00000-00000-00000-00000  eval      0      0

Also, in lib/install/validate/config.go, vic-machine uses the same API to check for license features as govc does.
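
Since the failing runs show a govc error in place of the license table, the test setup could fail fast on that output before vic-machine create ever runs. A minimal sketch, assuming govc error messages start with `govc:` as in the outputs quoted in this thread; `is_govc_error` is a hypothetical name:

```shell
# Hypothetical helper: succeed (return 0) when captured govc output is
# an error message rather than a normal listing. Assumes errors begin
# with "govc:", as in "govc: SecurityError" above.
is_govc_error() {
  case "$1" in
    govc:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible use in a setup step (not run here):
#   license=$(govc license.ls 2>&1)
#   if is_govc_error "$license"; then
#     echo "license.ls failed: $license" >&2
#     exit 1
#   fi
```

Failing at this point would surface SecurityError as the first symptom instead of the later license missing feature errors.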