Open dnoliver opened 5 years ago
This looks similar to the error described in https://vmware.github.io/vic-product/assets/files/html/1.5/vic_vsphere_admin/ts_imagestore_error.html. But in this case the error message is different, and I have not assigned any container to run yet.
I manually browsed to the datastore1 folder and created the images folder there. At that point, the Docker Personality log reported success:
Apr 2 2019 18:52:56.080Z FATAL failed to initialize backend: [POST /storage][500] CreateImageStore default &{Code:500 Message:500 Internal Server Error}
time="2019-04-02T18:52:59Z" level=info msg="Launching docker personality pprof server on 127.0.0.1:6062"
Apr 2 2019 18:52:59.356Z ERROR Unable to load CAs for registry access in config
Apr 2 2019 18:52:59.356Z INFO Waiting for portlayer to come up
Apr 2 2019 18:53:01.358Z INFO Portlayer is up and responding to pings
Apr 2 2019 18:53:01.358Z INFO Refreshing repository cache
Apr 2 2019 18:53:01.360Z INFO Image cache initialized successfully
Apr 2 2019 18:53:01.360Z INFO Repository cache updated successfully
Apr 2 2019 18:53:01.360Z INFO Layer cache initialized successfully
Apr 2 2019 18:53:01.361Z INFO Container cache updated successfully
Apr 2 2019 18:53:01.361Z INFO Creating image store
Apr 2 2019 18:53:01.362Z INFO TLS enabled
Apr 2 2019 18:53:01.363Z INFO Listener created for HTTP on 192.168.0.110//tcp
Apr 2 2019 18:53:01.379Z INFO API listen on 192.168.0.110:2376
But then I restarted the host, and the images folder I had created was removed again. Apparently that folder is managed by the VCH, so I will run into the same problem after every reboot.
After this workaround, I was able to add the host to the project! :)
But I am still having storage-related problems. Trying to deploy a container fails with the following error:
Retries are prevented. Failure: Error: Failed to write to image store: [POST /storage/{store_name}][500] WriteImage default \u0026{Code:500 Message:parent (scratch) doesn't exist in http:///storage/images/423bf6bd-c91b-3c79-a0ac-ae0b26077784: file does not exist}; Reason: {"errorDetail":{"message":"Failed to write to image store: [POST /storage/{store_name}][500] WriteImage default \u0026{Code:500 Message:parent (scratch) doesn't exist in http:///storage/images/423bf6bd-c91b-3c79-a0ac-ae0b26077784: file does not exist}"},"error":"Failed to write to image store: [POST /storage/{store_name}][500] WriteImage default \u0026{Code:500 Message:parent (scratch) doesn't exist in http:///storage/images/423bf6bd-c91b-3c79-a0ac-ae0b26077784: file does not exist}"}
I tried destroying the VCH and creating a new one, and I ran into the same issue. I cannot add the host to a project because of the same error, and after applying the workaround it lets me do it. But then I cannot run a container on the host because of the same error as in https://github.com/vmware/vic-product/issues/2413#issuecomment-479134653
@dnoliver We have seen similar issues before when the VC user or the ops user used to create the VCH does not have the privilege to create the datastore folder. Is that your case?
I followed this guide to create the vic-ops user https://vmware.github.io/vic-product/assets/files/html/1.5/vic_vsphere_admin/create_ops_user.html
In the https://vmware.github.io/vic-product/assets/files/html/1.5/vic_vsphere_admin/set_up_ops_user.html docs, I saw:
Grant Any Necessary Permissions
The operations user account must exist before you create a VCH. If you are deploying the VCH to a cluster, vSphere Integrated Containers Engine can configure the operations user account with all of the necessary permissions for you.
IMPORTANT: The option to grant any necessary permissions automatically only applies when deploying VCHs to clusters. If you are deploying the VCH to a standalone host that is managed by vCenter Server, you must configure the operations user account manually. For information about manually configuring the operations user account, see Manually Create a User Account for the Operations User.
I think I am doing a standalone host deployment, so maybe I have to either switch to a cluster deployment, or follow https://vmware.github.io/vic-product/assets/files/html/1.5/vic_vsphere_admin/ops_user_manual.html to assign permissions to that user manually if I want to keep the standalone host deployment.
I will try that and report back. Thank you @wjun!
I definitely have a datastore permissions problem for my vic-ops user :) Thank you for the hint @wjun
My VCH - endpoint - datastore permission looks like this:
dvPort group: Modify, Policy operation, Scope operation
Datastore: Allocate space, Browse datastore, Configure datastore, Low level file operations, Remove file
Host > Configuration: System Management
Resource: Assign virtual machine to resource pool, Migrate powered off virtual machine
Virtual machine > Change Configuration: Add existing disk, Add new disk, Add or remove device, Advanced configuration, Modify device settings, Remove disk, Rename
Virtual machine > Edit Inventory: Create new, Register, Remove, Unregister
Virtual machine > Guest operations: Guest operation modifications, Guest operation program execution, Guest operation queries
Virtual machine > Interaction: Connect devices, Power off, Power on
Then the more accurate question is: why does the vic-ops user run into datastore permission problems while creating a VCH and/or running containers, if it has all the permissions specified by the documentation?
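One way to double-check the Datastore privileges listed above is to compare a role's granted privilege IDs against the required set. The following is a minimal sketch, not a VIC tool: the function names are hypothetical, and the privilege IDs are my assumed mapping of the UI names (Allocate space, Browse datastore, Configure datastore, Low level file operations, Remove file), so verify them against your vCenter.

```shell
#!/usr/bin/env bash
# Sketch: report which required datastore privileges are absent from a role.
# The privilege IDs are an assumed mapping of the UI names in the list above.
required_datastore_privs() {
  printf '%s\n' \
    Datastore.AllocateSpace \
    Datastore.Browse \
    Datastore.Config \
    Datastore.FileManagement \
    Datastore.DeleteFile
}

# Reads granted privilege IDs on stdin (one per line) and prints every
# required privilege that is not among them.
missing_datastore_privs() {
  local granted
  granted=$(cat)
  required_datastore_privs | while read -r priv; do
    grep -qxF "$priv" <<<"$granted" || echo "$priv"
  done
}
```

If govc is available, something like `govc role.ls <role-name> | missing_datastore_privs` could feed it, where the role name is whatever role vic-ops holds on the datastore. An empty result only means the documented privileges are present; it does not rule out another cause.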
I did the same deployment, but this time using a cluster, and the problem is still there. I have to manually create the images folder to make the VCH work for the first time, and I need to give the vic-ops user an administrative role on the datastore to make a container deployment succeed. So this permission problem happens regardless of whether I do a cluster or a standalone host deployment.
I tried VCH create from CLI onto a VC cluster, and it works. Please note --user is an admin user. --ops-user must be combined with --ops-grant-perms so VCH can assign related permissions to this ops-user automatically.
Great @wjun, I have only tested this with the UI Wizard, where I think --user administrator@vsphere.local is implicit (that is the user I use to log in to vCenter Server). I will give the CLI command a shot to validate. Thanks!
@wjun I have validated the API approach, and I ran into the same issue again.
The command used to deploy this VCH was
./vic-machine-linux create --name virtual-container-host-1 \
--compute-resource Cluster \
--image-store 'datastore1 (1)' \
--base-image-size 8GB \
--volume-store 'datastore1 (1):default' \
--bridge-network vic-bridge \
--bridge-network-range 172.16.0.0/12 \
--public-network 'VM Network' \
--tls-cname virtual-container-host-1 \
--certificate-key-size 2048 \
--no-tlsverify --user Administrator@VSPHERE.LOCAL \
--thumbprint <thumb> \
--target 192.168.0.238/Datacenter \
--ops-user vic-ops@vsphere.local \
--ops-grant-perms
The command execution log is below:
INFO[0000] ### Installing VCH ####
INFO[0000] vSphere password for vic-ops@vsphere.local:
INFO[0003] Loaded server certificate virtual-container-host-1/server-cert.pem
WARN[0003] Configuring without TLS verify - certificate-based authentication disabled
INFO[0003] Validating supplied configuration
INFO[0004] Network configuration OK on "vic-bridge"
INFO[0004] vCenter settings check OK
INFO[0004] Firewall status: ENABLED on "/Datacenter/host/Cluster/192.168.0.217"
INFO[0004] Firewall configuration OK on hosts:
INFO[0004] "/Datacenter/host/Cluster/192.168.0.217"
INFO[0004] vCenter settings check OK
INFO[0004] License check OK on hosts:
INFO[0004] "/Datacenter/host/Cluster/192.168.0.217"
INFO[0004] DRS check OK on:
INFO[0004] "/Datacenter/host/Cluster"
WARN[0004] Only one host can access all of the image/volume datastores. This may be a point of contention/performance degradation and HA/DRS may not work as intended.
INFO[0004]
INFO[0005] Creating Resource Pool "virtual-container-host-1"
INFO[0005] Creating appliance on target
INFO[0005] Network role "client" is sharing NIC with "public"
INFO[0005] Network role "management" is sharing NIC with "public"
INFO[0005] Creating the VCH folder
INFO[0005] Creating the VCH VM
INFO[0006] Creating directory [datastore1 (1)] VIC
INFO[0006] Datastore path is [datastore1 (1)] VIC
INFO[0007] Uploading ISO images
INFO[0008] Uploading appliance.iso as V1.5.2-20879-30B67A14-appliance.iso
INFO[0027] Uploading bootstrap.iso as V1.5.2-20879-30B67A14-bootstrap.iso
INFO[0045] Waiting for IP information
INFO[0052] Waiting for major appliance components to launch
INFO[0052] Obtained IP address for client interface: "192.168.0.199"
INFO[0052] Checking VCH connectivity with vSphere target
INFO[0052] vSphere API Test: https://192.168.0.238 vSphere API target responds as expected
ERRO[0225] vic/lib/install/management.(*Dispatcher).CheckDockerAPI: Create error: context deadline exceeded
vic/cmd/vic-machine/create.(*Create).Run:755 Create
vic/cmd/vic-machine/common.NewOperation:27 vic-machine-linux
INFO[0225] Docker API endpoint check failed: context deadline exceeded
INFO[0225] Collecting 598fc05d-88d3-4d9b-8c5a-f55a274e2db1 vpxd.log
INFO[0225] API may be slow to start - try to connect to API after a few minutes:
INFO[0225] Run command: docker -H 192.168.0.199:2376 --tls info
INFO[0225] If command succeeds, VCH is started. If command fails, VCH failed to install - see documentation for troubleshooting.
ERRO[0225] vic/cmd/vic-machine/create.(*Create).Run.func3: Create error: context deadline exceeded
vic/cmd/vic-machine/create.(*Create).Run:755 Create
vic/cmd/vic-machine/common.NewOperation:27 vic-machine-linux
ERRO[0225] --------------------
ERRO[0225] vic-machine-linux create failed: Creating VCH exceeded time limit of 3m0s. Please increase the timeout using --timeout to accommodate for a busy vSphere target
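The log above suggests retrying `docker -H 192.168.0.199:2376 --tls info` after a few minutes in case the API is only slow to start. A small polling helper can do that wait; this is a sketch, `wait_for` is a hypothetical name, and the attempt count and delay are arbitrary. It only helps if the endpoint is merely slow; it does not address an image store error.

```shell
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      echo "succeeded on attempt $i"
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < attempts)); then sleep "$delay"; fi
  done
  echo "gave up after $attempts attempts" >&2
  return 1
}

# Example, using the endpoint printed in the log:
# wait_for 10 30 docker -H 192.168.0.199:2376 --tls info
```

If the helper gives up, the Docker personality log in the VCH admin portal is the next place to look.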
At least this time I get an error, and not the silent failure of the UI Wizard. In the Docker personality log, I can see the same issue as before:
Apr 8 2019 20:26:07.096Z FATAL failed to initialize backend: [POST /storage][500] CreateImageStore default &{Code:500 Message:cannot stat '[datastore1 (1)] virtual-container-host-1/VIC/423b9844-dafc-a118-d910-f6ce4a309745/images': No such file}
And I am sure the workaround still applies. If I create the /images directory manually and assign admin permissions on the datastore to the vic-ops user, this will start working.
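To avoid retyping the path for the manual workaround, the missing folder can be pulled out of the FATAL message. This is a sketch under assumptions: the function names are hypothetical, the pattern matches only the `cannot stat '[<datastore>] <path>'` form shown in the log, and using `govc datastore.mkdir` is my suggestion, not something confirmed in this thread. The command is printed for review, not executed.

```shell
#!/usr/bin/env bash
# Sketch: extract "<datastore>|<path>" from a CreateImageStore error line
# of the form: ... cannot stat '[<datastore>] <path>': No such file}
missing_image_store() {
  sed -n "s/.*cannot stat '\[\([^]]*\)\] \([^']*\)'.*/\1|\2/p"
}

# Prints (does not run) a govc command that would create the folder.
suggest_mkdir() {
  missing_image_store | while IFS='|' read -r ds path; do
    printf "govc datastore.mkdir -p -ds '%s' '%s'\n" "$ds" "$path"
  done
}

# Example: suggest_mkdir < docker-personality.log
```

Lines that do not match the pattern produce no output, so the whole log can be piped through it.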
I also tried to deploy a VCH keeping Administrator access for vic-ops on the datastore and removing the --ops-grant-perms parameter, and it runs into the same issue as before:
Apr 8 2019 20:26:07.096Z FATAL failed to initialize backend: [POST /storage][500] CreateImageStore default &{Code:500 Message:cannot stat '[datastore1 (1)] virtual-container-host-1/VIC/423b9844-dafc-a118-d910-f6ce4a309745/images': No such file}
So this /images folder error may not be a permissions problem (at least on the datastore), but something else. The Administrator permissions seem to solve the second issue when trying to run a container, but not the initial creation of the images folder.
@dnoliver I tried various combinations of ops-user and datastores, and cannot reproduce this in my local env. Could you also post your portlayer.log, which should contain error messages related to the images directory creation failure? Another option is to remove --ops-user and --ops-grant-perms during VCH create first and see whether you can reproduce the issue.
Ok, will try to share the portlayer.log file.
The only special thing about my installation is that it is using VM Encryption. I have a KMS, an encryption storage policy, and a couple of encrypted VMs running on the same host. Is that relevant to this issue?
I hate to comment on an old thread, but I have vSAN encryption with vCenter KMS, and experienced the same problem with the 'grant all permissions needed' option, and needing to create the images folder manually for this to work. So this still seems to be an issue.
Summary
After following the documentation in https://vmware.github.io/vic-product/assets/files/html/1.5, I cannot add a host to a project.
Admiral cannot communicate with the VCH instance
VCH instance logs show errors while trying to stat datastore
Environment information
vSphere 6.7
Single ESXi host 6.7
vCenter Server appliance with embedded Platform controller 6.7
VIC 1.5
VCH deployed with UI Wizard
one single datastore
a bridge network created with virtual switch
default VM Network as public network
vSphere and vCenter Server version
vSphere and vCenter 6.7 update 1
VIC Appliance version
vic-v1.5.2-7206-92ebfaf5
Configuration
Details
Was following the documentation step by step to deploy the first VCH host.
VCH host is deployed successfully.
vic-machine-linux ls shows the host.
All green checks in VCH admin portal.
Used the default-project in Admiral, tried to add the VCH host to default-project.
No TLS being used.
Tried to add the host:
Using http since the docs say to use http with no TLS. Tried several combinations, none of them works.
Changed type from VCH to DOCKER, received error 500.
Inspected logs in VCH admin portal. Several ERROR messages (but the UI has all green checks).
Docker Personality log shows, several times:
Port Layer showing the same:
No problems in Init log
VIC Admin log shows the same error several times:
Steps to reproduce
Follow docs to deploy VIC, create VCH
Assign VCH to default-project in Admiral
Actual behavior
Cannot establish connection error
Expected behavior
VCH should be added to default-project
Support information
Logs
Not comfortable with posting publicly, private channel is ok
See also
Troubleshooting attempted