Open ChandraRatra opened 4 years ago
When I run the command below, it completes without any error:
Required Python version: >= 3.7.3
Installed Python version: 3.7.3 (default, Aug 1 2020, 08:50:56) [GCC 7.3.0]
Password for config file decryption:
Decrypting 'encrypted-config.yaml'
Validating config file 'encrypted-config.yaml'
Connected to AMQP server (X.X.X.X:5672)
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Connected to vCloud Director (X.X.X.X:443)
Connected to vCenter Server 'X.X.X.X' as 'administrator@vsphere.local' (X.X.X.X:443)
Config file 'encrypted-config.yaml' is valid
Loading k8s template definition from catalog
Found K8 template 'photon-v2_k8-1.14_weave-2.5.2' at revision 2 in catalog 'cse261'
Found K8 template 'ubuntu-16.04_k8-1.15_weave-2.5.2' at revision 3 in catalog 'cse261'
Found K8 template 'ubuntu-16.04_k8-1.16_weave-2.6.0' at revision 1 in catalog 'cse261'
Found K8 template 'ubuntu-16.04_k8-1.17_weave-2.6.0' at revision 1 in catalog 'cse261'
Processing compute policy for k8s templates.
Removing compute policy from template 'photon-v2_k8-1.14_weave-2.5.2_rev2'.
Removing compute policy from template 'ubuntu-16.04_k8-1.15_weave-2.5.2_rev3'.
Removing compute policy from template 'ubuntu-16.04_k8-1.16_weave-2.6.0_rev1'.
Removing compute policy from template 'ubuntu-16.04_k8-1.17_weave-2.6.0_rev1'.

Validating CSE installation according to config file
AMQP exchange 'CSE' exists
CSE on vCD is currently enabled
Found catalog 'cse261'
CSE installation is valid
Started thread 'MessageConsumer-0 (139737667303168)'
Started thread 'MessageConsumer-1 (139737658648320)'
Started thread 'MessageConsumer-2 (139737650255616)'
Started thread 'MessageConsumer-3 (139737641862912)'
Started thread 'MessageConsumer-4 (139737432061696)'
Started thread 'MessageConsumer-5 (139737423668992)'
Started thread 'MessageConsumer-6 (139737415276288)'
Started thread 'MessageConsumer-7 (139737406883584)'
Started thread 'MessageConsumer-8 (139737398490880)'
Started thread 'MessageConsumer-9 (139737390098176)'
Container Service Extension for vCloud Director
Server running using config file: encrypted-config.yaml
Log files: cse-logs/cse-server-info.log, cse-logs/cse-server-debug.log
waiting for requests (ctrl+c to close)
When I tried running the CSE server as a service, I got the error: Failed to start Container Service Extension for VMware vCloud Director.
● cse.service - Container Service Extension for VMware vCloud Director
   Loaded: loaded (/etc/systemd/system/cse.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2020-08-02 04:43:54 UTC; 567ms ago
 Main PID: 738 (bash)
    Tasks: 2 (limit: 2394)
   Memory: 32.3M
   CGroup: /system.slice/cse.service
           ├─738 bash /home/vmware/cse.sh
           └─739 /usr/local/bin/python3.7 /usr/local/bin/cse run --config /home/vmware/encrypted-config.yaml

Aug 02 04:43:54 systemd[1]: cse.service: Service RestartSec=100ms expired, scheduling restart.
Aug 02 04:43:54 systemd[1]: cse.service: Scheduled restart job, restart counter is at 4.
Aug 02 04:43:54 systemd[1]: Stopped Container Service Extension for VMware vCloud Director.
Aug 02 04:43:54 systemd[1]: Started Container Service Extension for VMware vCloud Director.

● cse.service - Container Service Extension for VMware vCloud Director
   Loaded: loaded (/etc/systemd/system/cse.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sun 2020-08-02 04:43:55 UTC; 24s ago
  Process: 738 ExecStart=/home/vmware/cse.sh (code=exited, status=1/FAILURE)
 Main PID: 738 (code=exited, status=1/FAILURE)

Aug 02 04:43:54 systemd[1]: cse.service: Main process exited, code=exited, status=1/FAILURE
Aug 02 04:43:54 systemd[1]: cse.service: Failed with result 'exit-code'.
Aug 02 04:43:55 systemd[1]: cse.service: Service RestartSec=100ms expired, scheduling restart.
Aug 02 04:43:55 systemd[1]: cse.service: Scheduled restart job, restart counter is at 5.
Aug 02 04:43:55 systemd[1]: Stopped Container Service Extension for VMware vCloud Director.
Aug 02 04:43:55 systemd[1]: cse.service: Start request repeated too quickly.
Aug 02 04:43:55 systemd[1]: cse.service: Failed with result 'exit-code'.
Aug 02 04:43:55 systemd[1]: Failed to start Container Service Extension for VMware vCloud Director.
After disabling the proxy on the 2nd NIC, systemd-networkd-wait-online.service is now green, but cse.service still fails to start.
I observed the same issue after upgrading CSE 2.5.1 to CSE 2.6.1.
To be clear, I am able to start the CSE server manually; the only issue is when I try to run the CSE server as a service.
Hello,
Can you please confirm if the cse.service file has references to /home/vmware/cse.sh or /root/cse.sh?
cse.service file has references to /home/vmware/cse.sh
cat cse.service
[Unit]
Description=Container Service Extension for VMware vCloud Director
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/home/vmware/cse.sh
Type=simple
User=root
WorkingDirectory=/home/vmware
Restart=always
and below are the details of the cse.sh file
On a new installation of CSE 2.6.1:
Issue 2# I manually started the CSE service and tried to deploy clusters from the Photon and Ubuntu templates. In both cases the master node is deployed and the error occurs on the worker node. To be clear, the CSE server has no direct internet connectivity; I am using a proxy for internet access. It seems something is missing from node.sh. Can you please look into this issue as well?
Photon cluster deployment was started using the template photon-v2_k8-1.14_weave-2.5.2. The task failed with the error message below:
cluster operation: Error creating cluster 'PH-CSECLS01'. Deleting cluster (rollback=True)
task: 928d3c83-415e-4c6a-a5ce-aa10715f41d7, result: error, message: Join cluster script execution failed on worker node ['node-e09n']:["/tmp/0b46a1fa-d785-11ea-b009-005056010df3.sh: line 5: $'\240': command not found\n"]
Below is the node.sh file for the Photon template (X.X.X.X = proxy IP address):
cat node.sh
set -e
mkdir /etc/systemd/system/docker.service.d
echo '[Service]' >> /etc/systemd/system/docker.service.d/http-proxy.conf
echo 'Environment="HTTP_PROXY=http://X.X.X.X:8080"' >> /etc/systemd/system/docker.service.d/http-proxy.conf
echo '[Service]' >> /etc/systemd/system/docker.service.d/https-proxy.conf
echo 'Environment="HTTPS_PROXY=http://X.X.X.X:8080"' >> /etc/systemd/system/docker.service.d/https-proxy.conf
systemctl daemon-reload
systemctl restart docker
while [ `systemctl is-active docker` != 'active' ]; do echo 'waiting for docker'; sleep 5; done
kubeadm join --token {token} {ip}:6443 --discovery-token-unsafe-skip-ca-verification
Below are the details from cse-server-info.log:
20-08-06 01:11:33 | WARNING :: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
20-08-06 01:11:40 | WARNING :: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
20-08-06 01:11:56 | WARNING :: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
20-08-06 01:13:08 | WARNING :: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
20-08-06 01:29:11 | WARNING :: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
20-08-06 01:29:30 | WARNING :: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
20-08-06 01:30:53 | WARNING :: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
20-08-06 01:31:18 | WARNING :: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
20-08-06 01:35:14 | INFO :: Error creating cluster 'PH-CSECLS01'. Deleting cluster (rollback=True)
20-08-06 01:35:36 | ERROR :: Error creating cluster 'PH-CSECLS01'
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/container_service_extension/vcdbroker.py", line 901, in _create_cluster_async
    template[LocalTemplateKey.REVISION])
  File "/usr/local/lib/python3.7/site-packages/container_service_extension/vcdbroker.py", line 1774, in join_cluster
    f"Join cluster script execution failed on worker node "
container_service_extension.exceptions.ScriptExecutionError: Join cluster script execution failed on worker node ['node-e09n']:["/tmp/0b46a1fa-d785-11ea-b009-005056010df3.sh: line 5: $'\240': command not found\n"]
Ubuntu cluster deployment was started using the template ubuntu-16.04_k8-1.17_weave-2.6.0. The task failed with the error message below:
Error creating cluster 'UB-CSECLS01'. Deleting cluster (rollback=True)
task: 026c40ff-2de8-42e6-bf3c-ea444661f3da, result: error, message: Join cluster script execution failed on worker node ['node-8rxn']:["/tmp/cca0838c-d70a-11ea-b3a6-005056010df3.sh: line 4: $'\240': command not found\n"]
After upgrading CSE from 2.5.1 to 2.6.1:
Issue 3# Even after manually starting the CSE service, I am not able to deploy a CSE cluster from the command line. I get the error message below:
vcd cse cluster create PH-CSCLS02 --template-name photon-v2_k8-1.14_weave-2.5.2 --template-revision 2 --nodes 1 --network ORG-NW05
Usage: vcd cse cluster create [OPTIONS] NAME
Try "vcd cse cluster create -h" for help.
Error: External service 'cse' failed to respond in the specified timeout (40 SECONDS)
And when I try to add a cluster from the VCD UI, the Create New Cluster page stays on a spinning wheel.
Issue 1: cannot run cse as a service
Can you please double-check that the paths mentioned in the cse.service and cse.sh files are correct? The cse.sh is using an encrypted config without a password variable set. You can take a look at the example cse.sh at https://github.com/vmware/container-service-extension/blob/master/cse.sh to see how to set the password environment variable.
Issue 2: problem with node.sh
I think the node.sh you are using has been modified, and the error is in the line that was added. I'm afraid we can't help there, as the supported node.sh was modified.
Issue 3: cluster create not going through
Can you please give the steps followed to start the CSE server?
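On the `$'\240': command not found` message from Issue 2: octal 240 is the byte 0xA0, which is a non-breaking space in Latin-1 and also the second byte of the UTF-8 non-breaking space. Such a byte usually ends up in a script when lines are pasted from a web page or a rich-text editor. Below is a rough sketch for locating and replacing those bytes. The function names are made up for illustration and are not part of CSE, and it assumes the stray byte is the only problem in the edited node.sh.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CSE) for finding stray 0xA0 bytes in a script.

# Print the line numbers of FILE that contain the byte 0xA0 (octal 240).
find_nbsp_lines() {
    LC_ALL=C awk 'index($0, "\240") { print NR }' "$1"
}

# Write FILE to stdout with non-breaking spaces replaced by plain spaces.
# Handles both the UTF-8 form (0xC2 0xA0) and a bare 0xA0 byte.
strip_nbsp() {
    utf8_nbsp=$(printf '\302\240')
    bare_nbsp=$(printf '\240')
    LC_ALL=C sed -e "s/${utf8_nbsp}/ /g" -e "s/${bare_nbsp}/ /g" "$1"
}
```

Something like `find_nbsp_lines node.sh` would list the affected lines, and `strip_nbsp node.sh > node.clean.sh` would write a cleaned copy to review before putting it back in place.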
Issue 1: Sure, I will update cse.sh and cse.service and then confirm back.
Issue 2: In my environment, internet connectivity is available through a proxy. If I use the default node.sh:
set -e
while [ `systemctl is-active docker` != 'active' ]; do echo 'waiting for docker'; sleep 5; done
kubeadm join --token {token} {ip}:6443 --discovery-token-unsafe-skip-ca-verification
the cluster gets deployed, but the worker node status is NotReady:
kubectl get nodes
NAME        STATUS     ROLES    AGE   VERSION
mstr-s501   Ready      master   30m   v1.14.6
node-2mz7   NotReady   <none>   23m   v1.14.6
Issue 3: Steps used to start the CSE server manually:
cse run --config encrypted-config.yaml
Regarding Issue 1: I updated cse.sh and cse.service for my upgraded and new installations. Now I am able to start the CSE server as a service.
Below are the details from cse.sh:
root@CSEVM251 [ ~ ]# cat cse.sh
CSE_CONFIG_PATH=/root/encrypted-config.yaml
cse run --config $CSE_CONFIG_PATH
Below are the details from cse.service:
cat /etc/systemd/system/cse.service
[Unit]
Description=Container Service Extension for VMware vCloud Director
Wants=network-online.target,rabbitmq-server.service
After=network-online.target,rabbitmq-server.service

[Service]
ExecStart=/root/cse.sh
User=root
WorkingDirectory=/root
Type=simple
Restart=always
EnvironmentFile=/home/vmware/CSE_CONFIG_PASSWORD

[Install]
WantedBy=default.target
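Since the earlier failure appears to have come from cse.sh running against an encrypted config with no password available, a small guard at the top of cse.sh could turn that situation into an explicit message in the journal. This is only a sketch: `require_env` is a hypothetical helper, and `CSE_CONFIG_PASSWORD` is assumed to be the variable name defined inside the EnvironmentFile.

```shell
#!/bin/sh
# Hypothetical guard: fail with a clear message when a required variable is empty or unset.
require_env() {
    eval "value=\${$1:-}"
    if [ -z "$value" ]; then
        echo "error: environment variable $1 is not set" >&2
        return 1
    fi
}

# Intended use in cse.sh (the variable name is an assumption):
# require_env CSE_CONFIG_PASSWORD || exit 1
# cse run --config "$CSE_CONFIG_PATH"
```

With a guard like this, a missing password should show up by name in the `systemctl status cse` output instead of only as `status=1/FAILURE`.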
However, EnvironmentFile=/home/vmware/CSE_CONFIG_PASSWORD is a plain-text file, so anyone with access to the CSE VM can get the password needed to decrypt the config file.
Can you please suggest how to secure the EnvironmentFile?
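One common way to limit exposure, short of an external secrets store, is to make the file readable by root only. systemd reads `EnvironmentFile=` itself when it starts the unit, so the service should still receive the variable. A minimal sketch follows, with a hypothetical helper name and assuming the unit keeps `User=root`.

```shell
#!/bin/sh
# Hypothetical helper: make a secrets file readable and writable by its owner only.
restrict_to_owner() {
    chmod 600 "$1"
}

# Intended use on the CSE VM, run as root (path taken from the unit file above):
# chown root:root /home/vmware/CSE_CONFIG_PASSWORD
# restrict_to_owner /home/vmware/CSE_CONFIG_PASSWORD
```

This keeps ordinary users on the VM from reading the password, but it does not protect it from anyone who already has root.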
Issue 2: In my environment, internet connectivity is available through a proxy. If I use the default node.sh:
set -e
while [ `systemctl is-active docker` != 'active' ]; do echo 'waiting for docker'; sleep 5; done
kubeadm join --token {token} {ip}:6443 --discovery-token-unsafe-skip-ca-verification
the cluster gets deployed, but the worker node status is NotReady:
kubectl get nodes
NAME        STATUS     ROLES    AGE   VERSION
mstr-s501   Ready      master   30m   v1.14.6
node-2mz7   NotReady   <none>   23m   v1.14.6
Issue 4: After upgrading CSE from 2.5.1 to 2.6.1, I tried to upgrade an existing cluster that was created with CSE 2.5.1, but the task failed:
]# vcd cse cluster upgrade PH-CSECLS01 photon-v2_k8-1.14_weave-2.5.2 2
cluster operation: Upgrading cluster 'PH-CSECLS01' software to match template photon-v2_k8-1.14_weave-2.5.2 (revision 2): Kubernetes: 1.14.6 -> 1.14.6, Docker-CE: 18.06.2 -> 18.06.2-6, CNI: weave 2.5.2 -> 2.5.2
cluster operation: Draining master node ['mstr-ij3f']
cluster operation: Upgrading Kubernetes (1.14.6 -> 1.14.6) in master node ['mstr-ij3f']
cluster operation: Uncordoning master node ['mstr-ij3f']
cluster operation: Draining node node-14wi
task: 7425b108-9cfc-48af-8849-36436c7fe115, result: error, message: Unexpected error while upgrading cluster 'PH-CSECLS01': Script execution failed on node ['mstr-ij3f']
Errors: ['Error from server (NotFound): nodes "node-14wi" not found\n']
However, the node is visible from both the command line and the UI:
]# vcd cse node list PH-CSECLS01
ipAddress      name
192.168.0.203  mstr-ij3f
192.168.0.202  node-14wi

]# vcd cse cluster info PH-CSECLS01
property            value
cluster_id          960a7475-c0fe-4447-906b-77dc438c5a3c
cni                 weave
cni_version         2.5.2
cse_version         2.5.1
docker_version      18.06.2
k8s_provider        native
kubernetes          upstream
kubernetes_version  1.14.6
leader_endpoint     192.168.0.203
master_nodes        {'name': 'mstr-ij3f', 'ipAddress': '192.168.0.203'}
name                PH-CSECLS01
nfs_nodes
nodes               {'name': 'node-14wi', 'ipAddress': '192.168.0.202'}
number_of_vms       2
os                  photon-v2
status              POWERED_ON
template_name       photon-v2_k8-1.14_weave-2.5.2
template_revision   1
vapp_href           https://vcdlab.lab65.local/api/vApp/vapp-87c8bef0-99d4-44eb-8145-fbba14225040
vapp_id             87c8bef0-99d4-44eb-8145-fbba14225040
vdc_href            https://vcdlab.lab65.local/api/vdc/c58af5a0-b002-43c5-8f66-8152f21e1a19
vdc_id              c58af5a0-b002-43c5-8f66-8152f21e1a19
vdc_name            ORG05-VDC
Can you please suggest how to secure the EnvironmentFile? Currently it is a plain-text file.
For Issue 2#, can you please suggest what changes I have to make in node.sh for my scenario, where internet connectivity is available only through a proxy?
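For the proxy case, one possibility is to write the Docker drop-in from a here-document instead of hand-typed `echo` lines, since the maintainer pointed to the added lines as the source of the error. The sketch below is not a supported CSE node.sh: `write_docker_proxy_conf` is a hypothetical helper and the proxy URL is the placeholder used in this thread. It would need to run before the `kubeadm join` line, followed by `systemctl daemon-reload` and `systemctl restart docker` as in the modified script.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that sets Docker's proxy variables.
# $1 = drop-in directory, $2 = proxy URL
write_docker_proxy_conf() {
    mkdir -p "$1"
    cat > "$1/http-proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=$2"
Environment="HTTPS_PROXY=$2"
EOF
}

# Intended use inside node.sh (X.X.X.X is the proxy address placeholder):
# write_docker_proxy_conf /etc/systemd/system/docker.service.d http://X.X.X.X:8080
# systemctl daemon-reload
# systemctl restart docker
```

Using `mkdir -p` avoids a failure under `set -e` if the directory already exists, and writing with `>` rather than `>>` keeps repeated runs from appending duplicate sections.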
Issue 1# I am installing CSE 2.6.1, and template deployment is complete. After that, when I try to start cse.service, it fails:
systemctl status cse
● cse.service - Container Service Extension for VMware vCloud Director
   Loaded: loaded (/etc/systemd/system/cse.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sat 2020-08-01 18:04:14 UTC; 17s ago
  Process: 770 ExecStart=/home/vmware/cse.sh (code=exited, status=1/FAILURE)
 Main PID: 770 (code=exited, status=1/FAILURE)

Aug 01 18:04:14 systemd[1]: cse.service: Main process exited, code=exited, status=1/FAILURE
Aug 01 18:04:14 systemd[1]: cse.service: Failed with result 'exit-code'.
Aug 01 18:04:14 systemd[1]: cse.service: Service RestartSec=100ms expired, scheduling restart.
Aug 01 18:04:14 systemd[1]: cse.service: Scheduled restart job, restart counter is at 5.
Aug 01 18:04:14 systemd[1]: Stopped Container Service Extension for VMware vCloud Director.
Aug 01 18:04:14 systemd[1]: cse.service: Start request repeated too quickly.
Aug 01 18:04:14 systemd[1]: cse.service: Failed with result 'exit-code'.
Aug 01 18:04:14 systemd[1]: Failed to start Container Service Extension for VMware vCloud Director.
When I check the list of dependencies for cse.service with
systemctl list-dependencies cse.service
I get a red dot against systemd-networkd-wait-online.service.
Any idea what is causing this issue?