Closed an0nz closed 2 years ago
For anyone else stuck with ERROR :: None
After some more digging this one was all on me, there was no external internet access from the control plane so it was failing.
When I tested internet access it was from another VM within the same subnet but the SNAT rule on the Edge Gateway did not include the IP range being assigned to the k8s cluster and I had not spotted the smaller mask when checking.
Describe the bug
TKGm cluster deployment fails to create the control plane node and rolls back. Tests have been conducted with a variety of oVDC allocation pool types and different resource allocations from tenant admin and sys admin accounts with the below TKGm templates all failing the same way ubuntu-2004-kube-v1.21.2 ubuntu-2004-kube-v1.20.8 ubuntu-2004-kube-v1.19.12
The test routed network being deployed to has been configured with internet access and a completely open firewall just to eliminate connectivity issues with no luck.
The control plane VM is deployed, receives an IP then hangs on
Post guest customization phase guestinfo.postcustomization.kubeinit.status is in_progress
and eventually fails with the below error before rolling back.If anyone can help either find my mistake, resolve the bug, or provide a workaround it would be greatly appreciated.
22-01-13 15:32:33 | pyvcloud_utils:758 - wait_for_completion_of_post_customization_procedure | Request Id: dd4012df-0b01-499b-b41d-f2b317c1fe4d | INFO :: Post guest customization phase guestinfo.postcustomization.networkconfiguration.status is successful 22-01-13 15:32:34 | pyvcloud_utils:758 - wait_for_completion_of_post_customization_procedure | Request Id: dd4012df-0b01-499b-b41d-f2b317c1fe4d | INFO :: Post guest customization phase guestinfo.postcustomization.store.sshkey.status is successful 22-01-13 15:32:35 | pyvcloud_utils:758 - wait_for_completion_of_post_customization_procedure | Request Id: dd4012df-0b01-499b-b41d-f2b317c1fe4d | INFO :: Post guest customization phase guestinfo.postcustomization.kubeinit.status is in_progress .... .... 22-01-13 15:37:02 | pyvcloud_utils:758 - wait_for_completion_of_post_customization_procedure | Request Id: dd4012df-0b01-499b-b41d-f2b317c1fe4d | INFO :: Post guest customization phase guestinfo.postcustomization.kubeinit.status is in_progress 22-01-13 15:37:02 | pyvcloud_utils:767 - wait_for_completion_of_post_customization_procedure | Request Id: dd4012df-0b01-499b-b41d-f2b317c1fe4d | ERROR :: VM Post guest customization script failed with error:Thu 13 Jan 2022 02:13:13 AM UTC 130 /root/.customization/customize.sh: kubeadm init --config $kubeadm_config_path --v=10 > /root/kubeadm-init.out 22-01-13 15:13:26 | cluster_service_2_x_tkgm:2163 - _add_control_plane_nodes | Request Id: ddc300db-36b0-4666-b8c5-900a73234035 | ERROR :: None Traceback (most recent call last): File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/backend/cluster_service_2_x_tkgm.py", line 2151, in _add_control_plane_nodes logger=LOGGER File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/common/utils/pyvcloud_utils.py", line 768, in wait_for_completion_of_post_customization_procedure raise exceptions.ScriptExecutionError container_service_extension.exception.exceptions.ScriptExecutionError: None 22-01-13 15:13:26 | cluster_service_2_x_tkgm:858 - _create_cluster_async | Request Id: ddc300db-36b0-4666-b8c5-900a73234035 | ERROR :: failure on creating nodes ['mstr-u3o4'] Error:None Traceback (most recent call last): File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/backend/cluster_service_2_x_tkgm.py", line 2151, in _add_control_plane_nodes logger=LOGGER File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/common/utils/pyvcloud_utils.py", line 768, in wait_for_completion_of_post_customization_procedure raise exceptions.ScriptExecutionError container_service_extension.exception.exceptions.ScriptExecutionError: None
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/backend/cluster_service_2_x_tkgm.py", line 855, in _create_cluster_async refresh_token=refresh_token File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/backend/cluster_service_2_x_tkgm.py", line 2171, in _add_control_plane_nodes raise exceptions.NodeCreationError(node_list, str(err)) container_service_extension.exception.exceptions.NodeCreationError: failure on creating nodes ['mstr-u3o4'] Error:None 22-01-13 15:13:26 | cluster_service_2_x_tkgm:954 - _create_cluster_async | Request Id: ddc300db-36b0-4666-b8c5-900a73234035 | ERROR :: Error creating cluster 'k8s-tkg-test' Traceback (most recent call last): File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/backend/cluster_service_2_x_tkgm.py", line 2151, in _add_control_plane_nodes logger=LOGGER File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/common/utils/pyvcloud_utils.py", line 768, in wait_for_completion_of_post_customization_procedure raise exceptions.ScriptExecutionError container_service_extension.exception.exceptions.ScriptExecutionError: None
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/backend/cluster_service_2_x_tkgm.py", line 855, in _create_cluster_async refresh_token=refresh_token File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/backend/cluster_service_2_x_tkgm.py", line 2171, in _add_control_plane_nodes raise exceptions.NodeCreationError(node_list, str(err)) container_service_extension.exception.exceptions.NodeCreationError: failure on creating nodes ['mstr-u3o4'] Error:None
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/backend/cluster_service_2_x_tkgm.py", line 860, in _create_cluster_async f"Error adding control plane node: {err}") container_service_extension.exception.exceptions.ControlPlaneNodeCreationError: Error adding control plane node: failure on creating nodes ['mstr-u3o4'] Error:None
Reproduction steps
Expected behavior
Cluster deploys or returns an error detailing what went wrong, in this case Error:None is returned.
Additional context
This also seems to cause some further errors in vCD as after the rollback completes the below errors appear along with the vCD cell throwing an email alert stating
VMware Cloud Director cell with IP address x.x.x.x is successful in reconnecting to the database .
The error is longer than below but is basically a loop of the same.22-01-13 15:37:32 | cluster_service_2_x_tkgm:1714 - _update_task | Request Id: dd4012df-0b01-499b-b41d-f2b317c1fe4d | DEBUG :: Sending behavior response:{'type': 'BEHAVIOR_RESPONSE', 'headers': {'taskId': 'b0155be0-a2eb-4227-8eac-4b56410efe7e', 'entityId': 'urn:vcloud:entity:cse:nativeCluster:20e9c5d2-060d-49ab-a412-a06f7f442b98', 'contentType': 'application/vnd.vmware.vcloud.task+json'}, 'payload': '{"status": "error", "error": {"majorErrorCode": "500", "minorErrorCode": "Error creating cluster \'k8s-tkg-test\'. Deleting cluster (rollback=True)", "message": "Error adding control plane node: failure on creating nodes [\'mstr-kqss\']\nError:None"}}'} 22-01-13 15:37:32 | mqtt_publisher:116 - send_response | Request Id: dd4012df-0b01-499b-b41d-f2b317c1fe4d | DEBUG :: publish return (rc, msg_id): (0, 227) 22-01-13 15:37:33 | mqtt_consumer:117 - on_disconnect | INFO :: MQTT disconnect with reason: 1 22-01-13 15:37:33 | operation_context:134 - end | Request Id: dd4012df-0b01-499b-b41d-f2b317c1fe4d | DEBUG :: Failed to logout user: svc_cseat api_version: Default api version. Traceback (most recent call last): File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/security/context/operation_context.py", line 128, in end sysadmin_user_context.client.logout() File "/opt/cse/python/lib/python3.7/site-packages/pyvcloud/vcd/client.py", line 1140, in logout result = self._do_request('DELETE', uri) File "/opt/cse/python/lib/python3.7/site-packages/pyvcloud/vcd/client.py", line 1260, in _do_request _objectify_response(response, objectify_results)) File "/opt/cse/python/lib/python3.7/site-packages/pyvcloud/vcd/client.py", line 725, in _objectify_response return objectify.fromstring(response.content) File "src/lxml/objectify.pyx", line 1998, in lxml.objectify.fromstring File "src/lxml/etree.pyx", line 3252, in lxml.etree.fromstring File "src/lxml/parser.pxi", line 1912, in lxml.etree._parseMemoryDocument File "src/lxml/parser.pxi", line 1800, in lxml.etree._parseDoc File "src/lxml/parser.pxi", line 1141, in lxml.etree._BaseParser._parseDoc File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError File "", line 11
lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: meta line 9 and head, line 11, column 8
22-01-13 15:37:34 | mqtt_consumer:100 - on_connect | INFO :: MQTT client connected with result code 0 and flags {'session present': 1}
22-01-13 15:37:34 | mqtt_consumer:114 - on_subscribe | INFO :: MQTT client subscribed with given_qos: (2,)
22-01-13 15:38:58 | mqtt_consumer:50 - process_behavior_message | Request Id: f6d9e84d-20c6-44a2-ab30-b363d8f020ae | DEBUG :: Received behavior invocation: urn:vcloud:behavior-interface:createCluster:cse:k8s:1.0.0 on entityId:urn:vcloud:entity:cse:nativeCluster:bafa37d7-a0e8-49af-b8db-7faa57635c06 with requestId: f846c470-3f22-49a2-a143-364d9045b88d
22-01-13 15:39:00 | cluster_service_2_x_tkgm:1714 - _update_task | Request Id: f6d9e84d-20c6-44a2-ab30-b363d8f020ae | DEBUG :: Sending behavior response:{'type': 'BEHAVIOR_RESPONSE', 'headers': {'taskId': '72bb7f6e-cad3-4e24-9b09-e32588a7219d', 'entityId': 'urn:vcloud:entity:cse:nativeCluster:bafa37d7-a0e8-49af-b8db-7faa57635c06', 'contentType': 'application/vnd.vmware.vcloud.task+json'}, 'payload': '{"status": "running", "operation": "Creating cluster \'k8s-tkg-test\' from template \'ubuntu-2004-kube-v1.19.12+vmware.1-tkg.2-17499219411874486722\' (revision 1)"}'}
22-01-13 15:39:00 | mqtt_publisher:116 - send_response | Request Id: f6d9e84d-20c6-44a2-ab30-b363d8f020ae | DEBUG :: publish return (rc, msg_id): (0, 229)
22-01-13 15:39:00 | entity_service:56 - exception_handler_wrapper | Request Id: f6d9e84d-20c6-44a2-ab30-b363d8f020ae | ERROR :: [ 1965f36f-1a1f-4686-be55-36c91eb7bb18 ] This operation is denied.
Traceback (most recent call last):
File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/common/entity_service.py", line 52, in exception_handler_wrapper
result = func(*args, *kwargs)
File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/common/entity_service.py", line 385, in get_entity
resource_url_relative_path=f"{CloudApiResource.ENTITIES}/"
File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/lib/cloudapi/cloudapi_client.py", line 139, in do_request
response.raise_for_status()
File "/opt/cse/python/lib/python3.7/site-packages/requests/models.py", line 960, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://vcd.mydomain.com/cloudapi/1.0.0/entities/urn:vcloud:entity:cse:nativeCluster:bafa37d7-a0e8-49af-b8db-7faa57635c06
22-01-13 15:39:00 | entity_service:61 - exception_handler_wrapper | Request Id: f6d9e84d-20c6-44a2-ab30-b363d8f020ae | ERROR :: [ 1965f36f-1a1f-4686-be55-36c91eb7bb18 ] This operation is denied.
22-01-13 15:39:00 | cluster_service_2_x_tkgm:289 - create_cluster | Request Id: f6d9e84d-20c6-44a2-ab30-b363d8f020ae | ERROR :: Error updating the cluster 'k8s-tkg-test' with the status
22-01-13 15:39:00 | cluster_service_2_x_tkgm:300 - create_cluster | Request Id: f6d9e84d-20c6-44a2-ab30-b363d8f020ae | ERROR :: Failed to create cluster k8s-tkg-test in org NOCTest-PMD and VDC NOC-AKLD-PMD
Traceback (most recent call last):
File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/common/entity_service.py", line 52, in exception_handler_wrapper
result = func(args, **kwargs)
File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/common/entity_service.py", line 385, in get_entity
resource_url_relative_path=f"{CloudApiResource.ENTITIES}/"
File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/lib/cloudapi/cloudapi_client.py", line 139, in do_request
response.raise_for_status()
File "/opt/cse/python/lib/python3.7/site-packages/requests/models.py", line 960, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://vcd.mydomain.com/cloudapi/1.0.0/entities/urn:vcloud:entity:cse:nativeCluster:bafa37d7-a0e8-49af-b8db-7faa57635c06
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/backend/cluster_service_2_x_tkgm.py", line 286, in create_cluster curr_rde = self._update_cluster_entity(entity_id, changes=changes) # noqa: E501 File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/backend/cluster_service_2_x_tkgm.py", line 1693, in _update_cluster_entity cluster_id, invoke_hooks=False, changes=changes File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/common/entity_service.py", line 62, in exception_handler_wrapper raise error File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/common/entity_service.py", line 52, in exception_handler_wrapper result = func(*args, **kwargs) File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/common/entity_service.py", line 325, in update_entity entity, etag = self._form_updated_entity(entity_id, changes) File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/common/entity_service.py", line 262, in _form_updated_entity entity = self.get_entity(entity_id) File "/opt/cse/python/lib/python3.7/site-packages/container_service_extension/rde/common/entity_service.py", line 59, in exception_handler_wrapper minor_error_code=MinorErrorCode.DEFAULT_ERROR_CODE) container_service_extension.exception.exceptions.DefEntityServiceError: [ 1965f36f-1a1f-4686-be55-36c91eb7bb18 ] This operation is denied. 22-01-13 15:39:00 | entity_service:56 - exception_handler_wrapper | Request Id: f6d9e84d-20c6-44a2-ab30-b363d8f020ae | ERROR :: [ 5422347b-124e-4698-b3bc-62b19be77684 ] This operation is denied.