hashicorp / waypoint

A tool to build, deploy, and release any application on any platform.
https://waypointproject.io

Waypoint Runner Install/Uninstall #4481

Open cicoyle opened 1 year ago

cicoyle commented 1 year ago

Describe the bug: I tried creating a runner, and it errored. I tried to clean up the resources, but that errored too. Then I tried to re-create the runner with the same name, and that errored saying one already existed with that name. Also, the error itself is not helpful: "timed out waiting for the condition". Which condition? I don't know what to change on my end to address that error and install a runner successfully.

$ waypoint runner install -platform=kubernetes \
-server-addr=<server_addr> \
-k8s-runner-image=hashicorp/waypoint:latest \
-k8s-image-pull-policy=always \
-id=eks-int -- -label=selector=eks-int
✓ Finished connecting to: <server_addr>
❌ Installing runner...
❌ Installing Waypoint Helm chart with runner options: waypoint
! Error installing runner: timed out waiting for the condition
Please run the following to clean up the resources from the unsuccessful runner installation,
specifying additional platform flags as needed:

$ waypoint runner uninstall -platform=kubernetes -id=eks-int <additional_platform_flags>

$ waypoint runner uninstall -platform=kubernetes -id=eks-int
❌ Uninstalling runner...
! runner with ID "eks-int" not found in namespace "default" with
  current context

$ waypoint runner install -platform=kubernetes \
-server-addr=<server_addr> \
-k8s-runner-image=hashicorp/waypoint:latest \
-k8s-image-pull-policy=always \
-id=eks-int -vvv -- -label=selector=eks-int
2023-02-03T08:03:35.232-0600 [INFO]  waypoint: waypoint version: full_string="v0.10.4 (46720cf33)" version=v0.10.4 prerelease="" metadata="" revision=46720cf33
2023-02-03T08:03:35.232-0600 [TRACE] waypoint: starting interrupt listener for context cancellation
2023-02-03T08:03:35.232-0600 [TRACE] waypoint: interrupt listener goroutine started
2023-02-03T08:03:35.232-0600 [DEBUG] waypoint: home configuration directory: path=/Users/cassiecoyle/Library/Preferences/waypoint
2023-02-03T08:03:35.233-0600 [TRACE] waypoint: no API client provided, initializing connection if possible
2023-02-03T08:03:35.233-0600 [INFO]  waypoint.server: attempting to source credentials and connect
2023-02-03T08:03:35.235-0600 [DEBUG] waypoint.serverclient: connection information: address=<server_addr> tls=true tls_skip_verify=false send_auth=true has_token=true
2023-02-03T08:03:35.591-0600 [DEBUG] waypoint.server: connection established with sourced credentials
2023-02-03T08:03:35.591-0600 [TRACE] waypoint: requesting version info from server
2023-02-03T08:03:35.660-0600 [INFO]  waypoint: server version info: version="hcp v0.10.0" api_min=1 api_current=1 entrypoint_min=1 entrypoint_current=1
2023-02-03T08:03:35.660-0600 [INFO]  waypoint: negotiated api version: version=1
✓ Finished connecting to:<server_addr>
❌ Installing runner...
❌ Installing Waypoint Helm chart with runner options: waypoint
  Waypoint runner service account already exists - a new service account will not be created
! Error installing runner: cannot re-use a name that is still in use
Please run the following to clean up the resources from the unsuccessful runner installation,
specifying additional platform flags as needed:

$ waypoint runner uninstall -platform=kubernetes -id=eks-int <additional_platform_flags>
2023-02-03T08:03:38.249-0600 [TRACE] waypoint: stopping signal listeners and cancelling the context

$ waypoint runner uninstall -platform=kubernetes -id=eks-int
❌ Uninstalling runner...
! runner with ID "eks-int" not found in namespace "default" with
  current context

Steps to Reproduce: Create a new AWS EKS cluster. Confirm that you are using the right AWS account with aws sts get-caller-identity --query "Account" --output text, and the right kube context with kubectl config current-context. Then run the install command above to create a runner.

Expected behavior: I would expect to be able to create the runner with no issues. If issues did arise, I would expect runner uninstall to clean up the resources the install created, even if no runner was actually created, since the install output told me to run it.

I now have lingering resources to clean up manually:

$ k get pv
No resources found

$ k get pvc
NAME                                     STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-default-waypoint-eks-int-runner-0   Pending                                      gp2            42m

$ k get serviceaccount
NAME                  SECRETS   AGE
default               1         16h
waypoint-runner       1         42m
waypoint-runner-odr   1         42m

$ k get statefulset
NAME                          READY   AGE
waypoint-eks-hcp-int-runner   0/1     45m

NOTE: Even after manually cleaning up all resources and running waypoint runner forget, I am still unable to create or delete the runner, as the output below shows. However, I was able to create a new runner successfully by adding new to the name.

$ waypoint runner install -platform=kubernetes \
-server-addr=<server_addr> \
-k8s-runner-image=hashicorp/waypoint:latest \
-id=eks-int -vvv -- -label=selector=eks-int
2023-02-03T10:02:30.139-0600 [INFO]  waypoint: waypoint version: full_string="v0.10.4 (46720cf33)" version=v0.10.4 prerelease="" metadata="" revision=46720cf33
2023-02-03T10:02:30.139-0600 [TRACE] waypoint: starting interrupt listener for context cancellation
2023-02-03T10:02:30.139-0600 [TRACE] waypoint: interrupt listener goroutine started
2023-02-03T10:02:30.141-0600 [DEBUG] waypoint: home configuration directory: path=/Users/cassiecoyle/Library/Preferences/waypoint
2023-02-03T10:02:30.142-0600 [TRACE] waypoint: no API client provided, initializing connection if possible
2023-02-03T10:02:30.142-0600 [INFO]  waypoint.server: attempting to source credentials and connect
2023-02-03T10:02:30.144-0600 [DEBUG] waypoint.serverclient: connection information: address=<server_addr> tls=true tls_skip_verify=false send_auth=true has_token=true
2023-02-03T10:02:30.805-0600 [DEBUG] waypoint.server: connection established with sourced credentials
2023-02-03T10:02:30.805-0600 [TRACE] waypoint: requesting version info from server
2023-02-03T10:02:30.888-0600 [INFO]  waypoint: server version info: version="hcp v0.10.0" api_min=1 api_current=1 entrypoint_min=1 entrypoint_current=1
2023-02-03T10:02:30.889-0600 [INFO]  waypoint: negotiated api version: version=1
✓ Finished connecting to: <server_addr>
❌ Installing runner...
❌ Installing Waypoint Helm chart with runner options: waypoint
  Waypoint runner service account already exists - a new service account will not be created
! Error installing runner: cannot re-use a name that is still in use
Please run the following to clean up the resources from the unsuccessful runner installation,
specifying additional platform flags as needed:

$ waypoint runner uninstall -platform=kubernetes -id=eks-int <additional_platform_flags>
2023-02-03T10:02:33.644-0600 [TRACE] waypoint: stopping signal listeners and cancelling the context

$ waypoint runner uninstall -platform=kubernetes -id=eks-int
❌ Uninstalling runner...
! runner with ID "eks-int" not found in namespace "default" with
  current context

Waypoint Platform Versions: HCP
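
For anyone else hitting "timed out waiting for the condition": the listings above show a StatefulSet at 0/1 ready and a PVC stuck in Pending, which may be the unmet condition. That is an assumption, not something the install output confirms. Below is a minimal sketch of kubectl commands that could surface the reason. The function name inspect_runner_rollout and the KUBECTL override variable are hypothetical helpers, not part of Waypoint.

```shell
#!/bin/sh
# KUBECTL is a hypothetical override so the binary can be swapped out;
# it defaults to the real kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# inspect_runner_rollout NAMESPACE
# Prints the objects most likely to explain a stalled runner install.
inspect_runner_rollout() {
  ns="${1:-default}"
  # A READY column such as 0/1 means the runner pod never became ready.
  "$KUBECTL" get statefulset --namespace "$ns"
  # A PVC in Pending usually means no volume could be provisioned.
  "$KUBECTL" get pvc --namespace "$ns"
  # Recent events often state why a pod or PVC is stuck.
  "$KUBECTL" get events --namespace "$ns" --sort-by=.metadata.creationTimestamp
}

# Example (runs against the current kube context):
#   inspect_runner_rollout default
```

If the events point at the PVC, kubectl describe pvc on the claim named in the listing above should give more detail.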

cicoyle commented 1 year ago

Turns out I had a lingering helm release that was causing the issues:

$ helm list
NAME                   NAMESPACE   REVISION   UPDATED                                STATUS     CHART             APP VERSION
waypoint-eks-int       default     1          2023-02-03 07:48:10.603686 -0600 CST   failed     waypoint-0.1.17   0.10.5
waypoint-new-eks-int   default     1          2023-02-03 09:07:53.093235 -0600 CST   deployed   waypoint-0.1.17   0.10.5

$ helm uninstall waypoint-eks-int
release "waypoint-eks-int" uninstalled

After the helm uninstall I was able to create the runner successfully.
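
The check above can be sketched as a small shell helper that looks for a failed Helm release and removes it before retrying the install. It assumes the release is named waypoint-<runner id>, which matches the helm list output here but is not documented behavior. The names cleanup_failed_release, release_name, and the HELM override variable are hypothetical.

```shell
#!/bin/sh
# HELM is a hypothetical override so the binary can be swapped out;
# it defaults to the real helm.
HELM="${HELM:-helm}"

# Assumed naming scheme, inferred from the helm list output in this thread.
release_name() {
  printf 'waypoint-%s\n' "$1"
}

# cleanup_failed_release RUNNER_ID [NAMESPACE]
# Uninstalls the runner's Helm release only if Helm reports it as failed.
cleanup_failed_release() {
  id="$1"
  ns="${2:-default}"
  rel="$(release_name "$id")"
  # --failed limits the listing to failed releases; --short prints names only.
  if "$HELM" list --namespace "$ns" --failed --short | grep -Fqx "$rel"; then
    "$HELM" uninstall "$rel" --namespace "$ns"
  else
    echo "no failed release named $rel in namespace $ns"
  fi
}

# Example (runs against the current kube context):
#   cleanup_failed_release eks-int default
```

Restricting the check to failed releases keeps the helper from removing a runner that deployed successfully.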

evanphx commented 1 year ago

Seems like uninstall should execute the helm uninstall regardless.

yuriy-yarosh commented 1 year ago

tbh this waypoint situation should've been handled with terraform modules in the first place. I don't get why people are reinventing the wheel of self-competition.

Too many provisioning / cleanup / force destroy bugs for the features that had matured enough already at the terraform level.