microsoft / azure-pipelines-tasks

Tasks for Azure Pipelines
https://aka.ms/tfbuild
MIT License

kubectl rollout status hangs for a few minutes when using a linux agent #13737

Closed jedimanu-ux closed 4 years ago

jedimanu-ux commented 4 years ago

Required Information

Question, Bug, or Feature?
Type: Bug

Enter Task Name: kubectl V1

Environment

Issue Description

I have a simple release process with two kubectl tasks that run against an on-premises Kubernetes cluster:

  1. kubectl
     command: set
     arguments: image deployment/$(k8s_deploymentName) -n $(namespace) $(k8s_deploymentName)=$(image):$(Build.BuildNumber)
  2. kubectl
     command: rollout
     arguments: status deployment/$(k8s_deploymentName)
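
Roughly the same thing in YAML form, as a sketch only (the release itself is defined in the classic editor; the Kubernetes@1 task, the connectionType value, and the service connection name k8s-connection are assumptions here):

steps:
  # Step 1: point the deployment at the freshly built image
  - task: Kubernetes@1
    displayName: Set image
    inputs:
      connectionType: Kubernetes Service Connection
      kubernetesServiceEndpoint: k8s-connection   # placeholder service connection name
      namespace: $(namespace)                     # replaces the -n $(namespace) flag in the arguments
      command: set
      arguments: image deployment/$(k8s_deploymentName) $(k8s_deploymentName)=$(image):$(Build.BuildNumber)

  # Step 2: wait for the rollout started by step 1 to finish
  - task: Kubernetes@1
    displayName: Rollout status
    inputs:
      connectionType: Kubernetes Service Connection
      kubernetesServiceEndpoint: k8s-connection
      namespace: $(namespace)
      command: rollout
      arguments: status deployment/$(k8s_deploymentName)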

Whenever I use a Linux agent for the release, the second step takes a very long time, even though I can see from the logs that the command itself finished almost immediately.

Task logs

2020-10-20T08:47:05.1211265Z [command]/var/vsts/azpipelinesagentrelease1/_tool/kubectl/1.16.2/x64/kubectl rollout -n web-apps status deployment/test
2020-10-20T08:47:05.1212827Z ##[debug]Called Toolrunner exec() method asynchronously. Returning the promise.
2020-10-20T08:47:05.6736110Z deployment "test" successfully rolled out
2020-10-20T08:51:07.3391244Z W1020 08:51:07.334790   10261 reflector.go:299] k8s.io/client-go/tools/watch/informerwatcher.go:146: watch of *unstructured.Unstructured ended with: very short watch: k8s.io/client-go/tools/watch/informerwatcher.go:146: Unexpected watch close - watch lasted less than a second and no items received
2020-10-20T08:51:07.3498522Z ##[debug]Exit code 0 received from tool '/var/vsts/azpipelinesagentrelease1/_tool/kubectl/1.16.2/x64/kubectl'
2020-10-20T08:51:07.3513377Z ##[debug]STDIO streams have closed for tool '/var/vsts/azpipelinesagentrelease1/_tool/kubectl/1.16.2/x64/kubectl'
2020-10-20T08:51:07.3701526Z ##[debug]Toolrunner exec() method returned successfully for the kubectl command.
2020-10-20T08:51:07.3703185Z commandOutputdeployment "test" successfully rolled out
2020-10-20T08:51:07.3705935Z 
2020-10-20T08:51:07.3708107Z ##[debug]set KubectlOutput=deployment "test" successfully rolled out
shigupt202 commented 4 years ago

@jedimanu-ux The deployment command generally returns very quickly, but the deployment itself is still in progress at that point. The rollout status command actually waits for the pods to be up and running and the deployment to be successful (which takes some time). You can verify this by checking the deployment status from your local machine (given that you have the kubeconfig locally) while the rollout status command is in progress. Also, I believe it should take a similar amount of time on other agents as well (since it depends on the time the pods take to reach the Running state in your k8s cluster). Can you verify that?

jedimanu-ux commented 4 years ago

Attaching logs from a Windows agent and from a Linux agent for comparison. If you look closely at the timestamps, you'll see that the rollout status command finishes almost instantly every time, but on the Linux agent the task result only appears in the commandOutput line after a few minutes. On the Windows agent the commandOutput line comes immediately, which is why the task also finishes instantly.

Linux agent

2020-10-21T07:56:15.8249109Z [command]/var/vsts/azpipelinesagentrelease1/_tool/kubectl/1.16.2/x64/kubectl rollout -n web-apps status deployment/test
2020-10-21T07:56:15.8250111Z ##[debug]Called Toolrunner exec() method asynchronously. Returning the promise.
2020-10-21T07:56:16.4599234Z deployment "test" successfully rolled out
2020-10-21T08:00:18.4014315Z W1021 08:00:18.397838   23452 reflector.go:299] k8s.io/client-go/tools/watch/informerwatcher.go:146: watch of *unstructured.Unstructured ended with: very short watch: k8s.io/client-go/tools/watch/informerwatcher.go:146: Unexpected watch close - watch lasted less than a second and no items received
2020-10-21T08:00:18.4126000Z ##[debug]Exit code 0 received from tool '/var/vsts/azpipelinesagentrelease1/_tool/kubectl/1.16.2/x64/kubectl'
2020-10-21T08:00:18.4146152Z ##[debug]STDIO streams have closed for tool '/var/vsts/azpipelinesagentrelease1/_tool/kubectl/1.16.2/x64/kubectl'
2020-10-21T08:00:18.4168376Z ##[debug]Toolrunner exec() method returned successfully for the kubectl command.
2020-10-21T08:00:18.4180494Z commandOutputdeployment "test" successfully rolled out

Windows agent

2020-10-21T08:02:02.1875670Z [command]d:\w4\_tool\kubectl\1.16.2\x64\kubectl.exe rollout -n web-apps status deployment/test
2020-10-21T08:02:03.5848983Z ##[debug]Called Toolrunner exec() method asynchronously. Returning the promise.
2020-10-21T08:02:04.9452909Z deployment "test" successfully rolled out
2020-10-21T08:02:05.1385554Z ##[debug]Exit code 0 received from tool 'd:\w4\_tool\kubectl\1.16.2\x64\kubectl.exe'
2020-10-21T08:02:05.1402210Z ##[debug]STDIO streams have closed for tool 'd:\w4\_tool\kubectl\1.16.2\x64\kubectl.exe'
2020-10-21T08:02:05.1420501Z ##[debug]Toolrunner exec() method returned successfully for the kubectl command.
2020-10-21T08:02:05.1433055Z commandOutputdeployment "test" successfully rolled out
shigupt202 commented 4 years ago

Did you delete the deployment before trying on the Windows agent? If the deployment was already present on the cluster, it would take no time for it to be up and running again.

jedimanu-ux commented 4 years ago

No, I did not. The deployment was not changed in the previous runs either; the same image was used every time. Attaching logs again for the Linux agent AFTER it previously ran on Windows, for the sake of consistency:

2020-10-21T08:25:05.1445044Z [command]/var/vsts/azpipelinesagentrelease1/_tool/kubectl/1.16.2/x64/kubectl rollout -n web-apps status deployment/test
2020-10-21T08:25:05.1446072Z ##[debug]Called Toolrunner exec() method asynchronously. Returning the promise.
2020-10-21T08:25:05.7819674Z deployment "test" successfully rolled out
2020-10-21T08:29:07.6714368Z ##[debug]Exit code 0 received from tool '/var/vsts/azpipelinesagentrelease1/_tool/kubectl/1.16.2/x64/kubectl'
2020-10-21T08:29:07.6732718Z ##[debug]STDIO streams have closed for tool '/var/vsts/azpipelinesagentrelease1/_tool/kubectl/1.16.2/x64/kubectl'
2020-10-21T08:29:07.6758217Z ##[debug]Toolrunner exec() method returned successfully for the kubectl command.
2020-10-21T08:29:07.6767505Z commandOutputdeployment "test" successfully rolled out
shigupt202 commented 4 years ago

This looks like a Kubernetes/kubectl (maybe watch-related) issue to me. Please take a look at the message shown in the logs after the rollout command; this is probably why it's taking the extra time:

W1020 08:51:07.334790 10261 reflector.go:299] k8s.io/client-go/tools/watch/informerwatcher.go:146: watch of *unstructured.Unstructured ended with: very short watch: k8s.io/client-go/tools/watch/informerwatcher.go:146: Unexpected watch close - watch lasted less than a second and no items received

Ref: https://github.com/kubernetes/client-go/blob/master/tools/watch/informerwatcher.go

jedimanu-ux commented 4 years ago

As you can see from the last log I pasted, this also happens when that warning is not thrown.

shigupt202 commented 4 years ago

The Kubectl task does not do any special handling for the rollout status command, and from the warning it looks like a kubectl issue. If you still think it's a task issue, you can try to repro it by running the same kubectl command in a command-line task on a Linux agent (see the sketch below) and let us know the result. @ds-ms Thoughts?
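
For example, a minimal sketch of such a repro step, assuming kubectl is already on the agent's PATH and the kubeconfig is reachable by the agent (variable names as in your release):

  - task: CmdLine@2
    displayName: Repro rollout status outside the Kubectl task
    inputs:
      script: |
        # Time the bare kubectl call, independent of the task wrapper
        time kubectl rollout status deployment/$(k8s_deploymentName) -n $(namespace)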

thesattiraju commented 4 years ago

I agree with your assessment. @shigupt202

Closing this issue. @jedimanu-ux, if you still feel this needs to be addressed, feel free to reopen it.

SzilvasiPeter commented 1 year ago

I am using an Azure pipeline with a Windows 10 self-hosted agent. After configuring the service connection and creating the manifest files, the Kubernetes deployment task freezes. Here is a snippet of the task definition:

-- snip --
  - task: KubernetesManifest@1
    displayName: Deploy Image to AKS
    inputs:
      action: 'deploy'
      connectionType: 'kubernetesServiceConnection'
      kubernetesServiceConnection: $(kubernetesSVC)
      manifests: 'manifests/deployment.yml'
      containers: $(imageRepository)
-- snip --

The task hangs while executing C:\ProgramData\chocolatey\bin\kubectl.exe rollout status <my-deployment> --timeout 0s --namespace default. I got a timeout error after 10 minutes and 3 seconds; the output of the command is the following:

Waiting for deployment "<my-deployment>" rollout to finish: 0 of 1 updated replicas are available...
error: deployment "<my-deployment>" exceeded its progress deadline

After that, the pipeline executes kubectl's describe and get service/<my-service> commands to retrieve some information. The deployment task finishes with the "Rollout status check failed" error message.

The weird thing is that I can see in the Azure portal that the deployment is created, but the deployment task keeps failing consistently even after rerunning the failed job.
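
As an aside, the --timeout 0s in that command appears to come from the task's rolloutStatusTimeout input (default 0, i.e. no watch timeout), so the wait can be bounded there if needed. A sketch, assuming that input is supported as documented; note it only limits how long the task waits, it does not make an unhealthy rollout succeed:

  - task: KubernetesManifest@1
    displayName: Deploy Image to AKS
    inputs:
      action: 'deploy'
      connectionType: 'kubernetesServiceConnection'
      kubernetesServiceConnection: $(kubernetesSVC)
      manifests: 'manifests/deployment.yml'
      containers: $(imageRepository)
      rolloutStatusTimeout: '120'   # seconds; assumption: input name per the task docs, default 0 maps to --timeout 0s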

EDIT:

I have realized that the problem was not caused by the Azure pipeline task; rather, it was caused by an incorrect azure-pipeline.yml file and incorrect manifest files (deployment.yml and service.yml). The pipeline was fixed after correcting those files.

Note that the kubectl rollout command will fail if the Azure pipeline or the manifest configuration files are incorrect.
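
For reference, a minimal deployment.yml of the shape the task expects (purely illustrative; names, labels, and image are placeholders): spec.selector.matchLabels has to match the pod template labels, and the default progressDeadlineSeconds of 600 lines up with the ~10-minute "exceeded its progress deadline" error seen above.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment              # placeholder name
spec:
  replicas: 1
  progressDeadlineSeconds: 600     # Kubernetes default; rollout status fails once this is exceeded
  selector:
    matchLabels:
      app: my-app                  # must match the pod template labels below
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: myregistry.azurecr.io/my-app:latest   # placeholder image
          ports:
            - containerPort: 80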