openshiftio / openshift.io

Red Hat OpenShift.io is an end-to-end development environment for planning, building and deploying modern applications.
https://openshift.io

Pods with "unknown" state appearing on cluster 2a #4728

Open ljelinkova opened 5 years ago

ljelinkova commented 5 years ago

I've noticed that 3 of our accounts on cluster 2a contain jenkins pods in the "unknown" state. They have been there for several days and remain even after an environment reset.

pod/booster-mission-runtime-s2i-8-build   0/1       Completed     0          4h
pod/booster-mission-runtime-s2i-9-build   0/1       Completed     0          1h
pod/jenkins-1-84rbx                       0/1       Pending       0          45m
pod/jenkins-1-deploy                      1/1       Running       0          49m
pod/jenkins-1-hkf75                       0/1       Unknown       0          5d
pod/jenkins-1-lfl6c                       0/1       Terminating   0          1h
pod/jenkins-1-tt7xm                       0/1       Unknown       0          1d
pod/jenkins-1-wzq8n                       0/1       Unknown       0          1d

Affected accounts: osio-ci-e2e-001-preview, osio-ci-e2e-002-preview, osio-ci-e2e-007
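For anyone who wants to check other accounts for the same symptom, here is a minimal sketch that filters a pod listing like the one above for pods that look stuck. It only parses text in the five-column format shown (name, ready, status, restarts, age). The function name `find_stuck_pods` and the choice to treat both `Unknown` and `Terminating` as "stuck" are assumptions made for illustration, not something the cluster tooling defines.

```python
from typing import List, Tuple

# Assumption: these two statuses are the ones worth flagging, based on this report.
STUCK_STATUSES = {"Unknown", "Terminating"}


def find_stuck_pods(listing: str) -> List[Tuple[str, str, str]]:
    """Return (name, status, age) for every pod whose status looks stuck."""
    stuck = []
    for line in listing.splitlines():
        fields = line.split()
        # Expect exactly: NAME READY STATUS RESTARTS AGE.
        # Skip blank lines, a header row, and anything with another shape.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, _ready, status, _restarts, age = fields
        if status in STUCK_STATUSES:
            stuck.append((name, status, age))
    return stuck


# The listing from this report, used as sample input.
SAMPLE = """\
pod/booster-mission-runtime-s2i-8-build   0/1       Completed     0          4h
pod/booster-mission-runtime-s2i-9-build   0/1       Completed     0          1h
pod/jenkins-1-84rbx                       0/1       Pending       0          45m
pod/jenkins-1-deploy                      1/1       Running       0          49m
pod/jenkins-1-hkf75                       0/1       Unknown       0          5d
pod/jenkins-1-lfl6c                       0/1       Terminating   0          1h
pod/jenkins-1-tt7xm                       0/1       Unknown       0          1d
pod/jenkins-1-wzq8n                       0/1       Unknown       0          1d
"""

if __name__ == "__main__":
    for name, status, age in find_stuck_pods(SAMPLE):
        print(f"{name}\t{status}\t{age}")
```

A pod that has only just started terminating would also be flagged, so the age column still needs a human look before anything is reported.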

http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-prod-preview.openshift.io-smoketest-pr-us-east-2a-released/5415/oc-jenkins-logs-before-all.txt
http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-prod-preview.openshift.io-smoketest-pr-us-east-2a-beta/5417/oc-jenkins-logs-before-all.txt
http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-released/1467/oc-jenkins-logs-before-all.txt

skryzhny commented 5 years ago

Does it block you?

ljelinkova commented 5 years ago

Yes.

skryzhny commented 5 years ago

OPS cleared the pods, and I can't see them anymore either. @ljelinkova, can you recheck?

ljelinkova commented 5 years ago

The pods have been deleted. However, the new pod is stuck in the terminating state.

http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-released/1474/oc-jenkins-logs-before-all.txt

pbergene commented 5 years ago

@JohnStrunk has an understanding of what might be causing this. It seems to be related to stuck mounts, which can either be unmounted manually or go away when the node is rebooted. As for a fix, one would first have to be created, and then we'll have to work out the process to get it applied.

ljelinkova commented 5 years ago

I haven't seen any new terminating pods in the last week, so I will decrease the severity and priority of the issue. However, I'll leave it open, since this should be investigated and prevented in the future.