@sarveshtamba Have you tried the same scenario manually on that platform?
No, however this works fine on another OCP 4.5 cluster. Want to confirm if this is a flaky scenario or if this is observed on any other platforms?
No, however this works fine on another OCP 4.5 cluster.
I did not get it. Can you please explain what you mean by another OCP 4.5 cluster? I guess it works against a 4.5 cluster but fails on the other cluster versions.
Want to confirm if this is a flaky scenario or if this is observed on any other platforms?
I have never seen it on amd64 images, either in our CI environment or locally.
I have tried running the same test suites on 2 different OCP 4.5 clusters. On one cluster I don't see these EOF errors, while on the other I do. Hence I want to confirm whether this is a flaky issue or a local setup issue.
This appears to be an OCP cluster-specific issue; I am not encountering it on another OCP cluster. Closing this for now, will reopen if required.
Hitting this issue on another cluster too while running odo test suites. Reopening this one.
cc:- @amitkrout @scottkurz ^^
Checked your logs. The important information is this:
[odo] I0906 18:06:32.090832 203070 occlient.go:1843] Warning Event: Count: 10, Reason: BackOff, Message: Back-off restarting failed container
The container was killed by the cluster because it did not start within 3 minutes; this is the reason for the "Unexpected EOF".
I can't see why it didn't start, but my guess is that the cluster couldn't download the image, or that it took too long to download it (registry.redhat.io/rhscl/nodejs-12-rhel7@sha256:f02a15704dad16bfe1a478ddd02fa425201a9873e53aa84498b41cae3412ecff).
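To pull such warning events out of a long verbose odo log, a small filter along these lines may help. This is a minimal sketch: the regular expression is inferred only from the log lines quoted in this thread, not from odo's source, and `parse_warning_event` and `WarningEvent` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not a documented odo log format).
WARNING_EVENT = re.compile(
    r"Warning Event: Count: (?P<count>\d+), "
    r"Reason: (?P<reason>\w+), Message: (?P<message>.*)$"
)


class WarningEvent(NamedTuple):
    count: int
    reason: str
    message: str


def parse_warning_event(line: str) -> Optional[WarningEvent]:
    """Return the warning event found in a log line, or None if there is none."""
    m = WARNING_EVENT.search(line)
    if m is None:
        return None
    return WarningEvent(int(m["count"]), m["reason"], m["message"].strip())


line = (
    "[odo] I0906 18:06:32.090832 203070 occlient.go:1843] Warning Event: "
    "Count: 10, Reason: BackOff, Message: Back-off restarting failed container"
)
event = parse_warning_event(line)
print(event)
```

Running every line of a log through `parse_warning_event` and keeping the results that are not `None` leaves just the events with their counts and reasons.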
@kadel seeing this on OCP 4.6 cluster as well. Any suggestions? Is there anyone who can help here?
One way to get more info about this would be to verify whether the problem is in odo or in the cluster. For example, you could try to create 10 Pods that use the image from my previous comment and check whether some of them fail.
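The 10-Pod check could be sketched as follows. This is only an illustration: the Pod names, the `app=pull-test` label, the `sleep 3600` command and the `restartPolicy` are assumptions chosen to keep the Pods alive long enough to observe them, and the helper names are hypothetical. The script only prints a manifest and does not talk to the cluster.

```python
import json

# Image reference copied from the earlier comment in this thread.
IMAGE = (
    "registry.redhat.io/rhscl/nodejs-12-rhel7@sha256:"
    "f02a15704dad16bfe1a478ddd02fa425201a9873e53aa84498b41cae3412ecff"
)


def pod_manifest(name: str, image: str = IMAGE) -> dict:
    """Build a minimal Pod that pulls the image and then just sleeps."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "pull-test"}},
        "spec": {
            "restartPolicy": "Never",  # assumption: do not hide failures behind restarts
            "containers": [
                {"name": "test", "image": image, "command": ["sleep", "3600"]}
            ],
        },
    }


def pod_list(count: int, prefix: str = "pull-test") -> dict:
    """Wrap `count` Pods in a v1 List so they can be created in one call."""
    items = [pod_manifest(f"{prefix}-{i}") for i in range(1, count + 1)]
    return {"apiVersion": "v1", "kind": "List", "items": items}


manifest_json = json.dumps(pod_list(10), indent=2)
print(manifest_json)
```

Assuming the script is saved as `pods.py` (a hypothetical file name), the output can be piped into `oc create -f -`, and `oc get pods -l app=pull-test` then shows whether any of the Pods fail to pull the image or to start.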
@kadel I manually ran the steps used in one of the failing tests. Below are the oc describe and oc logs results:
[root@rhodopowerci-inf 603110024]# mkdir /tmp/825666372
[root@rhodopowerci-inf 603110024]# cd /tmp/825666372
[root@rhodopowerci-inf 825666372]# odo project create iyisbvflno -w -v4
I1123 01:39:58.274073 2714 util.go:730] HTTPGetRequest: https://raw.githubusercontent.com/openshift/odo/master/build/VERSION
I1123 01:39:58.274320 2714 util.go:751] Response will be cached in /tmp/odohttpcache for 1h0m0s
• Waiting for project to come up ...
I1123 01:39:58.601781 2714 util.go:764] Cached response used.
I1123 01:39:59.348170 2714 occlient.go:531] Status of creation of project iyisbvflno is Active
I1123 01:39:59.348214 2714 occlient.go:536] Project iyisbvflno now exists
I1123 01:39:59.353013 2714 namespace.go:181] Status of creation of service account &ServiceAccount{ObjectMeta:{default iyisbvflno /api/v1/namespaces/iyisbvflno/serviceaccounts/default d4f183e4-2828-4eca-aa66-c3b4e3a6e1d2 31773856 0 2020-11-23 01:39:59 -0800 PST <nil> <nil> map[] map[] [] [] []},Secrets:[]ObjectReference{ObjectReference{Kind:,Namespace:,Name:default-dockercfg-ms6df,UID:,APIVersion:,ResourceVersion:,FieldPath:,},ObjectReference{Kind:,Namespace:,Name:default-token-xsgbk,UID:,APIVersion:,ResourceVersion:,FieldPath:,},},ImagePullSecrets:[]LocalObjectReference{LocalObjectReference{Name:default-dockercfg-ms6df,},},AutomountServiceAccountToken:nil,} is ready
✓ Waiting for project to come up [1s]
✓ Project 'iyisbvflno' is ready for use
✓ New project created and now using project: iyisbvflno
---
A newer version of odo (v2.0.1) is available,
visit https://github.com/openshift/odo/releases to update.
If you wish to disable this notification, run:
odo preference set UpdateNotification false
---
[root@rhodopowerci-inf 825666372]# odo component create --s2i nodejs mynodejs --project iyisbvflno --context /tmp/825666372 --app app --s2i
Validation
✓ Validating component [17ms]
Please use `odo push` command to create the component with source deployed
[root@rhodopowerci-inf 825666372]# odo url create url1 --port 8080 --context /tmp/825666372
✓ URL url1 created for component: mynodejs
To apply the URL configuration changes, please use `odo push`
[root@rhodopowerci-inf 825666372]# odo storage create storage1 --path /data1 --size 1Gi --context /tmp/825666372
✓ Added storage storage1 to mynodejs
Please use `odo push` command to make the storage accessible to the component
[root@rhodopowerci-inf 825666372]# odo push --context /tmp/825666372
Validation
✓ Checking component [42ms]
Configuration changes
✓ Added storage storage1 to mynodejs
✓ Initializing component
✓ Creating component [232ms]
Applying URL changes
✓ URL url1: http://url1-app-iyisbvflno.apps.rhodopowerci.cp.fyre.ibm.com/ created
Pushing to component mynodejs of type local
✓ Checking files for pushing [616471ns]
⚠ Watch channel was closed start [⚠ WARNING x9: BackOff]
✗ Waiting for component to start [3m] [WARNING x9: BackOff]
✗ watch channel was closed
[root@rhodopowerci-inf 825666372]# oc get all
NAME READY STATUS RESTARTS AGE
pod/mynodejs-app-1-deploy 0/1 Error 0 14m
NAME DESIRED CURRENT READY AGE
replicationcontroller/mynodejs-app-1 0 0 0 14m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/mynodejs-app ClusterIP 172.30.107.47 <none> 8080/TCP 14m
NAME REVISION DESIRED CURRENT TRIGGERED BY
deploymentconfig.apps.openshift.io/mynodejs-app 1 1 0 config,image(nodejs:latest)
NAME IMAGE REPOSITORY TAGS UPDATED
imagestream.image.openshift.io/mynodejs-app default-route-openshift-image-registry.apps.rhodopowerci.cp.fyre.ibm.com/iyisbvflno/mynodejs-app
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/url1-app url1-app-iyisbvflno.apps.rhodopowerci.cp.fyre.ibm.com / mynodejs-app 8080 None
[root@rhodopowerci-inf 825666372]# oc describe pod/mynodejs-app-1-deploy
Name: mynodejs-app-1-deploy
Namespace: iyisbvflno
Priority: 0
Node: worker0.rhodopowerci.cp.fyre.ibm.com/10.17.73.98
Start Time: Mon, 23 Nov 2020 01:41:04 -0800
Labels: openshift.io/deployer-pod-for.name=mynodejs-app-1
Annotations: k8s.v1.cni.cncf.io/network-status:
[{
"name": "openshift-sdn",
"interface": "eth0",
"ips": [
"10.254.12.15"
],
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "openshift-sdn",
"interface": "eth0",
"ips": [
"10.254.12.15"
],
"default": true,
"dns": {}
}]
openshift.io/deployment-config.name: mynodejs-app
openshift.io/deployment.name: mynodejs-app-1
openshift.io/scc: restricted
Status: Failed
IP: 10.254.12.15
IPs:
IP: 10.254.12.15
Containers:
deployment:
Container ID: cri-o://fabbb5d524158ed68bd9a977169ba5eb5b2bd5f789d371839dde88aaf041f3af
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a80a1fcfc1b6bdfb224716fb0bfabba78460481bd08a6161aef308993c9e6314
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a80a1fcfc1b6bdfb224716fb0bfabba78460481bd08a6161aef308993c9e6314
Port: <none>
Host Port: <none>
State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 23 Nov 2020 01:41:07 -0800
Finished: Mon, 23 Nov 2020 01:51:09 -0800
Ready: False
Restart Count: 0
Environment:
OPENSHIFT_DEPLOYMENT_NAME: mynodejs-app-1
OPENSHIFT_DEPLOYMENT_NAMESPACE: iyisbvflno
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from deployer-token-rcx2w (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
deployer-token-rcx2w:
Type: Secret (a volume populated by a Secret)
SecretName: deployer-token-rcx2w
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned iyisbvflno/mynodejs-app-1-deploy to worker0.rhodopowerci.cp.fyre.ibm.com
Normal AddedInterface 14m multus Add eth0 [10.254.12.15/22]
Normal Pulled 14m kubelet, worker0.rhodopowerci.cp.fyre.ibm.com Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a80a1fcfc1b6bdfb224716fb0bfabba78460481bd08a6161aef308993c9e6314" already present on machine
Normal Created 14m kubelet, worker0.rhodopowerci.cp.fyre.ibm.com Created container deployment
Normal Started 14m kubelet, worker0.rhodopowerci.cp.fyre.ibm.com Started container deployment
[root@rhodopowerci-inf 825666372]#
[root@rhodopowerci-inf 825666372]# oc logs pod/mynodejs-app-1-deploy
--> Scaling mynodejs-app-1 to 1
error: update acceptor rejected mynodejs-app-1: pods for rc 'iyisbvflno/mynodejs-app-1' took longer than 600 seconds to become available
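As a quick sanity check, the deployer container's Started and Finished times from the oc describe output can be compared with the 600-second limit named in the deployer log. A minimal sketch, with the two timestamps copied from the output above; the variable names are illustrative only.

```python
from email.utils import parsedate_to_datetime

# Timestamps copied from the "State: Terminated" section of oc describe.
started = parsedate_to_datetime("Mon, 23 Nov 2020 01:41:07 -0800")
finished = parsedate_to_datetime("Mon, 23 Nov 2020 01:51:09 -0800")

# Limit quoted in the deployer log ("took longer than 600 seconds").
TIMEOUT_SECONDS = 600

runtime_seconds = (finished - started).total_seconds()
print(f"deployer ran for {runtime_seconds:.0f} s "
      f"(limit {TIMEOUT_SECONDS} s, exceeded: {runtime_seconds >= TIMEOUT_SECONDS})")
```

The deployer ran for 602 seconds, which is consistent with it exiting on that timeout. That suggests the application pod never became available within the window, but it does not by itself say why.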
Hmm, I don't see any indicators of what could cause the failure :-(
Can you please try one more time, but this time include the -v 4 argument in the odo push command?
Also, while odo push is running, can you run oc describe pod?
@kadel oc describe shows similar output as before. The odo push -v 4 output is as below:
[root@rhodopowerci-inf 825666372]# odo project set iyisbvflno
Already on project : iyisbvflno
[root@rhodopowerci-inf 825666372]# odo push --context /tmp/825666372 -v 4
I1123 02:38:52.087808 2279 util.go:425] path /tmp/825666372/devfile.yaml doesn't exist, skipping it
I1123 02:38:52.087813 2279 util.go:730] HTTPGetRequest: https://raw.githubusercontent.com/openshift/odo/master/build/VERSION
I1123 02:38:52.088096 2279 util.go:751] Response will be cached in /tmp/odohttpcache for 1h0m0s
I1123 02:38:52.117136 2279 common_push.go:165] SourceLocation: ./
I1123 02:38:52.117167 2279 common_push.go:173] Source Path: /tmp/825666372
I1123 02:38:52.139143 2279 util.go:425] path /tmp/825666372/devfile.yaml doesn't exist, skipping it
Validation
• Checking component ...
I1123 02:38:52.139262 2279 occlient.go:2781] Getting DeploymentConfig: mynodejs-app
I1123 02:38:52.215887 2279 component.go:532] Checking source location: ./
I1123 02:38:52.215933 2279 component.go:561] Validating configured memory values
I1123 02:38:52.215949 2279 component.go:561] Validating configured cpu values
✓ Checking component [76ms]
I1123 02:38:52.216016 2279 util.go:425] path /tmp/825666372/devfile.yaml doesn't exist, skipping it
Configuration changes
• Retrieving component data ...
I1123 02:38:52.260697 2279 occlient.go:2781] Getting DeploymentConfig: mynodejs-app
I1123 02:38:52.285906 2279 pushed_component.go:198] Source for component mynodejs is (local)
I1123 02:38:52.285958 2279 occlient.go:2781] Getting DeploymentConfig: mynodejs-app
I1123 02:38:52.312500 2279 component.go:1226] Updating component mynodejs, from local to ./ (local).
✓ Retrieving component data [96ms]
• Applying configuration ...
I1123 02:38:52.339356 2279 occlient.go:764] Found exact image tag match for nodejs:latest
I1123 02:38:52.398184 2279 util.go:764] Cached response used.
✓ Applying configuration [121ms]
Applying URL changes
I1123 02:38:52.440321 2279 url.go:323] Listing routes with label selector: app.kubernetes.io/part-of=app,app.kubernetes.io/instance=mynodejs
I1123 02:38:52.440343 2279 occlient.go:2640] Listing routes with label selector: app.kubernetes.io/part-of=app,app.kubernetes.io/instance=mynodejs
✓ URLs are synced with the cluster, no changes are required.
Pushing to component mynodejs of type local
I1123 02:38:52.456734 2279 file_indexer.go:214] file added: /tmp/825666372/.gitignore
I1123 02:38:52.456819 2279 file_indexer.go:202] .odo or .git directory detected, skipping it
✓ Checking file changes for pushing [339731ns]
I1123 02:38:52.456904 2279 common_push.go:245] List of files to be deleted: +[]
I1123 02:38:52.456930 2279 common_push.go:268] Copying directory /tmp/825666372 to pod
I1123 02:38:52.456950 2279 component.go:650] PushLocal: componentName: mynodejs, applicationName: app, path: /tmp/825666372, files: [/tmp/825666372/.gitignore], delFiles: [], isForcePush: false
I1123 02:38:52.468862 2279 preference.go:209] The path for preference file is /root/.odo/preference.yaml
I1123 02:38:52.468912 2279 occlient.go:1871] Waiting for deploymentconfig=mynodejs-app pod
• Waiting for component to start ...
I1123 02:38:52.484990 2279 occlient.go:1842] Warning Event: Count: 22, Reason: BackOff, Message: Back-off restarting failed container
I1123 02:39:52.488570 2279 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
⚠ Watch channel was closed
I1123 02:39:52.488724 2279 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
✗ Waiting for component to start [1m] [WARNING x22: BackOff]
✗ watch channel was closed
@kadel ^^
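To see how long odo waited before the watch stream was closed, the timestamps in the log headers can be subtracted. A minimal sketch, assuming the usual klog header layout (severity letter, month and day, then time with microseconds); the header carries no year, so 2020 is taken from the project-creation output earlier in the thread, and `klog_time` is a hypothetical helper.

```python
import re
from datetime import datetime

# klog header: severity letter, mmdd, then hh:mm:ss.uuuuuu (no year).
KLOG_TIME = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6}) ")


def klog_time(line: str, year: int = 2020) -> datetime:
    """Parse the timestamp at the start of a klog-formatted line."""
    m = KLOG_TIME.match(line)
    if m is None:
        raise ValueError(f"not a klog line: {line!r}")
    return datetime.strptime(f"{year}{m.group(1)} {m.group(2)}", "%Y%m%d %H:%M:%S.%f")


# Two lines copied from the odo push -v 4 output above.
wait_start = "I1123 02:38:52.468912 2279 occlient.go:1871] Waiting for deploymentconfig=mynodejs-app pod"
watch_eof = "I1123 02:39:52.488570 2279 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF"

elapsed = (klog_time(watch_eof) - klog_time(wait_start)).total_seconds()
print(f"watch closed after {elapsed:.3f} s")
```

For these two lines the result is roughly 60 seconds, which matches the [1m] shown next to "Waiting for component to start" in this run.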
@sarveshtamba Do you still see the same errors with odo v3?
A friendly reminder that this issue had no activity for 90 days. Stale issues will be closed after an additional 30 days of inactivity.
This issue was closed because it has been inactive for 30 days since being marked as stale.
Facing the following issue on ppc64le while running the odo test suites:-
Attached is the error log - EOF_watchstream_ppc64le.txt