redhat-developer / odo

odo - Developer-focused CLI for fast & iterative container-based application development on Podman and Kubernetes. Implementation of the open Devfile standard.
https://odo.dev
Apache License 2.0

odo push with --source and --config is failing more often on multi-stage infrastructure #3547

Closed prietyc123 closed 4 years ago

prietyc123 commented 4 years ago

/kind bug

What versions of software are you using?

Operating System: All supported

Output of odo version: master

How did you run odo exactly?

Running tests on the openshift/release repo PR https://github.com/openshift/release/pull/9431

Actual behavior

[...]
Running oc with args [oc get dc nodejs-nodejs-push-context-test --namespace gcxdajfudq]
[oc] I0712 16:14:14.434158   10620 request.go:621] Throttling request took 1.026026133s, request: GET:https://api.ci-op-8lkn26zg-3c947.origin-ci-int-aws.dev.rhcloud.com:6443/apis/scheduling.k8s.io/v1beta1?timeout=32s
[oc] NAME                              REVISION   DESIRED   CURRENT   TRIGGERED BY
[oc] nodejs-nodejs-push-context-test   1          1         0         config,image(nodejs:latest)
Running odo with args [odo push --source --context /tmp/909511552]
[odo] I0712 16:14:23.368848   10650 preference.go:182] The path for preference file is /tmp/909511552/config.yaml
[odo] I0712 16:14:23.464672   10650 common_push.go:164] SourceLocation: ./
[odo] I0712 16:14:23.464703   10650 common_push.go:172] Source Path: /tmp/909511552
[odo] I0712 16:14:23.494778   10650 preference.go:182] The path for preference file is /tmp/909511552/config.yaml
[odo] Validation
[...]
[odo] run: stopped
[odo] + /opt/odo/bin/supervisord ctl start run
[odo] run: started
[odo] 
 ✓  Building component [8s]
[odo]  ✓  Changes successfully pushed to component
[odo] I0712 16:15:05.095681   10650 odo.go:72] Could not get the latest release information in time. Never mind, exiting gracefully :)
Running oc with args [oc get dc nodejs-nodejs-push-context-test --namespace gcxdajfudq]
[oc] I0712 16:15:06.584766   10677 request.go:621] Throttling request took 1.045373096s, request: GET:https://api.ci-op-8lkn26zg-3c947.origin-ci-int-aws.dev.rhcloud.com:6443/apis/storage.k8s.io/v1beta1?timeout=32s
[oc] NAME                              REVISION   DESIRED   CURRENT   TRIGGERED BY
[oc] nodejs-nodejs-push-context-test   1          1         1         config,image(nodejs:latest)
Running oc with args [oc get pods --namespace gcxdajfudq --selector=deploymentconfig=nodejs-nodejs-push-context-test -o jsonpath='{.items[0].metadata.name}']
[oc] I0712 16:15:16.960820   10711 request.go:621] Throttling request took 1.036329316s, request: GET:https://api.ci-op-8lkn26zg-3c947.origin-ci-int-aws.dev.rhcloud.com:6443/apis/template.openshift.io/v1?timeout=32s
[oc] 'nodejs-nodejs-push-context-test-1-qst7f'
Running oc with args [oc exec nodejs-nodejs-push-context-test-1-qst7f --namespace gcxdajfudq -c nodejs-nodejs-push-context-test -- sh -c ls -la $ODO_S2I_DEPLOYMENT_DIR/package.json]
[oc] I0712 16:15:27.322420   10779 request.go:621] Throttling request took 1.138243276s, request: GET:https://api.ci-op-8lkn26zg-3c947.origin-ci-int-aws.dev.rhcloud.com:6443/apis/apps.openshift.io/v1?timeout=32s
[oc] -rw-rw-r--. 1 1000650000 root 326 Jul 12 16:15 /opt/app-root/src/package.json
Deleting project: gcxdajfudq
Running odo with args [odo project delete gcxdajfudq -f]
[odo] I0712 16:15:36.502969   10841 application.go:49] Unable to list Service Catalog instances: unable to list ServiceInstances: serviceinstances.servicecatalog.k8s.io is forbidden: User "developer" cannot list resource "serviceinstances" in API group "servicecatalog.k8s.io" in the namespace "gcxdajfudq"
[odo] This project contains the following applications, which will be deleted
[odo] Application nodejs-push-context-test
[odo] I0712 16:15:36.592759   10841 occlient.go:2795] Getting DeploymentConfig: nodejs-nodejs-push-context-test
[odo] I0712 16:15:36.627438   10841 component.go:987] Source for component nodejs is  (local)
[odo] I0712 16:15:36.627475   10841 url.go:379] Listing routes with label selector: app.kubernetes.io/part-of=nodejs-push-context-test,app.kubernetes.io/instance=nodejs
[odo] I0712 16:15:36.627481   10841 occlient.go:2654] Listing routes with label selector: app.kubernetes.io/part-of=nodejs-push-context-test,app.kubernetes.io/instance=nodejs
[odo] I0712 16:15:36.717109   10841 occlient.go:2795] Getting DeploymentConfig: nodejs-nodejs-push-context-test
[odo] This application has following components that will be deleted
[odo] I0712 16:15:36.807765   10841 occlient.go:2795] Getting DeploymentConfig: nodejs-nodejs-push-context-test
[odo] I0712 16:15:36.834998   10841 component.go:987] Source for component nodejs is  (local)
[odo] I0712 16:15:36.835026   10841 url.go:379] Listing routes with label selector: app.kubernetes.io/part-of=nodejs-push-context-test,app.kubernetes.io/instance=nodejs
[odo] I0712 16:15:36.835032   10841 occlient.go:2654] Listing routes with label selector: app.kubernetes.io/part-of=nodejs-push-context-test,app.kubernetes.io/instance=nodejs
[odo] I0712 16:15:36.928055   10841 occlient.go:2795] Getting DeploymentConfig: nodejs-nodejs-push-context-test
[odo] component named nodejs
[odo] No services / could not get services
[odo] I0712 16:15:37.009458   10841 project.go:136] unable to list services: unable to list ServiceInstances: serviceinstances.servicecatalog.k8s.io is forbidden: User "developer" cannot list resource "serviceinstances" in API group "servicecatalog.k8s.io" in the namespace "gcxdajfudq"
[odo]  ⚠  Warning! Projects are deleted from the cluster asynchronously. Odo does its best to delete the project. Due to multi-tenant clusters, the project may still exist on a different node.
[odo] I0712 16:15:37.047991   10841 odo.go:72] Could not get the latest release information in time. Never mind, exiting gracefully :)
[odo]  ✓  Deleted project : gcxdajfudq
Setting current dir to: /go/src/github.com/openshift/odo/tests/integration
Deleting dir: /tmp/909511552
• Failure [86.893 seconds]
odo component command tests
/go/src/github.com/openshift/odo/tests/integration/cmd_cmp_test.go:13
  Test odo push with --source and --config flags
  /go/src/github.com/openshift/odo/tests/integration/component.go:342
    when --context is used
    /go/src/github.com/openshift/odo/tests/integration/component.go:407
      create local nodejs component and push source and code separately [It]
      /go/src/github.com/openshift/odo/tests/integration/component.go:409
      Expected

          <bool>: false
      to equal
          <bool>: true
      /go/src/github.com/openshift/odo/tests/integration/component.go:436

Expected behavior

It should get pushed.

Any logs, error output, etc?

More info https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/openshift_release/9431/rehearse-9431-periodic-ci-openshift-odo-master-v4.5-integration-e2e-periodic-steps/1282336940108025856#1:build-log.txt%3A646

kadel commented 4 years ago

How is this flake when it is failing CONSISTENTLY? :-)

kadel commented 4 years ago

This looks more like an issue with the tests or the infrastructure than with odo.
/remove-kind bug
/kind failing-test

prietyc123 commented 4 years ago

How is this flake when it is failing CONSISTENTLY? :-)

I have observed it failing consistently on the 4.5 cluster, but on other clusters it sometimes passes and sometimes fails. Mostly it passes, as you can see in PR https://github.com/openshift/release/pull/9431.

That is why I am considering it a flake, though I am not sure what is going on. Anyway, I will update the issue title.

prietyc123 commented 4 years ago

So far I have observed that this is specific to the 4.5 cluster: I have never seen it pass on 4.5, and I have not seen this failure on any other cluster. Failure logs can be seen at https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/pr-history/?org=openshift&repo=release&pr=9431

prietyc123 commented 4 years ago

On the 4.5 (multi-stage) cluster, the closure passed in at https://github.com/openshift/odo/blob/master/tests/integration/component.go#L506-L508 returns false every time because it hits an error that is not nil, yet the calling function CheckCmdOpInRemoteCmpPod

https://github.com/openshift/odo/blob/aa05fb4799d626897259931b75baef4af804293e/tests/helper/helper_oc.go#L144-L149

does not forward any real error value to the closure: the helper verifies the error itself and then explicitly passes nil. Yet the closure does not seem to receive that explicit nil, even with the extra error check in place before nil is sent.
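
For reference, the interaction described above looks roughly like the sketch below. This is a minimal approximation, not the actual odo test code: runOcExec and its hard-coded outputs are hypothetical stand-ins, and the real signatures in tests/helper/helper_oc.go and component.go may differ. It only illustrates the contract in question: the helper verifies the exec error itself and passes nil to the closure on success, which is why the non-nil error seen on 4.5 is surprising.

```go
package main

import (
	"fmt"
	"strings"
)

// runOcExec is a hypothetical stand-in for the helper that shells out to
// `oc exec`. It simulates a successful exec whose stderr carries the
// client-side throttling noise seen in the failure logs above.
func runOcExec(pod, ns string, cmd []string) (stdout, stderr string, err error) {
	stdout = "-rw-rw-r--. 1 1000650000 root 326 Jul 12 16:15 /opt/app-root/src/package.json"
	stderr = "I0712 16:15:27.322420 request.go:621] Throttling request took 1.138243276s"
	return stdout, stderr, nil
}

// checkCmdOpInRemoteCmpPod mirrors the helper described above: it execs the
// command in the component pod and hands the output to the verifier closure.
// When the exec itself succeeds, it explicitly passes nil as the error.
func checkCmdOpInRemoteCmpPod(pod, ns string, cmd []string,
	check func(cmdOp string, err error) bool) bool {
	stdout, stderr, err := runOcExec(pod, ns, cmd)
	if err != nil {
		// Only a failed exec should surface an error to the closure...
		return check(stdout+"\n"+stderr, err)
	}
	// ...otherwise nil is passed explicitly, so the closure should never
	// see an error for a successful command.
	return check(stdout+"\n"+stderr, nil)
}

func main() {
	ok := checkCmdOpInRemoteCmpPod("nodejs-nodejs-push-context-test-1-qst7f", "gcxdajfudq",
		[]string{"sh", "-c", "ls -la $ODO_S2I_DEPLOYMENT_DIR/package.json"},
		// Verifier closure analogous to component.go#L506-L508: any
		// non-nil error, or a missing file listing, fails the test.
		func(cmdOp string, err error) bool {
			return err == nil && strings.Contains(cmdOp, "package.json")
		})
	fmt.Println("check passed:", ok) // true in this sketch; false on the 4.5 infra
}
```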

prietyc123 commented 4 years ago

exec output for s2i image package -rw-rw-r--. 1 1000630000 root 326 Jul 17 08:57 /opt/app-root/src/package.json
exec error for s2i image package I0717 08:57:47.177248 10879 request.go:621] Throttling request took 1.095986377s, request: GET:https://api.ci-op-0lpgbttl-712e6.origin-ci-int-aws.dev.rhcloud.com:6443/apis/apiextensions.k8s.io/v1?timeout=32s

The test code reports both stdout and stderr. On stderr, requests are being throttled, possibly because of excessive API calls specific to the 4.5 multi-stage test infra, per one of the Test Platform communications. According to the Slack message poster, they simply reduced the excessive API requests to make it work. I have no idea how the API calls are handled in odo. Anyway, I will enable verbose logging to get more information. I also found an article, https://access.redhat.com/solutions/3664861, about a storage issue with the same throttling symptom and a resolution, but I don't find it helpful for our scenario.
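
If the throttling chatter on stderr is what flips the check, one possible mitigation (a sketch under that assumption; isOnlyThrottlingNoise is a made-up helper, not existing odo code) would be to filter out pure throttling lines before treating stderr as a failure:

```go
package main

import (
	"fmt"
	"strings"
)

// isOnlyThrottlingNoise reports whether stderr consists solely of
// client-side throttling messages like the one quoted above. Such lines
// are informational and would not indicate a failed command.
func isOnlyThrottlingNoise(stderr string) bool {
	for _, line := range strings.Split(strings.TrimSpace(stderr), "\n") {
		if line == "" || strings.Contains(line, "Throttling request took") {
			continue // ignore empty lines and throttling chatter
		}
		return false // some other, real error output is present
	}
	return true
}

func main() {
	stderr := "I0717 08:57:47.177248 10879 request.go:621] Throttling request took 1.095986377s"
	fmt.Println(isOnlyThrottlingNoise(stderr)) // true: safe to ignore
}
```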

prietyc123 commented 4 years ago

Enabling verbose logging did not help much. I believe I have provided all the information about this failure; I need developer eyes to take it further from here.

To debug the failure, follow these steps:

  1. First, push your changes specific to this failure to the odo repo.
  2. Go to https://github.com/openshift/release/pull/9431 and comment /retest to run your changes against the multi-stage test infra.

Ping @kadel @girishramnani

prietyc123 commented 4 years ago

I learned from a Kubernetes article that too little memory may cause throttling errors. Checking with a higher memory request.

prietyc123 commented 4 years ago

I also tried increasing the resources in https://github.com/openshift/release/pull/9431/commits/397964cbb6680964acc5c148c36c2deadf505d4d, but it does not help (see the PR history). Overall we need to follow https://github.com/openshift/odo/issues/3547#issuecomment-660114118 to proceed further on this, for which I need a developer to look into it.