openshiftio / openshift.io

Red Hat OpenShift.io is an end-to-end development environment for planning, building and deploying modern applications.
https://openshift.io

Idler is going crazy when there is no dc/jenkins in -jenkins namespace #4180

Closed chmouel closed 6 years ago

chmouel commented 6 years ago

Getting TOO_MANY_REDIRECTS when there is no dc in the -jenkins namespace (after a manual reset, for example). image

hrishin commented 6 years ago

bananas ?

chmouel commented 6 years ago

maybe related https://github.com/fabric8-services/fabric8-jenkins-proxy/commit/99399c90d426140e31f442ab1c34eb77cc58471c

chmouel commented 6 years ago

Weird: when I try this locally, I get the correct error handling:

image

ppitonak commented 6 years ago

This error breaks E2E tests quite often

http://artifacts.ci.centos.org/devtools/e2e/devtools-saas-openshiftio-e2e-smoketest-released/129/05-03-jenkins-direct-log.png
http://artifacts.ci.centos.org/devtools/e2e/devtools-saas-openshiftio-e2e-smoketest-beta/127/05-03-jenkins-direct-log.png
http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-1b-released/653/05-01-jenkins-log-failed.png

@chmouel why did you decrease severity?

ppitonak commented 6 years ago

Could this be related? image

chmouel commented 6 years ago

@ppitonak any chance you can give us a bit more context on this error message?

ppitonak commented 6 years ago

@chmouel everything looks the same as in the too-many-redirects case, but the actual content of http://jenkins.openshift.io is what you see in the screenshot

E2E test does the following:

  1. create a space
  2. create a new Spring Boot Http app from booster
  3. switch to codebases page
  4. start new Che workspace
  5. switch to pipelines page
  6. wait for build to start - View log link doesn't appear in UI
  7. navigate to jenkins.openshift.io - see the screenshot above

sthaha commented 6 years ago

> Could this be related? image

@chmouel this looks to me like an OpenShift network issue again; 5432 is, AFAICT, the PostgreSQL port.

sthaha commented 6 years ago

@ppitonak could you please explain why this is a P1?

Namespaces missing the DC for Jenkins must be a very rare case, and it is related to E2E tests resetting workspaces. How are you validating that the environment reset has worked?

chmouel commented 6 years ago

But this issue is only about resetting the environment; I'm not sure it can be PostgreSQL related.

I agree with you @sthaha: oc delete all is only for people who have -edit rights on -jenkins and do that kind of thing without going through the tenant reset call.

chmouel commented 6 years ago

I tried again to confirm and, looking at the logs, it is indeed the deletion that causes the issue:

% oc delete all --all -n $T-jenkins
replicationcontroller "jenkins-8" deleted
replicationcontroller "jenkins-9" deleted
service "bayesian-link" deleted
service "jenkins" deleted
service "jenkins-jnlp" deleted
deploymentconfig.apps.openshift.io "jenkins" deleted
route.route.openshift.io "jenkins" deleted

Then access jenkins.openshift.io:

{"cluster":"https://api.starter-us-east-2.openshift.com/","component":"proxy","level":"info","msg":"found ns : \"cboudjna2-jenkins\", cluster: \"https://api.starter-us-east-2.openshift.com/\"","ns":"cboudjna2-jenkins","part":"token_json","request-hash":3682938397,"time":"2018-10-15T08:15:17Z"}
{"cluster":"https://api.starter-us-east-2.openshift.com/","component":"proxy","level":"info","msg":"Fetched OSO token from OSIO token","ns":"cboudjna2-jenkins","part":"token_json","request-hash":3682938397,"time":"2018-10-15T08:15:17Z"}
{"cluster":"https://api.starter-us-east-2.openshift.com/","component":"proxy","level":"error","msg":"Error when starting Jenkins: 2: openshift client error: got status 404 Not Found (404) from https://api.starter-us-east-2.openshift.com/oapi/v1/namespaces/cboudjna2-jenkins/deploymentconfigs/jenkins","ns":"cboudjna2-jenkins","part":"token_json","request-hash":3682938397,"time":"2018-10-15T08:15:17Z"}
{"component":"proxy","level":"info","msg":"returned: |key: \"\" |ns: \"cboudjna2-jenkins\" |fwd: false|","request-hash":3682938397,"time":"2018-10-15T08:15:17Z"}

chmouel commented 6 years ago

This code [here](https://github.com/fabric8-services/fabric8-jenkins-proxy/blob/47eb3e77936aef48428968d7781a6b8d95a2738a/internal/proxy/ui_requests.go#L66) would do this:

        // we don't care about code here since only the state of jenkins pod -
        // running or not is what is relevant
        state, _, err := p.startJenkins(ns, clusterURL)
        if err != nil {
            nsLogger.Errorf("Error when starting Jenkins: %s", err)
            http.Redirect(w, r, redirectURL.String(), http.StatusTemporaryRedirect)
            return
        }

which, in the case of a NotFound, would redirect forever. The redirect is meant for the timeout case, isn't it? Shouldn't we distinguish between a 404 and other errors?

@sthaha @kishansagathiya

sthaha commented 6 years ago

@chmouel good find! Any thoughts on what the proxy should do in the case of a 404?

chmouel commented 6 years ago

Return a 404? The user would then know that the "jenkins dc cannot be found"?

chmouel commented 6 years ago

@ppitonak We are working on this issue.

But it came to my attention that it has been rated as critical for running the E2E tests. From my understanding of the chat we had on Mattermost, you don't do a setup or teardown of the environment (i.e. resetting the env); can you please confirm?

Because unless you do an oc delete --all all inside the $USER-jenkins tenant without a call to the fabric8-tenant service to recreate the objects, you would not see this issue (as per my paste earlier).

That redirect error can also occur in different scenarios, which we should track down (and then treat as priority one). But to help us with the debugging, please provide us with the whole :

I am going to set this issue to P3, as it should be; please feel free to convince me otherwise.

ppitonak commented 6 years ago

@chmouel we reset the environment (i.e. click "Erase My OpenShift.io Environment" on https://openshift.io/myuser/_cleanup) after the test run

chmouel commented 6 years ago

@ppitonak okay, this could just as well be an issue with the tenant service.

So can you please let us know whether there are objects in the namespace (i.e. oc get all -n $USER-jenkins) before the start of a test? To pin down what your issue is, it would also be very useful to have the output of oc get ev -n $USER-jenkins and oc logs dc/jenkins -n $USER-jenkins, plus the time the test was started, so we can correlate it with the idler logs.

Thanks a lot.

ppitonak commented 6 years ago

I will provide all data when I see this error next time.

ppitonak commented 6 years ago

We have a failed job

ppitonak commented 6 years ago

> I am going to set this issue as p3 as it should be, please feel free to convince me otherwise,

Setting the priority back to P1 because it caused a PR check on saas-openshiftio to fail, which affects the whole OSIO team.

chmouel commented 6 years ago

EDIT: Removing my previous comment about the tests not being run in the jenkins namespace, which they actually are.

So after chatting, it seems that the tenant service doesn't properly recreate the jenkins namespace after an environment reset. Maybe the tenant service has an issue, or the call made by the tests to the UI wasn't done properly.

Either way, it's completely different from this issue.

Can we please continue in a new issue, so we don't confuse the fabric8-tenant team, to whom this needs to be assigned?

more details here https://chat.openshift.io/developers/pl/czt3dscodbde8qoabubxqxrxny

chmouel commented 6 years ago

Removing P1 from this issue, as this should go to a new one.

chmouel commented 6 years ago

@ppitonak maybe related to your issue https://github.com/openshiftio/openshift.io/issues/4121#issuecomment-410648540

chmouel commented 6 years ago

We have merged https://github.com/fabric8-services/fabric8-jenkins-proxy/pull/334#issuecomment-432577825 and errors should now be more explicit, e.g. for our issue, when there is no dc inside the jenkins namespace, we now show a 500:

image

@ppitonak this may affect your tests (in a good way): you are not going to get a redirect loop but an error message. Let us know how it goes.