chmouel closed this 6 years ago
bananas
?
Weird: when trying this locally, I get the correct error handling:
This error breaks E2E tests quite often:
http://artifacts.ci.centos.org/devtools/e2e/devtools-saas-openshiftio-e2e-smoketest-released/129/05-03-jenkins-direct-log.png
http://artifacts.ci.centos.org/devtools/e2e/devtools-saas-openshiftio-e2e-smoketest-beta/127/05-03-jenkins-direct-log.png
http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-1b-released/653/05-01-jenkins-log-failed.png
@chmouel, why did you decrease the severity?
Could this be related?
@ppitonak, any chance you can give us a bit more context on this error message?
@chmouel, everything looks the same as in the "too many redirects" case, but the actual content of http://jenkins.openshift.io is what you see in the screenshot.
The E2E test does the following:
The "View log" link doesn't appear in the UI.
Could this be related?
@chmouel, this seems to me like an OpenShift network issue again: AFAICT, 5432 is the PostgreSQL port.
@ppitonak, could you please explain why this is a P1?
Namespaces missing the Jenkins DC must be a very rare case, related to E2E tests resetting workspaces. How are you validating that the environment reset has worked?
But that issue only occurs when resetting the environment; I'm not sure it can be PostgreSQL-related.
Agree with you, @sthaha: oc delete all only affects people who have edit rights on their -jenkins namespace and do that kind of thing without going through the tenant reset call.
I tried again to confirm, and looking at the logs, it is indeed the deletion that causes the issue:
```
% oc delete all --all -n $T-jenkins
replicationcontroller "jenkins-8" deleted
replicationcontroller "jenkins-9" deleted
service "bayesian-link" deleted
service "jenkins" deleted
service "jenkins-jnlp" deleted
deploymentconfig.apps.openshift.io "jenkins" deleted
route.route.openshift.io "jenkins" deleted
```
Then access jenkins.openshift.io; the proxy logs:
```
{"cluster":"https://api.starter-us-east-2.openshift.com/","component":"proxy","level":"info","msg":"found ns : \"cboudjna2-jenkins\", cluster: \"https://api.starter-us-east-2.openshift.com/\"","ns":"cboudjna2-jenkins","part":"token_json","request-hash":3682938397,"time":"2018-10-15T08:15:17Z"}
{"cluster":"https://api.starter-us-east-2.openshift.com/","component":"proxy","level":"info","msg":"Fetched OSO token from OSIO token","ns":"cboudjna2-jenkins","part":"token_json","request-hash":3682938397,"time":"2018-10-15T08:15:17Z"}
{"cluster":"https://api.starter-us-east-2.openshift.com/","component":"proxy","level":"error","msg":"Error when starting Jenkins: 2: openshift client error: got status 404 Not Found (404) from https://api.starter-us-east-2.openshift.com/oapi/v1/namespaces/cboudjna2-jenkins/deploymentconfigs/jenkins","ns":"cboudjna2-jenkins","part":"token_json","request-hash":3682938397,"time":"2018-10-15T08:15:17Z"}
{"component":"proxy","level":"info","msg":"returned: |key: \"\" |ns: \"cboudjna2-jenkins\" |fwd: false|","request-hash":3682938397,"time":"2018-10-15T08:15:17Z"}
```
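To correlate a failing test run with proxy log lines like the ones above (by namespace, time and request-hash), a small filter can help. This is a minimal Go sketch, assuming the logs are newline-delimited JSON with the keys shown above; logEntry and errorsForNamespace are hypothetical names, not part of the proxy:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// logEntry declares only the fields used for correlation; other keys in
// each line (cluster, part, ...) are ignored by encoding/json.
type logEntry struct {
	Component   string `json:"component"`
	Level       string `json:"level"`
	Msg         string `json:"msg"`
	NS          string `json:"ns"`
	RequestHash int64  `json:"request-hash"`
	Time        string `json:"time"`
}

// errorsForNamespace scans newline-delimited JSON log lines and returns the
// error-level entries for namespace ns. Lines that are not valid JSON
// (blank lines, plain-text output) are skipped.
func errorsForNamespace(logs, ns string) []logEntry {
	var out []logEntry
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		var e logEntry
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue
		}
		if e.Level == "error" && e.NS == ns {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// Two lines adapted from the excerpt above (some keys dropped, message shortened).
	logs := `{"component":"proxy","level":"info","msg":"Fetched OSO token from OSIO token","ns":"cboudjna2-jenkins","request-hash":3682938397,"time":"2018-10-15T08:15:17Z"}
{"component":"proxy","level":"error","msg":"Error when starting Jenkins: 2: openshift client error: got status 404 Not Found (404)","ns":"cboudjna2-jenkins","request-hash":3682938397,"time":"2018-10-15T08:15:17Z"}`

	for _, e := range errorsForNamespace(logs, "cboudjna2-jenkins") {
		fmt.Printf("%s request-hash=%d %s\n", e.Time, e.RequestHash, e.Msg)
	}
}
```

Filtering on the request-hash of an error line then pulls out every line belonging to the same proxied request.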
This code [here](https://github.com/fabric8-services/fabric8-jenkins-proxy/blob/47eb3e77936aef48428968d7781a6b8d95a2738a/internal/proxy/ui_requests.go#L66) would do this:
```go
// we don't care about code here since only the state of jenkins pod -
// running or not is what is relevant
state, _, err := p.startJenkins(ns, clusterURL)
if err != nil {
	nsLogger.Errorf("Error when starting Jenkins: %s", err)
	http.Redirect(w, r, redirectURL.String(), http.StatusTemporaryRedirect)
	return
}
```
In the case of a NotFound, this would redirect forever. The redirect is meant for the timeout case, isn't it? Shouldn't we distinguish a 404 from other errors?
@sthaha @kishansagathiya
@chmouel, good find! Any thoughts on what the proxy should do in case of a 404?
Return a 404? That way the user would know that the Jenkins DC cannot be found.
@ppitonak, we are working on that issue.
But it came to my attention that it has been rated as critical for running the E2E tests. From my understanding of the chat we had on Mattermost, you don't do a setup or teardown of the environment (i.e. resetting the env); can you please confirm?
Because unless you run oc delete --all all inside the $USER-jenkins tenant without a call to the fabric8-tenant service to recreate the objects, you would not see this issue (as per my paste earlier).
That REDIRECT error can occur in different scenarios, which we should track down (and then treat as priority one). But to help us with the debugging, please provide us with the whole:
I am going to set this issue to P3, as it should be; please feel free to convince me otherwise.
@chmouel, we reset the environment (i.e. click "Erase My OpenShift.io Environment" on https://openshift.io/myuser/_cleanup) after the test run.
@ppitonak, okay, this could also be an issue with the tenant service.
So can you please let us know whether there are objects (i.e. the output of oc get all -n $USER-jenkins) before starting a test run? That would be very useful so we can pin down what your issue is. Please also send the output of oc get ev -n $USER-jenkins and oc logs dc/jenkins -n $USER-jenkins, and the time the test was started, so we can correlate it with the idler logs.
Thanks a lot,
I will provide all data when I see this error next time.
We have a failed job
> I am going to set this issue to P3, as it should be; please feel free to convince me otherwise.
Setting the priority back to P1 because it caused the PR check on saas-openshiftio to fail, which affects the whole OSIO team.
EDIT: Removing my previous comment about the tests not being run in the jenkins namespace; they actually are.
So after chatting, it seems that the jenkins namespace isn't properly recreated after an environment reset: either the tenant service has an issue, or the tests' call to the UI wasn't made properly.
Either way, it's completely different from this issue. Can we please continue in a new issue, so we don't confuse the fabric8-tenant team, to whom this needs to be assigned?
More details here: https://chat.openshift.io/developers/pl/czt3dscodbde8qoabubxqxrxny
Removing P1 from this issue, as this should go into a new one.
@ppitonak, this may be related to your issue: https://github.com/openshiftio/openshift.io/issues/4121#issuecomment-410648540
We have merged https://github.com/fabric8-services/fabric8-jenkins-proxy/pull/334#issuecomment-432577825, and errors should now be more explicit. For example, for our issue, when there is no DC inside the jenkins namespace, we now return a 500:
@ppitonak, this may affect your tests (in a good way): instead of a redirect loop, you will get an error message. Let us know how it goes.
Getting TOO_MANY_REDIRECTS when there is no DC in the -jenkins namespace (after a manual reset, for example).