openshift / geard

geard is no longer maintained - see OpenShift 3 and Kubernetes

Add simple retry logic into contrib/test #231

Closed mfojtik closed 9 years ago

mfojtik commented 10 years ago

[test]

openshift-bot commented 10 years ago

Origin Test Results: FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_geard/156/)

mfojtik commented 10 years ago

@smarterclayton @bparees this should retry the integration test run when it randomly gets stuck. This is a workaround; I don't really know why the tests get stuck so often. We should probably make the tests more verbose to figure that out.

mfojtik commented 10 years ago

Oh well, now it gets stuck on the busybox-http image docker build...

mfojtik commented 10 years ago

@smarterclayton @danmcp are you OK with merging this? It should fix the broken geard_ami build.

mfojtik commented 10 years ago

[test]

openshift-bot commented 10 years ago

Evaluated for origin up to 77ad2a2b639eff7beaf2099a39fb3933354015fe

smarterclayton commented 10 years ago

Shouldn't this be done by Jenkins instead?

danmcp commented 10 years ago

@smarterclayton Meaning retry the whole job? Seems too coarse of a retry.

smarterclayton commented 10 years ago

Have we spent any time investigating these hangs? Is it Docker? The filesystem? Our code? RHEL 6?

mfojtik commented 10 years ago

@smarterclayton I haven't spent any time investigating this; I know @derekwaynecarr was. This started happening when we upgraded to Docker 1.0, so it might be related to that.

Besides, I think some kind of retry/timeout logic will be useful to keep jobs from sitting stuck in Jenkins for more than a day and having to be killed manually.
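
For illustration only, the kind of retry/timeout logic described here could look roughly like the sketch below. It is not the code from this pull request: the function name `retry_with_timeout`, the attempt count, and the time limit are all hypothetical, and it assumes bash plus the coreutils `timeout` command are available on the build host.

```shell
#!/bin/bash
# Hypothetical helper (not taken from this PR): run a command with a
# per-attempt time limit and retry it a bounded number of times.
#   $1 = maximum number of attempts
#   $2 = time limit per attempt, in any format coreutils `timeout` accepts
#   remaining arguments = the command to run
retry_with_timeout() {
  local attempts=$1 limit=$2
  shift 2
  local n
  for ((n = 1; n <= attempts; n++)); do
    # `timeout` kills the command once the limit expires and then exits
    # non-zero, so a hung run is treated the same as a failed one.
    if timeout "$limit" "$@"; then
      return 0
    fi
    echo "attempt ${n}/${attempts} failed or timed out: $*" >&2
  done
  return 1
}
```

A call such as `retry_with_timeout 3 30m ./contrib/test` would give the test script three attempts of at most 30 minutes each (the values are made up). Because `timeout` executes a program, the wrapped command has to be an executable or script, not a shell function. As noted in the discussion, this only bounds how long a hang can last; it does nothing to explain why the hang happens.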

smarterclayton commented 10 years ago

I agree we should be timing out, but we don't need that in the scripts. That belongs in a Jenkins-type repo.

We need to spend the time to figure out what is going on here. Retry is a bad band-aid.