openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0

Missing documentation on how to redeploy broken registry. #10585

Open Firstyear opened 8 years ago

Firstyear commented 8 years ago

Documentation on how to redeploy a broken registry is missing. An OpenShift install shows the following output.

oc status
svc/docker-registry - 172.30.61.89:5000
  dc/docker-registry deploys registry.access.redhat.com/openshift3/ose-docker-registry:v1.2.1
    deployment #1 failed 2 hours ago

But the registry says it's working

openshift admin registry
Docker registry "docker-registry" service exists

oc get svc
NAME              CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
docker-registry   172.30.61.89                  5000/TCP                  2h
kubernetes        172.30.0.1                    443/TCP,53/UDP,53/TCP     4d
router            172.30.48.169                 80/TCP,443/TCP,1936/TCP   1h

There is no documentation about how to redeploy or rebuild a broken registry. This is causing new container builds to fail.

Version

oc v1.2.1
kubernetes v1.2.0-36-g4a3f9c5

Additional Information

[Note] Running diagnostic: ClusterRegistry
       Description: Check that there is a working Docker registry

ERROR: [DClu1006 from diagnostic ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:203] The "docker-registry" service exists but has no associated pods, so it is not available. Builds and deployments that use the registry will fail.

ERROR: [DClu1001 from diagnostic ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:173] The "docker-registry" service exists but no pods currently running, so it is not available. Builds and deployments that use the registry will fail.
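The two errors above both come down to the service having no pods behind it. A minimal sketch of checking that by hand, assuming the registry lives in the `default` project and that an empty address list on the service's endpoints object means nothing is backing it. The function name `registry_has_endpoints` is hypothetical and not part of any OpenShift tooling.

```shell
# Hypothetical helper (not an OpenShift command): succeeds if the
# docker-registry service has at least one pod IP behind it.
# Assumption: the registry is in the "default" project unless $1 says otherwise.
registry_has_endpoints() {
    ns="${1:-default}"
    # Prints the pod IPs backing the service, or nothing if there are none.
    ips=$(oc get endpoints docker-registry -n "$ns" \
        -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null)
    [ -n "$ips" ]
}

# Example use:
#   registry_has_endpoints || echo "docker-registry has no pods behind it"
```

If this fails while `oc get svc` still lists the service, the situation matches the diagnostic output: the service exists, but no registry pod is running.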

201sandeep commented 7 years ago

Yes, I am facing the same issue too. Can someone please help?

gregswift commented 7 years ago

FWIW, I had the same or a similar issue. I don't recall what led to it, but here is how I solved it.

I deleted everything related to the docker-registry deployment: the service, the router, and the deployments (all of the failed ones). Then I ran

oc deploy docker-registry --latest -n default

That errored because of a few existing things, primarily the service account, but the registry itself came up.

miminar commented 7 years ago

A note to myself for documenting this.

The deployment config needs to be re-created after the service. Otherwise, the registry pod will lack the following environment variables:

${DOCKER_REGISTRY_SERVICE_HOST}
${DOCKER_REGISTRY_SERVICE_PORT}

To test it:

$ oc rsh dc/docker-registry bash -c 'echo ${DOCKER_REGISTRY_SERVICE_HOST}:${DOCKER_REGISTRY_SERVICE_PORT}'
172.30.30.30:5000
# If a pod is started before the service exists, it will look like this
$ oc rsh dc/docker-registry bash -c 'echo ${DOCKER_REGISTRY_SERVICE_HOST}:${DOCKER_REGISTRY_SERVICE_PORT}'
:
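The manual check above can be turned into a small test. This is a sketch only: `registry_addr_ok` is a hypothetical helper name, and it assumes the value passed in is exactly the `host:port` string printed by the `oc rsh` command shown above.

```shell
# Hypothetical helper: takes the "host:port" string printed by the check
# above and fails if either half is empty (for example a bare ":").
registry_addr_ok() {
    addr="$1"
    host="${addr%%:*}"   # everything before the first colon
    port="${addr##*:}"   # everything after the last colon
    # Both halves must be non-empty, and a colon must actually be present.
    [ -n "$host" ] && [ -n "$port" ] && [ "$host" != "$addr" ]
}

# Example use against a running registry pod:
#   addr=$(oc rsh dc/docker-registry bash -c \
#       'echo ${DOCKER_REGISTRY_SERVICE_HOST}:${DOCKER_REGISTRY_SERVICE_PORT}')
#   registry_addr_ok "$addr" || echo "registry pod was started before its service"
```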

If they are undefined, DOCKER_REGISTRY_URL will be empty, causing the following problems:

Pushing image 172.30.91.135:5000/haowang/ruby-ex:latest ...
Pushed 4/5 layers, 82% complete
Pushed 5/5 layers, 100% complete
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount@example.org
Registry server Password: <<non-empty>>
error: build error: Failed to push image: received unexpected HTTP status: 500 Internal Server Error

$ oc logs -f dc/docker-registry
...
time="2017-02-14T08:57:23.804381606Z" level=error msg="error creating ImageStreamMapping: ImageStreamMapping \"ruby-ex\" is invalid: image.dockerImageReference: Invalid value: \":/zhouy/ruby-ex@sha256:79884cc0d892dd8096d3f7ca9b2484045c5210ef0e488755ce4b635f231f809a\": invalid reference format" go.version=go1.7.4 http.request.contenttype="application/vnd.docker.distribution.manifest.v1+prettyjws" http.request.host="172.30.91.135:5000" http.request.id=d49a6588-c7b4-4426-bf17-8933dbef9780 http.request.method=PUT http.request.remoteaddr="10.129.0.1:51862" http.request.uri="/v2/zhouy/ruby-ex/manifests/latest"
time="2017-02-14T08:57:23.804494035Z" level=error msg="response completed with error" err.code=unknown err.detail="ImageStreamMapping \"ruby-ex\" is invalid: image.dockerImageReference: Invalid value: \":/zhouy/ruby-ex@sha256:79884cc0d892dd8096d3f7ca9b2484045c5210ef0e488755ce4b635f231f809a\": invalid reference format" err.message="unknown error" go.version=go1.7.4 http.request.contenttype="application/vnd.docker.distribution.manifest.v1+prettyjws" http.request.host="172.30.91.135:5000" http.request.id=d49a6588-c7b4-4426-bf17-8933dbef9780 http.request.method=PUT http.request.remoteaddr="10.129.0.1:51862" http.request.uri="/v2/zhouy/ruby-ex/manifests/latest"
...

Update: since 3.9, the following will be printed if the variables aren't set:

level=fatal msg="error parsing configuration file: configuration error in openshift.server.addr: REGISTRY_OPENSHIFT_SERVER_ADDR variable must be set when running outside of Kubernetes cluster"
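Both symptoms quoted above (the `":/...` image reference before 3.9 and the fatal `REGISTRY_OPENSHIFT_SERVER_ADDR` message since 3.9) can be searched for in the registry log. A rough sketch, assuming the log lines look like the ones in this comment. The function name `registry_log_shows_missing_addr` is hypothetical.

```shell
# Hypothetical helper: reads registry log text on stdin and succeeds if it
# contains either symptom of the missing service variables:
#   - an image reference that starts with ":/" (empty registry address), or
#   - the fatal REGISTRY_OPENSHIFT_SERVER_ADDR message printed since 3.9.
registry_log_shows_missing_addr() {
    grep -E -q 'Invalid value: \\?":/|REGISTRY_OPENSHIFT_SERVER_ADDR variable must be set'
}

# Example use:
#   oc logs dc/docker-registry | registry_log_shows_missing_addr &&
#       echo "registry pod is missing the service address; re-create the dc after the service"
```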

201sandeep commented 7 years ago

I resolved it by modifying the line below in the YAML playbook and re-running the OSO installation.

[nodes]
master.example.com openshift_node_labels="{'region':'infra','zone':'default'}" openshift_schedulable=true

openshift_schedulable=true is the important parameter: if it is set to "false", the registry pod cannot be scheduled when you have only a single node in the infra region.

Once done, it took a couple of minutes for my OSO registry to start working.

Regards, Sandeep
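To see whether a cluster is in the state described above before re-running the installer, the node list can be checked for unschedulable nodes. A minimal sketch, assuming the STATUS column of `oc get nodes` contains `SchedulingDisabled` for such nodes. The function name `unschedulable_nodes` is hypothetical.

```shell
# Hypothetical helper: prints the names of nodes whose STATUS column in
# `oc get nodes` contains "SchedulingDisabled".
unschedulable_nodes() {
    oc get nodes --no-headers | awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}

# Example use:
#   unschedulable_nodes
# On 3.x clusters a listed node can usually be made schedulable again with
#   oadm manage-node <node> --schedulable=true
# (check that this matches your version before relying on it).
```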

walidshaari commented 7 years ago

What works for me is the following:

oc rollout latest docker-registry
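A slightly fuller sketch of the same idea, which triggers the rollout and then waits for the result. It assumes the registry deployment config is `dc/docker-registry` in the `default` project and that the client is new enough to have `oc rollout` (older clients, such as the v1.2.1 in the original report, use `oc deploy docker-registry --latest` instead). The function name `redeploy_registry` is hypothetical.

```shell
# Hypothetical helper: starts a new deployment of the registry and waits
# until it has either succeeded or failed.
# Assumption: the registry is in the "default" project unless $1 says otherwise.
redeploy_registry() {
    ns="${1:-default}"
    oc rollout latest dc/docker-registry -n "$ns" &&
        oc rollout status dc/docker-registry -n "$ns"
}

# Example use:
#   redeploy_registry || echo "registry rollout failed; check 'oc get pods -n default'"
```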

davistran86 commented 7 years ago

@walidshaari thanks, you helped me out, your command worked for me 👍

openshift-bot commented 6 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Caplost commented 6 years ago

@walidshaari THX, worked for me

miminar commented 6 years ago

Fix: https://github.com/openshift/openshift-docs/pull/8666

dmage commented 6 years ago

/remove-lifecycle stale

miminar commented 6 years ago

The doc work I started needs a rewrite. However, the registry operator will change things so much that all of this guidance will become obsolete. This could be resolved with documentation for the operator.

openshift-merge-robot commented 6 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 5 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Firstyear commented 5 years ago

/lifecycle rotten
/remove-lifecycle stale

Firstyear commented 5 years ago

/lifecycle frozen