Closed: matzew closed this issue 4 years ago
Also, I noticed (due to the failure) that a ton of "dh-strimzi-apb-prov-XXXXX" projects were created, all with the same error. I've never seen that before: when a provision fails, a ton of "retry"(?) projects get created.
See #1010, this is related to the orphan mitigation in service-catalog.
Looking at your APB's logs, the connection refused error makes me think that something is wrong or missing with the inter-pod networking in your cluster.
$ oc cluster up --enable=service-catalog,template-service-broker,router,registry,web-console,persistent-volumes,sample-templates,rhel-imagestreams
$ kubectl apply -f https://raw.githubusercontent.com/project-streamzi/ocp-broker/ASB_12_oc310/install.yaml
Then ran your strimzi-apb without issue:
PLAY [strimzi-apb playbook to provision the application] ***********************
TASK [ansible.kubernetes-modules : Install latest openshift client] ************
skipping: [localhost]
TASK [ansibleplaybookbundle.asb-modules : debug] *******************************
skipping: [localhost]
TASK [provision-strimzi-apb : Login As Super User] *****************************
changed: [localhost]
TASK [provision-strimzi-apb : Create Cluster Operator Service Account yaml] ****
changed: [localhost]
TASK [provision-strimzi-apb : Create Cluster operator Service Account] *********
changed: [localhost]
TASK [provision-strimzi-apb : Delete Cluster Operator Template File] ***********
changed: [localhost]
TASK [provision-strimzi-apb : Create Role] *************************************
changed: [localhost]
TASK [provision-strimzi-apb : Create Role Based Access Control] ****************
changed: [localhost]
TASK [provision-strimzi-apb : Create k8s deployment] ***************************
changed: [localhost]
TASK [provision-strimzi-apb : Create Persistant Storage template] **************
changed: [localhost]
TASK [provision-strimzi-apb : Deploy a ZK and Kafka cluster] *******************
changed: [localhost]
TASK [provision-strimzi-apb : Wait for Strimzi topic Operator to become ready] ***
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (40 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (39 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (38 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (37 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (36 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (35 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (34 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (33 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (32 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (31 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (30 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (29 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (28 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (27 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (26 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (25 retries left).
FAILED - RETRYING: Wait for Strimzi topic Operator to become ready (24 retries left).
changed: [localhost]
PLAY RECAP *********************************************************************
localhost : ok=10 changed=10 unreachable=0 failed=0
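For reference, the retrying "Wait for Strimzi topic Operator" step above is the standard Ansible `retries`/`until` pattern. A hypothetical sketch of such a task (the module choice, deployment name, and delay are assumptions, not taken from the strimzi-apb source):

```yaml
# Hypothetical wait task (deployment name and timings are assumptions):
# poll the topic operator's Deployment until it reports a ready replica.
# Each failed attempt prints "FAILED - RETRYING: ... (N retries left)".
- name: Wait for Strimzi topic Operator to become ready
  command: oc get deployment strimzi-topic-operator -o jsonpath='{.status.readyReplicas}'
  register: topic_op
  until: topic_op.stdout == "1"
  retries: 40
  delay: 15
```

Because the task eventually succeeds (at "24 retries left"), the play recap still shows failed=0.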
ok, cool!
will check tomorrow
hrm...
getting
TASK [provision-strimzi-apb : Login As Super User] *****************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["oc", "login", "-u", "developer", "-p", "developer"], "delta": "0:00:00.592034", "end": "2018-08-20 08:16:33.348767", "msg": "non-zero return code", "rc": 1, "start": "2018-08-20 08:16:32.756733", "stderr": "error: dial tcp 127.0.0.1:8443: getsockopt: connection refused", "stderr_lines": ["error: dial tcp 127.0.0.1:8443: getsockopt: connection refused"], "stdout": "", "stdout_lines": []}
PLAY RECAP *********************************************************************
localhost
still an issue for me
I re-ran all the things, and I also got it working now
@djzager thanks for your help, dude!
Hi, I'm also seeing a similar error deploying Strimzi:
TASK [provision-strimzi-apb : Login As Super User] *****************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["oc", "login", "-u", "developer", "-p", "de"], "delta": "0:00:00.349035", "end": "2018-08-20 13:34:11.415052", "msg": "non-zero return code", "rc": 1, "start": "2018-08-20 13:34:11.066017", "stderr": "error: dial tcp 127.0.0.1:8443: getsockopt: connection refused", "stderr_lines": ["error: dial tcp 127.0.0.1:8443: getsockopt: connection refused"], "stdout": "", "stdout_lines": []}
I'm on Mac using the following Docker version:
Client:
Version: 17.09.1-ce
API version: 1.32
Go version: go1.8.3
Git commit: 19e2cf6
Built: Thu Dec 7 22:22:25 2017
OS/Arch: darwin/amd64
Server:
Version: 17.09.1-ce
API version: 1.32 (minimum version 1.12)
Go version: go1.8.3
Git commit: 19e2cf6
Built: Thu Dec 7 22:28:28 2017
OS/Arch: linux/amd64
Experimental: true
and the oc version:
oc v3.10.0+dd10d17
kubernetes v1.10.0+b81c8f8
features: Basic-Auth
Server https://127.0.0.1:8443
openshift v3.10.0+20c7bd1-8
kubernetes v1.10.0+b81c8f8
Is it possible @sjwoodman that your cluster was previously started without enabling the router? I noticed that your project's install script does have an oc cluster up, but it would simply skip that step if a cluster had already been started. Based on what I see in the issue, the router not being enabled is the only thing that makes sense to be giving the connection refused.
You should also consider doing a docker system prune -a -f to make sure you don't have any stale origin images impacting you negatively.
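If it helps, here are the checks the advice above boils down to, as commands. This is a sketch; it assumes an oc cluster up install, where the router and registry pods live in the default namespace, and it needs a running cluster to do anything useful:

```shell
# Check that the router and registry pods came up; if the router is
# missing, that would explain the connection refused seen in the APB logs
oc get pods -n default

# Clear out stale origin images before retrying oc cluster up
docker system prune -a -f
```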
Hi David, I've tried a docker system prune -a -f but see the same behaviour. In terms of the state of OpenShift, it's a clean install, as I removed the openshift.cluster.local directory between each attempt. Are there any logs that you would suggest looking at to diagnose further?
@djzager So, this all runs fine on my Fedora box, but not on Mac. We start oc cluster up with --routing-suffix=${ROUTING_SUFFIX} --public-hostname=${PUBLIC_IP}. All fine on Linux, but on Mac we get:
`fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["oc", "login", "-u", "developer", "-p", "d"], "delta": "0:00:00.296468", "end": "2018-09-17 09:57:04.381026", "msg": "non-zero return code", "rc": 1, "start": "2018-09-17 09:57:04.084558", "stderr": "error: dial tcp 127.0.0.1:8443: getsockopt: connection refused", "stderr_lines": ["error: dial tcp 127.0.0.1:8443: getsockopt: connection refused"], "stdout": "", "stdout_lines": []}`
I realized that, unlike the broker-apb, the template seems to have a ROUTING_SUFFIX parameter: https://github.com/openshift/ansible-service-broker/blob/master/templates/deploy-ansible-service-broker.template.yaml#L390-L391
Is there an equivalent for that in the APB?
Here is our customized config, pointing to the 1.2 image: https://github.com/project-streamzi/ocp-broker/blob/clean_up/install.yaml#L40
@djzager I'm wondering, is there anyone on your team who uses a Mac for development, so they could try to execute our script?
any comment @djzager ?
I'm going to add @jmontleon to this as I'm not really in a position to be helpful (paternity leave) at the moment.
@matzew we were actually doing some investigation on a separate issue https://github.com/openshift/origin/issues/20991 for the same error connecting to the public hostname.
I think from what we could see when you set --public-hostname it doesn't work correctly on Mac. @jwmatthews was experimenting on his Mac and was able to reproduce the issue. I think he mentioned socat is used by oc to allow the connection to work when you don't use --public-hostname and that it may not be setting up the relay or not setting it up properly.
To work around it, don't use the --public-hostname (and possibly also --routing-suffix) option on Mac.
Thanks @djzager - enjoy your time off!
@jmontleon thanks, and you are right about what you say and what is in #20991, but they are separate issues. On a Mac with OpenShift 3.10, if you set --public-hostname and --routing-suffix, OpenShift will not start up; it fails with a timeout.
However, if you do not set those parameters OpenShift will boot but APBs will not work (replicated on two different Macs). The failure is as @matzew listed (from the APB not OpenShift itself):
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["oc", "login", "-u", "developer", "-p", "d"], "delta": "0:00:00.296468", "end": "2018-09-17 09:57:04.381026", "msg": "non-zero return code", "rc": 1, "start": "2018-09-17 09:57:04.084558", "stderr": "error: dial tcp 127.0.0.1:8443: getsockopt: connection refused", "stderr_lines": ["error: dial tcp 127.0.0.1:8443: getsockopt: connection refused"], "stdout": "", "stdout_lines": []}
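For anyone debugging this on a Mac, a quick way to check the socat-relay theory above is to see whether anything on the host is actually listening on 8443. A sketch, assuming standard macOS lsof and curl and a cluster that should already be up:

```shell
# Is anything (socat, docker, or origin itself) listening on 8443?
lsof -nP -iTCP:8443 -sTCP:LISTEN

# Does the API server answer? A "connection refused" here matches
# the failure the APB reports from inside its pod
curl -k https://127.0.0.1:8443/healthz
```

If lsof shows no listener, the relay was never set up, which would point at the oc cluster up networking on Mac rather than at the APB itself.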
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/close
@jmrodri: Closing this issue.
Bug:
using the 3.10 CLI, and running
oc cluster up --enable=service-catalog,web-console
I get OpenShift. Then I install the Automation Broker, doing:
Which basically contains commit 313572af9d865f4ca5167c5342cffb37ec798179 from @djzager, AND I also provide the broker_dockerhub_org argument. This brings up the catalog w/ my APBs -> :tada: (Therefore I am closing #1041)
However, now: running an APB does not work.
What happened:
Here is an example of the failure that occurred:
Also, I noticed (due to the failure) that a ton of "dh-strimzi-apb-prov-XXXXX" projects were created, all with the same error. I've never seen that before: when a provision fails, a ton of "retry"(?) projects get created.
Also, on the UI, I noticed something like:
What you expected to happen:
The APB runs smoothly with the 1.2 release
How to reproduce it:
Install OpenShift:
Install the ASB: