openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com
Apache License 2.0

x509: certificate signed by unknown authority #3784

Closed rahul334481 closed 7 years ago

rahul334481 commented 7 years ago

Hi Team,

Trying to deploy OpenShift with 3 masters, 2 nodes, 3 etcd hosts, and 1 load balancer.

The Ansible inventory looks like this:

[OSEv3:children]
masters
nodes
etcd
lb

[OSEv3:vars]
ansible_ssh_user=root
deployment_type=origin
openshift_master_cluster_method=native
openshift_master_cluster_hostname=oc-master.domain.com
openshift_master_cluster_public_hostname=oc-master.domain.com
openshift_master_default_subdomain=apps.oc-master.domain.com

openshift_hosted_registry_storage_kind=nfs
openshift_hosted_registry_storage_access_modes=['ReadWriteMany']
openshift_hosted_registry_storage_host=isilon01.domain.com
openshift_hosted_registry_storage_nfs_directory=/ifs/data/production/openshift
openshift_hosted_registry_storage_volume_name=registry
openshift_hosted_registry_storage_volume_size=20Gi

openshift_hosted_metrics_deploy=true
openshift_hosted_metrics_storage_kind=nfs
openshift_hosted_metrics_storage_access_modes=['ReadWriteOnce']
openshift_hosted_metrics_storage_host=isilon01.domain.com
openshift_hosted_metrics_storage_nfs_directory=/ifs/data/production/openshift
openshift_hosted_metrics_storage_volume_name=metrics

openshift_hosted_logging_deploy=true
openshift_hosted_logging_storage_kind=nfs
openshift_hosted_logging_storage_access_modes=['ReadWriteOnce']
openshift_hosted_logging_storage_host=isilon01.domain.com
openshift_hosted_logging_storage_nfs_directory=/ifs/data/production/openshift
openshift_hosted_logging_storage_volume_name=logging

openshift_master_api_port=8443
openshift_master_console_port=8443

openshift_node_iptables_sync_period=5s

logrotate_scripts=[{"name": "syslog", "path": "/var/log/cron\n/var/log/maillog\n/var/log/messages\n/var/log/secure\n/var/log/spooler\n", "options": ["daily", "rotate 7", "compress", "sharedscripts", "missingok"], "scripts": {"postrotate": "/bin/kill -HUP `cat /var/run/syslogd.pid 2> /dev/null` 2> /dev/null || true"}}]

openshift_clock_enabled=true

[masters]
oc-master[1:3].domain.com

[etcd]
oc-etcd[1:3].domain.com

[lb]
oc-master.domain.com #containerized=false

[nodes]
oc-master[1:3].domain.com
oc-node[1:2].domainm ##openshift_node_labels="{'region': 'primary', 'zone': 'default'}"


ansible --version
ansible 2.2.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides

Error while running ansible:

TASK [openshift_examples : Import Centos Image streams] ************************
fatal: [oc-master1.domain.com]: FAILED! => {"changed": false, "cmd": ["oc", "create", "-n", "openshift", "-f", "/usr/share/openshift/examples/image-streams/image-streams-centos7.json"], "delta": "0:00:00.357048", "end": "2017-03-28 12:55:13.441164", "failed": true, "failed_when_result": true, "rc": 1, "start": "2017-03-28 12:55:13.084116", "stderr": "Error from server: Get https://oc-master.domain.com:8443/api/v1/namespaces/openshift/resourcequotas: x509: certificate signed by unknown authority\nError from server: Get https://oc-master.domain.com:8443/api/v1/namespaces/openshift/resourcequotas: x509: certificate signed by unknown authority\nError from server: Get https://oc-master.domain.com:8443:8443/api/v1/namespaces/openshift/resourcequotas: x509: certificate signed by unknown authority\nError from server: 
...
...
x509: certificate signed by unknown authority", "stdout": "", "stdout_lines": [], "warnings": []}
    to retry, use: --limit @/root/openshift-ansible/playbooks/byo/config.retry

PLAY RECAP *********************************************************************
localhost                  : ok=10   changed=0    unreachable=0    failed=0   
oc-etcd1.domain.com      : ok=90   changed=1    unreachable=0    failed=0   
oc-etcd2.domain.com      : ok=82   changed=1    unreachable=0    failed=0   
oc-etcd3.domain.com      : ok=82   changed=1    unreachable=0    failed=0   
oc-master.domain.com     : ok=70   changed=0    unreachable=0    failed=0   
oc-master1.domain.com    : ok=299  changed=13   unreachable=0    failed=1   
oc-master2.domain.com    : ok=245  changed=9    unreachable=0    failed=0   
oc-master3.domain.com    : ok=245  changed=9    unreachable=0    failed=0   
oc-node1.domain.com      : ok=112  changed=2    unreachable=0    failed=1   
oc-node2.domain.com      : ok=112  changed=2    unreachable=0    failed=1
abutcher commented 7 years ago

Hey @rahul334481, could you check the current master serving certificate to ensure that it was signed by the current CA? I'm also curious whether any issues encountered during the install could have caused these to get out of sync with one another, or whether this issue is reproducible starting from fresh VMs.

openssl verify -CAfile /etc/origin/master/ca.crt /etc/origin/master/master.server.crt
openssl verify -CAfile /etc/origin/master/ca-bundle.crt /etc/origin/master/master.server.crt
rahul334481 commented 7 years ago

Hi @abutcher, thanks for the update. I checked, and it looks good:

[root@oc-master1 ~]# openssl verify -CAfile /etc/origin/master/ca.crt /etc/origin/master/master.server.crt
/etc/origin/master/master.server.crt: OK

[root@oc-master1 ~]# openssl verify -CAfile /etc/origin/master/ca-bundle.crt /etc/origin/master/master.server.crt
/etc/origin/master/master.server.crt: OK

The current oc status shows the error below:

[root@oc-master1 ~]# oc status
Error from server: Get https://oc-master.domain.com:8443/api/v1/namespaces/default: x509: certificate signed by unknown authority

Thanks, Rahul

abutcher commented 7 years ago

Hmmm. Let's check the kubeconfig then. Here are some ugly one-liners we can use to compare the md5sums of the base64-encoded CA data.

/root/.kube/config (default for cli operations) and /etc/origin/master/admin.kubeconfig should match.

grep certificate-authority-data /etc/origin/master/admin.kubeconfig | awk '{ print $2 }' | base64 -d | md5sum
grep certificate-authority-data /root/.kube/config | awk '{ print $2 }' | base64 -d | md5sum
md5sum /etc/origin/master/ca-bundle.crt

Or to just examine the CA data in the kubeconfig, which we could then compare with the current bundle:

grep certificate-authority-data /etc/origin/master/admin.kubeconfig | awk '{ print $2 }' | base64 -d | openssl x509 -noout -text
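The comparison above can be rolled into one self-contained sketch. The temp directory and fake CA content below are stand-ins for the real /etc/origin/master/admin.kubeconfig and /etc/origin/master/ca-bundle.crt, so the sketch runs anywhere:

```shell
# Round-trip check: does the base64 CA embedded in a kubeconfig match the
# CA bundle on disk? Throwaway files stand in for the real cluster paths.
tmp=$(mktemp -d)
printf 'fake CA PEM data\n' > "$tmp/ca-bundle.crt"
printf '    certificate-authority-data: %s\n' \
    "$(base64 -w0 "$tmp/ca-bundle.crt")" > "$tmp/kubeconfig"

# md5 of the decoded kubeconfig CA data vs. md5 of the bundle file
kc_sum=$(grep certificate-authority-data "$tmp/kubeconfig" \
    | awk '{print $2}' | base64 -d | md5sum | awk '{print $1}')
bundle_sum=$(md5sum "$tmp/ca-bundle.crt" | awk '{print $1}')

if [ "$kc_sum" = "$bundle_sum" ]; then
    result="MATCH"
else
    result="MISMATCH"
fi
echo "$result"
```

Substitute the real file paths to run this on a master; a MISMATCH means the kubeconfig was generated against a different CA than the one currently on disk.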
rahul334481 commented 7 years ago

Hi @abutcher

There are differences between /root/.kube/config and /etc/origin/master/admin.kubeconfig:

[root@oc-master1 ~]# diff /root/.kube/config /etc/origin/master/admin.kubeconfig
5,6c5,6
<     server: https://10.1.15.26:8443
<   name: 10-1-15-26:8443
---
>     server: https://oc-master.domain.com:8443
>   name: oc-master-domain-com:8443
9c9
<     cluster: 10-1-15-26:8443
---
>     cluster: oc-master-domain-com:8443
11,13c11,13
<     user: system:admin/10-1-15-26:8443
<   name: default/10-1-15-26:8443/system:admin
< current-context: default/10-1-15-26:8443/system:admin
---
>     user: system:admin/oc-master-domain-com:8443
>   name: default/oc-master-domain-com:8443/system:admin
> current-context: default/oc-master-domain-com:8443/system:admin
17c17
< - name: system:admin/10-1-15-26:8443
---
> - name: system:admin/oc-master-domain-com:8443

Output of the next three commands:

[root@oc-master1 ~]# grep certificate-authority-data /etc/origin/master/admin.kubeconfig | awk '{ print $2 }' | base64 -d | md5sum
fb33bfae97375e964fc5c75aec16a894  -
[root@oc-master1 ~]# grep certificate-authority-data /root/.kube/config | awk '{ print $2 }' | base64 -d | md5sum
fb33bfae97375e964fc5c75aec16a894  -
[root@oc-master1 ~]# md5sum /etc/origin/master/ca-bundle.crt
fb33bfae97375e964fc5c75aec16a894  /etc/origin/master/ca-bundle.crt

Also, do we need to set this in the Ansible inventory? openshift_master_default_subdomain=apps.oc-master.domain.com

Thanks, Rahul

abutcher commented 7 years ago

Okay, all of the CA data in the kubeconfigs match and we also know that the master's serving certificate was signed by that CA. We could also ensure that the cert being served by the master checks out:

openssl s_client -CAfile /etc/origin/master/ca-bundle.crt -connect oc-master.domain.com:8443

What happens if we specify the admin kubeconfig rather than using the default, since the root kubeconfig is pointing to an IP address? I doubt this will work, since the installer operation would have used the admin kubeconfig for importing imagestreams.

oc status --config=/etc/origin/master/admin.kubeconfig

openshift_master_default_subdomain sets the default subdomain used for created routes and wouldn't interfere with what we're verifying at the moment.

rahul334481 commented 7 years ago

Unsuccessful:

[root@oc-master1 ~]# openssl s_client -CAfile /etc/origin/master/ca-bundle.crt -connect oc-master.domain.com:8443
socket: Connection refused
connect:errno=111

[root@oc-master1 ~]# oc status --config=/etc/origin/master/admin.kubeconfig
Unable to connect to the server: x509: certificate signed by unknown authority
[root@oc-master1 ~]# 
rahul334481 commented 7 years ago

Please disregard the last output; getting a new one.

rahul334481 commented 7 years ago

This command took a while, and once I hit enter I got a prompt with 400 Bad Request and the connection closed:

[root@oc-master1 ~]# openssl s_client -CAfile /etc/origin/master/ca-bundle.crt -connect oc-master.domain.com:8443
CONNECTED(00000003)
depth=1 CN = openshift-signer@1490641060
verify error:num=19:self signed certificate in certificate chain
verify return:0
---
Certificate chain
 0 s:/CN=10.1.15.26
   i:/CN=openshift-signer@1490641060
 1 s:/CN=openshift-signer@1490641060
   i:/CN=openshift-signer@1490641060
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIEozCCA4ugAwIBAgIBDTANBgkqhkiG9w0BAQsFADAmMSQwIgYDVQQDDBtvcGVu
...
...
TsW1HNdBavvjPEBL2HMzMznJiyD1ejWW1I3d8lWnJsah/aHR6QdjcSlY4NWn/aFI
ryyGE9nzlCH7phzpbu7JnN2eesbp9+iIzymWeEM/kQizXyKQE78C
-----END CERTIFICATE-----
subject=/CN=10.1.15.26
issuer=/CN=openshift-signer@1490641060
---
Acceptable client certificate CA names
/CN=openshift-signer@1490641060
Server Temp Key: ECDH, prime256v1, 256 bits
---
SSL handshake has read 2620 bytes and written 385 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES128-GCM-SHA256
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES128-GCM-SHA256
    Session-ID: DB825253ECF48162B8EF6CBE8A084C0F5A196DD1E547AE71647A5F6EAABA385D
    Session-ID-ctx: 
    Master-Key: 92B1BC67690E21D0DFCC88FCA8B7EFB1358C2C60E3062C10B3759A232D2226A3B4C96CD28748669E6A5721104E6DCA7D
    Key-Arg   : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    TLS session ticket:
    0000 - ae eb fc 4e fb df 51 a1-19 5c cc 70 a1 4b a9 f6   ...N..Q..\.p.K..
    0010 - 7c 3d 00 98 dc d1 80 64-45 66 86 f3 18 82 a5 33   |=.....dEf.....3
    0020 - 66 c7 78 47 92 9c f8 65-f8 c2 34 27 a9 e0 cb 21   f.xG...e..4'...!
    0030 - f0 47 6f 18 f4 8b fc 11-50 51 aa 06 2b e2 95 bb   .Go.....PQ..+...
    0040 - 7c c5 ba 74 cc 79 d2 3f-0c 86 4d ff 40 ad 5a ac   |..t.y.?..M.@.Z.
    0050 - 60 c5 ea 4f 39 5c 1b c1-56 24 5a 54 d7 34 95 ee   `..O9\..V$ZT.4..
    0060 - af 3d ef 6c d8 89 6c 38-8c 81 5e e8 16 a1 d6 f6   .=.l..l8..^.....
    0070 - a1 92 08 1f 26 db 48 66-                          ....&.Hf

    Start Time: 1490732648
    Timeout   : 300 (sec)
    Verify return code: 19 (self signed certificate in certificate chain)
---

HTTP/1.1 400 Bad Request
Content-Type: text/plain
Connection: close

400 Bad Requestclosed
[root@oc-master1 ~]# 

[root@oc-master1 ~]# oc status --config=/etc/origin/master/admin.kubeconfig
Unable to connect to the server: x509: certificate signed by unknown authority
rahul334481 commented 7 years ago

If I instead wait out the 300s timeout, below is what I get:

    Start Time: 1490733126
    Timeout   : 300 (sec)
    Verify return code: 19 (self signed certificate in certificate chain)
---
read:errno=0
[root@oc-master1 ~]# 
abutcher commented 7 years ago

We would expect Verify return code: 0 (ok) from the previous section. Compare the issuer of the CA certificate with the issuer of the certificate returned from s_client. Is it issuer= /CN=openshift-signer@1490641060?

openssl x509 -noout -issuer -in /etc/origin/master/ca.crt

We should also verify that the CAs match on each master, and that each master's serving certificate can be verified with its CA, as we checked before with openssl verify.
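As a self-contained illustration of what that openssl verify check proves, the sketch below generates a throwaway CA and a server certificate signed by it (the CN values are made up; this does not touch the cluster's real /etc/origin/master files), then verifies the chain:

```shell
# Demo: create a CA, sign a server cert with it, then verify the pairing.
# All material is generated on the spot in a temp directory.
tmp=$(mktemp -d); cd "$tmp"
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -keyout ca.key -out ca.crt -subj "/CN=demo-signer" 2>/dev/null
openssl req -newkey rsa:2048 -nodes \
    -keyout server.key -out server.csr -subj "/CN=demo-master" 2>/dev/null
openssl x509 -req -days 1 -in server.csr \
    -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt 2>/dev/null

# Verification succeeds only because server.crt was signed by this ca.crt;
# a cert signed by any other CA would fail here, which is exactly the
# "unknown authority" symptom.
verify_out=$(openssl verify -CAfile ca.crt server.crt)
echo "$verify_out"
openssl x509 -noout -issuer -in ca.crt
```

On the masters, a per-host CA that verifies its own serving cert but differs from its peers' CAs would pass this check locally while the masters still refuse each other's certificates.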

rahul334481 commented 7 years ago

Hi @abutcher,

Details below:

==oc-master== <<<<< LB
Error opening Certificate /etc/origin/master/ca.crt
140711121414048:error:02001002:system library:fopen:No such file or directory:bss_file.c:398:fopen('/etc/origin/master/ca.crt','r')
140711121414048:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:400:
unable to load certificate
==oc-master1==
issuer= /CN=openshift-signer@1490641040
==oc-master2==
issuer= /CN=openshift-signer@1490641060
==oc-master3==
issuer= /CN=openshift-signer@1490641092

Verification:

==oc-master.domain.com==
Error loading file /etc/origin/master/ca.crt
140005027325856:error:02001002:system library:fopen:No such file or directory:bss_file.c:169:fopen('/etc/origin/master/ca.crt','r')
140005027325856:error:2006D080:BIO routines:BIO_new_file:no such file:bss_file.c:172:
140005027325856:error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib:by_file.c:281:
usage: verify [-verbose] [-CApath path] [-CAfile file] [-trusted_first] [-purpose purpose] [-crl_check] [-attime timestamp] [-engine e] cert1 cert2 ...
recognized usages:
        sslclient       SSL client
        sslserver       SSL server
        nssslserver     Netscape SSL server
        smimesign       S/MIME signing
        smimeencrypt    S/MIME encryption
        crlsign         CRL signing
        any             Any Purpose
        ocsphelper      OCSP helper
        timestampsign   Time Stamp signing
==oc-master1.domain.com==
/etc/origin/master/master.server.crt: OK
==oc-master2.domain.com==
/etc/origin/master/master.server.crt: OK
==oc-master3.domain.com==
/etc/origin/master/master.server.crt: OK

==oc-master.domain.com==
Error loading file /etc/origin/master/ca-bundle.crt
140169011611552:error:02001002:system library:fopen:No such file or directory:bss_file.c:169:fopen('/etc/origin/master/ca-bundle.crt','r')
140169011611552:error:2006D080:BIO routines:BIO_new_file:no such file:bss_file.c:172:
140169011611552:error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib:by_file.c:281:
usage: verify [-verbose] [-CApath path] [-CAfile file] [-trusted_first] [-purpose purpose] [-crl_check] [-attime timestamp] [-engine e] cert1 cert2 ...
recognized usages:
        sslclient       SSL client
        sslserver       SSL server
        nssslserver     Netscape SSL server
        smimesign       S/MIME signing
        smimeencrypt    S/MIME encryption
        crlsign         CRL signing
        any             Any Purpose
        ocsphelper      OCSP helper
        timestampsign   Time Stamp signing
==oc-master1.domain.com==
/etc/origin/master/master.server.crt: OK
==oc-master2.domain.com==
/etc/origin/master/master.server.crt: OK
==oc-master3.domain.com==
/etc/origin/master/master.server.crt: OK
rahul334481 commented 7 years ago

Hi @abutcher / Team,

Any insight is appreciated.

Current status:

==oc-master==  <<<<<< lb
bash: oc: command not found
==oc-master1==
Error from server: Get https://oc-master.domain.com:8443/api/v1/namespaces/default: x509: certificate signed by unknown authority
==oc-master2==
Error from server: Get https://oc-master.domain.com:8443/api/v1/namespaces/default: x509: certificate signed by unknown authority
==oc-master3==
Error from server: Get https://oc-master.domain.com:8443/api/v1/namespaces/default: x509: certificate signed by unknown authority

Thanks, Rahul

rahul334481 commented 7 years ago

Hi @abutcher,

Current error after running Ansible again:

TASK [openshift_hosted_templates : Create or update hosted templates] **********
fatal: [oc-master1]: FAILED! => {"changed": false, "cmd": ["oc", "create", "-f", "/usr/share/openshift/hosted", "--config=/tmp/openshift-ansible-ul8RA0/admin.kubeconfig", "-n", "openshift"], "delta": "0:00:00.206521", "end": "2017-03-29 10:58:55.697675", "failed": true, "failed_when_result": true, "rc": 1, "start": "2017-03-29 10:58:55.491154", "stderr": "Unable to connect to the server: x509: certificate signed by unknown authority", "stdout": "", "stdout_lines": [], "warnings": []}
        to retry, use: --limit @/root/openshift-ansible/playbooks/byo/config.retry

PLAY RECAP *********************************************************************
localhost                  : ok=10   changed=0    unreachable=0    failed=0   
oc-etcd1      : ok=90   changed=1    unreachable=0    failed=0     
oc-etcd2      : ok=82   changed=1    unreachable=0    failed=0     
oc-etcd3      : ok=82   changed=1    unreachable=0    failed=0     
oc-master     : ok=70   changed=0    unreachable=0    failed=0     
oc-master1    : ok=314  changed=16   unreachable=0    failed=1     
oc-master2    : ok=245  changed=10   unreachable=0    failed=0     
oc-master3    : ok=245  changed=10   unreachable=0    failed=0     
oc-node1      : ok=112  changed=2    unreachable=0    failed=1     
oc-node2      : ok=112  changed=2    unreachable=0    failed=1     
abutcher commented 7 years ago

Hey @rahul334481, based on the CNs of those CA certificates it looks like there is a different CA certificate on each master. If this is the case then each master's serving certificate may have been signed by each individual master's CA certificate (rather than all certificates being signed by a single, common CA certificate), meaning that none of the masters can talk to one another. I'm really curious how this could have occurred.

The CA certificate on each master is identical when I configure an HA cluster using the master branch.

[root@master1 ~]# openssl x509 -noout -issuer -in /etc/origin/master/ca.crt
issuer= /CN=openshift-signer@1490734436
[root@master1 ~]# md5sum /etc/origin/master/ca.crt 
b9692b2d6f948547d70e7913c1e166aa  /etc/origin/master/ca.crt

[root@master2 ~]# openssl x509 -noout -issuer -in /etc/origin/master/ca.crt
issuer= /CN=openshift-signer@1490734436
[root@master2 ~]# md5sum /etc/origin/master/ca.crt
b9692b2d6f948547d70e7913c1e166aa  /etc/origin/master/ca.crt

[root@master3 ~]# openssl x509 -noout -issuer -in /etc/origin/master/ca.crt
issuer= /CN=openshift-signer@1490734436
[root@master3 ~]# md5sum /etc/origin/master/ca.crt 
b9692b2d6f948547d70e7913c1e166aa  /etc/origin/master/ca.crt
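Once the ca.crt files have been collected from each master onto one host (however you gather them, e.g. scp; the file names below are hypothetical, and the sample contents are fabricated so the sketch is runnable anywhere), the comparison reduces to counting distinct md5sums:

```shell
# Compare CA certificates collected from each master; a healthy HA cluster
# should show exactly one distinct checksum.
tmp=$(mktemp -d)
printf 'CA-A\n' > "$tmp/master1.ca.crt"
printf 'CA-A\n' > "$tmp/master2.ca.crt"
printf 'CA-B\n' > "$tmp/master3.ca.crt"   # deliberately different

distinct=$(md5sum "$tmp"/*.ca.crt | awk '{print $1}' | sort -u | wc -l | tr -d ' ')
if [ "$distinct" -eq 1 ]; then
    echo "CAs identical on all masters"
else
    echo "mismatch: $distinct distinct CA certificates"
fi
```

With the fabricated inputs above, the sketch reports a mismatch of 2 distinct certificates, which mirrors the three different openshift-signer timestamps seen earlier in this thread.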

Are you using the master branch of openshift-ansible? If not, which version or branch?

What procedure was used to install this cluster? Curious if a single run of playbooks/byo/config.yml against this inventory resulted in this state or if the cluster was configured in pieces.

If this is a new cluster I would recommend running uninstall to clean up the mismatched certificates / configuration and then reinstall.

ansible-playbook -i <inventory> playbooks/adhoc/uninstall.yml

I'm on freenode IRC as abutcher if you want to reach out there.

rahul334481 commented 7 years ago

Hi @abutcher ,

I am using ansible 2.2.1.0.

Which channel are you in on IRC?

Thanks, Rahul

rahul334481 commented 7 years ago

Hi @abutcher

Thanks for the steps. I uninstalled and reinstalled. Everything worked except for one error:

TASK [openshift_metrics : fail] ************************************************
fatal: [oc-master1]: FAILED! => {"changed": false, "failed": true, "msg": "'keytool' is unavailable. Please install java-1.8.0-openjdk-headless on the control node"}
        to retry, use: --limit @/root/openshift-ansible/playbooks/byo/config.retry

PLAY RECAP *********************************************************************
localhost                  : ok=11   changed=0    unreachable=0    failed=0   
oc-etcd1      : ok=112  changed=33   unreachable=0    failed=0     
oc-etcd2      : ok=103  changed=27   unreachable=0    failed=0     
oc-etcd3      : ok=103  changed=27   unreachable=0    failed=0     
oc-master     : ok=75   changed=8    unreachable=0    failed=0     
oc-master1    : ok=551  changed=136  unreachable=0    failed=1     
oc-master2    : ok=337  changed=86   unreachable=0    failed=0     
oc-master3    : ok=337  changed=86   unreachable=0    failed=0     
oc-node1      : ok=195  changed=55   unreachable=0    failed=0     
oc-node2      : ok=195  changed=55   unreachable=0    failed=0     

You have mail in /var/spool/mail/root

Do you want me to run uninstall.yml, install the Java package on all nodes (LB, 3 masters, 2 nodes, and 3 etcds), and then run config.yml again?

Thanks, Rahul

rahul334481 commented 7 years ago

Also, the current NFS PVs are below. If I modify the storage on the NFS side, how will the changes be reflected on the host?

[root@oc-master1 ~]# oc get pv
NAME              CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS      CLAIM                    REASON    AGE
logging-volume    10Gi       RWO           Retain          Available                                      6m
metrics-volume    10Gi       RWO           Retain          Available                                      6m
registry-volume   5Gi        RWX           Retain          Bound       default/registry-claim             6m
rahul334481 commented 7 years ago
TASK [openshift_logging : copy] ************************************************
task path: /root/openshift-ansible/roles/openshift_logging/tasks/generate_configmaps.yaml:9
An exception occurred during task execution. The full traceback is:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 126, in run
    res = self._execute()
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 443, in _execute
    self._task.post_validate(templar=templar)
  File "/usr/lib/python2.7/site-packages/ansible/playbook/task.py", line 248, in post_validate
    super(Task, self).post_validate(templar)
  File "/usr/lib/python2.7/site-packages/ansible/playbook/base.py", line 373, in post_validate
    value = templar.template(getattr(self, name))
  File "/usr/lib/python2.7/site-packages/ansible/template/__init__.py", line 427, in template
    disable_lookups=disable_lookups,
  File "/usr/lib/python2.7/site-packages/ansible/template/__init__.py", line 383, in template
    disable_lookups=disable_lookups,
  File "/usr/lib/python2.7/site-packages/ansible/template/__init__.py", line 583, in do_template
    res = j2_concat(rf)
  File "<template>", line 9, in root
  File "/usr/lib/python2.7/site-packages/ansible/plugins/filter/core.py", line 198, in from_yaml
    return yaml.safe_load(data)
  File "/usr/lib64/python2.7/site-packages/yaml/__init__.py", line 93, in safe_load
    return load(stream, SafeLoader)
  File "/usr/lib64/python2.7/site-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/usr/lib64/python2.7/site-packages/yaml/constructor.py", line 39, in get_single_data
    return self.construct_document(node)
  File "/usr/lib64/python2.7/site-packages/yaml/constructor.py", line 48, in construct_document
    for dummy in generator:
  File "/usr/lib64/python2.7/site-packages/yaml/constructor.py", line 398, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/usr/lib64/python2.7/site-packages/yaml/constructor.py", line 208, in construct_mapping
    return BaseConstructor.construct_mapping(self, node, deep=deep)
  File "/usr/lib64/python2.7/site-packages/yaml/constructor.py", line 132, in construct_mapping
    "found unacceptable key (%s)" % exc, key_node.start_mark)
ConstructorError: while constructing a mapping
  in "<unicode string>", line 9, column 21:
      number_of_shards: {{ es_number_of_shards | default ... 
                        ^
found unacceptable key (unhashable type: 'dict')
  in "<unicode string>", line 9, column 22:
      number_of_shards: {{ es_number_of_shards | default  ... 
                         ^

fatal: [oc-master1]: FAILED! => { 
    "failed": true
}

MSG:

Unexpected failure during module execution.
abutcher commented 7 years ago

Noting that the metrics and logging issues encountered are addressed by https://github.com/openshift/openshift-ansible/pull/3770 and https://github.com/openshift/openshift-ansible/pull/3792.

rahul334481 commented 7 years ago

Hi @abutcher, per the verification steps I'm running the command below from the master1 node, but it gives an error:

etcdctl -C https://oc-etcd1:2379,https://oc-etcd2:2379,https://oc-etcd3:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key cluster-health
-bash: etcdctl: command not found

http://<lb_hostname>:9000
abutcher commented 7 years ago

Hey @rahul334481, you'll need to install the etcd package on the master to provide etcdctl.

rahul334481 commented 7 years ago

Thanks @abutcher

That worked.

Per the documentation, if I install multiple masters (which I did) with HAProxy as the load balancer, I need to browse to http://<lb_hostname>:9000 to check HAProxy's status, but for some reason the browser page times out.

Also, I gave cluster-admin privileges to a user, but for some reason that user is unable to run oc commands:

[root@oc-master1 ~]# oc adm policy add-cluster-role-to-user cluster-admin ragarwal
-bash-4.2$ oc get nodes
error: Missing or incomplete configuration info.  Please login or point to an existing, complete config file:

  1. Via the command-line flag --config
  2. Via the KUBECONFIG environment variable
  3. In your home directory as ~/.kube/config

To view or setup config directly use the 'config' command.
-bash-4.2$ 
rahul334481 commented 7 years ago

Hi @abutcher

Good Morning,

For some reason, Metrics URL is not working: https://hawkular-metrics.apps.oc-master.domain.com/hawkular/metrics

Error: 503 Service Unavailable

No server is available to handle this request.

rahul334481 commented 7 years ago
[root@oc-master1 ~]# oc logs hawkular-cassandra-1-4cns3 -n openshift-infra
The MAX_HEAP_SIZE envar is not set. Basing the MAX_HEAP_SIZE on the available memory limit for the pod (2000000000).
The memory limit is less than 2GB. Using 1/2 of available memory for the max_heap_size.
The MAX_HEAP_SIZE has been set to 953M
THE HEAP_NEWSIZE envar is not set. Setting to 800M based on the CPU_LIMIT of 8000. [100M per CPU core]
About to generate seeds
Trying to access the Seed list [try #1]
Trying to access the Seed list [try #2]
Trying to access the Seed list [try #3]
Setting seeds to be hawkular-cassandra-1-4cns3
Creating the Cassandra keystore from the Secret's cert data
Converting the PKCS12 keystore into a Java Keystore
Entry for alias cassandra successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled
[Storing /opt/apache-cassandra/conf/.keystore]
Building the trust store for inter node communication
Certificate was added to keystore
Certificate was added to keystore
Building the trust store for client communication
Certificate was added to keystore
Certificate was added to keystore
Generating self signed certificates for the local client for cqlsh
Generating a 4096 bit RSA private key
..........................................................................++
...........................................................................................................................................................................................................................................++
writing new private key to '.cassandra.local.client.key'
-----
Certificate was added to keystore
cat: /etc/ld.so.conf.d/*.conf: No such file or directory
getopt: invalid option -- 'R'
OpenJDK 64-Bit Server VM warning: Cannot open file /opt/apache-cassandra/logs/gc.log due to No such file or directory
CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.deserializeLargeSubset (Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/Columns;I)Lorg/apache/cassandra/db/Columns;
CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.serializeLargeSubset (Ljava/util/Collection;ILorg/apache/cassandra/db/Columns;ILorg/apache/cassandra/io/util/DataOutputPlus;)V
CompilerOracle: dontinline org/apache/cassandra/db/Columns$Serializer.serializeLargeSubsetSize (Ljava/util/Collection;ILorg/apache/cassandra/db/Columns;I)I
CompilerOracle: dontinline org/apache/cassandra/db/transform/BaseIterator.tryGetMoreContents ()Z
CompilerOracle: dontinline org/apache/cassandra/db/transform/StoppingTransformation.stop ()V
CompilerOracle: dontinline org/apache/cassandra/db/transform/StoppingTransformation.stopInPartition ()V
CompilerOracle: dontinline org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.doFlush (I)V
CompilerOracle: dontinline org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.writeExcessSlow ()V
CompilerOracle: dontinline org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.writeSlow (JI)V
CompilerOracle: dontinline org/apache/cassandra/io/util/RebufferingInputStream.readPrimitiveSlowly (I)J
CompilerOracle: inline org/apache/cassandra/io/util/Memory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/io/util/SafeMemory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/utils/AsymmetricOrdering.selectBoundary (Lorg/apache/cassandra/utils/AsymmetricOrdering/Op;II)I
CompilerOracle: inline org/apache/cassandra/utils/AsymmetricOrdering.strictnessOfLessThan (Lorg/apache/cassandra/utils/AsymmetricOrdering/Op;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare (Ljava/nio/ByteBuffer;[B)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare ([BLjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compareUnsigned (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/lang/Object;JI)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/vint/VIntCoding.encodeVInt (JI)[B
INFO  [main] 2017-03-31 14:53:26,777 YamlConfigurationLoader.java:85 - Configuration location: file:/opt/apache-cassandra-3.0.12-1/conf/cassandra.yaml
INFO  [main] 2017-03-31 14:53:26,893 Config.java:451 - Node configuration:[allocate_tokens_for_keyspace=null; authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_bootstrap=true; auto_snapshot=true; batch_size_fail_threshold_in_kb=50; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; broadcast_address=null; broadcast_rpc_address=null; buffer_pool_use_heap_if_exhausted=true; cas_contention_timeout_in_ms=1000; client_encryption_options=<REDACTED>; cluster_name=hawkular-metrics; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_compression=LZ4Compressor; commitlog_directory=/cassandra_data/commitlog; commitlog_max_compression_buffers_in_pool=3; commitlog_periodic_queue_size=-1; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_batch_window_in_ms=null; commitlog_sync_period_in_ms=10000; commitlog_total_space_in_mb=null; compaction_large_partition_warning_threshold_mb=100; compaction_throughput_mb_per_sec=16; concurrent_compactors=null; concurrent_counter_writes=32; concurrent_materialized_view_writes=32; concurrent_reads=32; concurrent_replicates=null; concurrent_writes=32; counter_cache_keys_to_save=2147483647; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; data_file_directories=[Ljava.lang.String;@67e2d983; disk_access_mode=auto; disk_failure_policy=stop; disk_optimization_estimate_percentile=0.95; disk_optimization_page_cross_chance=0.1; disk_optimization_strategy=ssd; dynamic_snitch=true; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; enable_scripted_user_defined_functions=false; enable_user_defined_functions=false; enable_user_defined_functions_threads=true; encryption_options=null; endpoint_snitch=SimpleSnitch; file_cache_size_in_mb=512; gc_log_threshold_in_ms=200; gc_warn_threshold_in_ms=1000; 
hinted_handoff_disabled_datacenters=[]; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; hints_compression=null; hints_directory=null; hints_flush_period_in_ms=10000; incremental_backups=false; index_interval=null; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; initial_token=null; inter_dc_stream_throughput_outbound_megabits_per_sec=200; inter_dc_tcp_nodelay=false; internode_authenticator=null; internode_compression=all; internode_recv_buff_size_in_bytes=null; internode_send_buff_size_in_bytes=null; key_cache_keys_to_save=2147483647; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=hawkular-cassandra-1-4cns3; listen_interface=null; listen_interface_prefer_ipv6=false; listen_on_broadcast_address=false; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; max_hints_file_size_in_mb=128; max_mutation_size_in_kb=null; max_streaming_retries=3; max_value_size_in_mb=256; memtable_allocation_type=heap_buffers; memtable_cleanup_threshold=null; memtable_flush_writers=null; memtable_heap_space_in_mb=null; memtable_offheap_space_in_mb=null; min_free_space_per_drive_in_mb=50; native_transport_max_concurrent_connections=-1; native_transport_max_concurrent_connections_per_ip=-1; native_transport_max_frame_size_in_mb=256; native_transport_max_threads=128; native_transport_port=9042; native_transport_port_ssl=null; num_tokens=256; otc_coalescing_enough_coalesced_messages=8; otc_coalescing_strategy=TIMEHORIZON; otc_coalescing_window_us=200; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_cache_max_entries=1000; permissions_update_interval_in_ms=-1; permissions_validity_in_ms=2000; phi_convict_threshold=8.0; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_scheduler_id=null; request_scheduler_options=null; request_timeout_in_ms=10000; role_manager=CassandraRoleManager; roles_cache_max_entries=1000; 
roles_update_interval_in_ms=-1; roles_validity_in_ms=2000; row_cache_class_name=org.apache.cassandra.cache.OHCProvider; row_cache_keys_to_save=2147483647; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=hawkular-cassandra-1-4cns3; rpc_interface=null; rpc_interface_prefer_ipv6=false; rpc_keepalive=true; rpc_listen_backlog=50; rpc_max_threads=2147483647; rpc_min_threads=16; rpc_port=9160; rpc_recv_buff_size_in_bytes=null; rpc_send_buff_size_in_bytes=null; rpc_server_type=sync; saved_caches_directory=null; seed_provider=org.apache.cassandra.locator.SimpleSeedProvider{seeds=hawkular-cassandra-1-4cns3}; server_encryption_options=<REDACTED>; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=false; storage_port=7000; stream_throughput_outbound_megabits_per_sec=200; streaming_socket_timeout_in_ms=86400000; thrift_framed_transport_size_in_mb=15; thrift_max_message_length_in_mb=16; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; tracetype_query_ttl=86400; tracetype_repair_ttl=604800; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; unlogged_batch_across_partitions_warn_threshold=10; user_defined_function_fail_timeout=1500; user_defined_function_warn_timeout=500; user_function_timeout_policy=die; windows_timer_interval=1; write_request_timeout_in_ms=2000]
INFO  [main] 2017-03-31 14:53:26,894 DatabaseDescriptor.java:323 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO  [main] 2017-03-31 14:53:27,008 DatabaseDescriptor.java:430 - Global memtable on-heap threshold is enabled at 218MB
INFO  [main] 2017-03-31 14:53:27,008 DatabaseDescriptor.java:434 - Global memtable off-heap threshold is enabled at 218MB
INFO  [main] 2017-03-31 14:53:27,116 CassandraDaemon.java:434 - Hostname: hawkular-cassandra-1-4cns3
INFO  [main] 2017-03-31 14:53:27,117 CassandraDaemon.java:441 - JVM vendor/version: OpenJDK 64-Bit Server VM/1.8.0_121
INFO  [main] 2017-03-31 14:53:27,117 CassandraDaemon.java:442 - Heap size: 916455424/916455424
INFO  [main] 2017-03-31 14:53:27,117 CassandraDaemon.java:445 - Code Cache Non-heap memory: init = 2555904(2496K) used = 3596224(3511K) committed = 3604480(3520K) max = 251658240(245760K)
INFO  [main] 2017-03-31 14:53:27,117 CassandraDaemon.java:445 - Metaspace Non-heap memory: init = 0(0K) used = 16394024(16009K) committed = 16777216(16384K) max = -1(-1K)
INFO  [main] 2017-03-31 14:53:27,118 CassandraDaemon.java:445 - Compressed Class Space Non-heap memory: init = 0(0K) used = 1901488(1856K) committed = 2097152(2048K) max = 1073741824(1048576K)
INFO  [main] 2017-03-31 14:53:27,118 CassandraDaemon.java:445 - Par Eden Space Heap memory: init = 671088640(655360K) used = 174483896(170394K) committed = 671088640(655360K) max = 671088640(655360K)
INFO  [main] 2017-03-31 14:53:27,118 CassandraDaemon.java:445 - Par Survivor Space Heap memory: init = 83886080(81920K) used = 0(0K) committed = 83886080(81920K) max = 83886080(81920K)
INFO  [main] 2017-03-31 14:53:27,118 CassandraDaemon.java:445 - CMS Old Gen Heap memory: init = 161480704(157696K) used = 0(0K) committed = 161480704(157696K) max = 161480704(157696K)
INFO  [main] 2017-03-31 14:53:27,118 CassandraDaemon.java:447 - Classpath: /opt/apache-cassandra/conf:/opt/apache-cassandra/build/classes/main:/opt/apache-cassandra/build/classes/thrift:/opt/apache-cassandra/lib/ST4-4.0.8.jar:/opt/apache-cassandra/lib/airline-0.6.jar:/opt/apache-cassandra/lib/antlr-runtime-3.5.2.jar:/opt/apache-cassandra/lib/apache-cassandra-3.0.12-1.jar:/opt/apache-cassandra/lib/apache-cassandra-clientutil-3.0.12-1.jar:/opt/apache-cassandra/lib/apache-cassandra-thrift-3.0.12-1.jar:/opt/apache-cassandra/lib/asm-5.0.4.jar:/opt/apache-cassandra/lib/cassandra-driver-core-3.0.1-shaded.jar:/opt/apache-cassandra/lib/commons-cli-1.1.jar:/opt/apache-cassandra/lib/commons-codec-1.2.jar:/opt/apache-cassandra/lib/commons-lang3-3.1.jar:/opt/apache-cassandra/lib/commons-math3-3.2.jar:/opt/apache-cassandra/lib/compress-lzf-0.8.4.jar:/opt/apache-cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/opt/apache-cassandra/lib/disruptor-3.0.1.jar:/opt/apache-cassandra/lib/ecj-4.4.2.jar:/opt/apache-cassandra/lib/guava-18.0.jar:/opt/apache-cassandra/lib/high-scale-lib-1.0.6.jar:/opt/apache-cassandra/lib/jackson-core-asl-1.9.2.jar:/opt/apache-cassandra/lib/jackson-mapper-asl-1.9.2.jar:/opt/apache-cassandra/lib/jamm-0.3.0.jar:/opt/apache-cassandra/lib/javax.inject.jar:/opt/apache-cassandra/lib/jbcrypt-0.3m.jar:/opt/apache-cassandra/lib/jcl-over-slf4j-1.7.7.jar:/opt/apache-cassandra/lib/jna-4.0.0.jar:/opt/apache-cassandra/lib/joda-time-2.4.jar:/opt/apache-cassandra/lib/json-simple-1.1.jar:/opt/apache-cassandra/lib/jstackjunit-0.0.1.jar:/opt/apache-cassandra/lib/libthrift-0.9.2.jar:/opt/apache-cassandra/lib/log4j-over-slf4j-1.7.7.jar:/opt/apache-cassandra/lib/logback-classic-1.1.3.jar:/opt/apache-cassandra/lib/logback-core-1.1.3.jar:/opt/apache-cassandra/lib/lz4-1.3.0.jar:/opt/apache-cassandra/lib/metrics-core-3.1.0.jar:/opt/apache-cassandra/lib/metrics-jvm-3.1.0.jar:/opt/apache-cassandra/lib/metrics-logback-3.1.0.jar:/opt/apache-cassandra/lib/netty-all-4.0.44.Final.jar:/opt/a
pache-cassandra/lib/ohc-core-0.4.3.jar:/opt/apache-cassandra/lib/ohc-core-j8-0.4.3.jar:/opt/apache-cassandra/lib/reporter-config-base-3.0.0.jar:/opt/apache-cassandra/lib/reporter-config3-3.0.0.jar:/opt/apache-cassandra/lib/sigar-1.6.4.jar:/opt/apache-cassandra/lib/slf4j-api-1.7.7.jar:/opt/apache-cassandra/lib/snakeyaml-1.11.jar:/opt/apache-cassandra/lib/snappy-java-1.1.1.7.jar:/opt/apache-cassandra/lib/stream-2.5.2.jar:/opt/apache-cassandra/lib/thrift-server-0.3.7.jar:/opt/apache-cassandra/lib/jsr223/*/*.jar:/opt/apache-cassandra/lib/jamm-0.3.0.jar
INFO  [main] 2017-03-31 14:53:27,119 CassandraDaemon.java:449 - JVM Arguments: [-Dcassandra.commitlog.ignorereplayerrors=true, -Xloggc:/opt/apache-cassandra/logs/gc.log, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled, -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:CMSWaitDuration=10000, -XX:+CMSParallelInitialMarkEnabled, -XX:+CMSEdenChunksRecordAlways, -XX:+CMSClassUnloadingEnabled, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintHeapAtGC, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -XX:+PrintPromotionFailure, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=10, -XX:GCLogFileSize=10M, -Xms953M, -Xmx953M, -Xmn800M, -ea, -Xss256k, -XX:+AlwaysPreTouch, -XX:-UseBiasedLocking, -XX:StringTableSize=1000003, -XX:+UseTLAB, -XX:+ResizeTLAB, -XX:+PerfDisableSharedMem, -XX:CompileCommandFile=/opt/apache-cassandra/conf/hotspot_compiler, -javaagent:/opt/apache-cassandra/lib/jamm-0.3.0.jar, -XX:+UseThreadPriorities, -XX:ThreadPriorityPolicy=42, -XX:+HeapDumpOnOutOfMemoryError, -Djava.net.preferIPv4Stack=true, -Dcassandra.jmx.local.port=7199, -XX:+DisableExplicitGC, -Djava.library.path=/opt/apache-cassandra/lib/sigar-bin, -Dlogback.configurationFile=logback.xml, -Dcassandra.logdir=/opt/apache-cassandra/logs, -Dcassandra.storagedir=/opt/apache-cassandra/data, -Dcassandra-foreground=yes]
WARN  [main] 2017-03-31 14:53:27,153 CLibrary.java:176 - Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
WARN  [main] 2017-03-31 14:53:27,154 StartupChecks.java:121 - jemalloc shared library could not be preloaded to speed up memory allocations
WARN  [main] 2017-03-31 14:53:27,154 StartupChecks.java:153 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
WARN  [main] 2017-03-31 14:53:27,154 StartupChecks.java:178 - OpenJDK is not recommended. Please upgrade to the newest Oracle Java release
INFO  [main] 2017-03-31 14:53:27,161 SigarLibrary.java:44 - Initializing SIGAR library
WARN  [main] 2017-03-31 14:53:27,171 SigarLibrary.java:174 - Cassandra server running in degraded mode. Is swap disabled? : false,  Address space adequate? : true,  nofile limit adequate? : true, nproc limit adequate? : true 
WARN  [main] 2017-03-31 14:53:27,175 StartupChecks.java:246 - Maximum number of memory map areas per process (vm.max_map_count) 65530 is too low, recommended value: 1048575, you can change it with sysctl.
WARN  [main] 2017-03-31 14:53:27,183 StartupChecks.java:267 - Directory /cassandra_data/data doesn't exist
ERROR [main] 2017-03-31 14:53:27,185 CassandraDaemon.java:710 - Has no permission to create directory /cassandra_data/data
[root@oc-master1 ~]# 
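The final `ERROR` line above is the root cause of the Cassandra crash loop: the pod cannot create `/cassandra_data/data` on its NFS-backed volume. A minimal remediation sketch, run on the NFS server, assuming the metrics volume lives under the export path from the inventory above and that the pod runs as a non-root UID (both are assumptions; adjust the path and ownership to your environment):

```shell
# On the NFS server (isilon01.domain.com in this inventory), make the
# metrics volume writable by the container's UID. The directory name
# "metrics" matches openshift_hosted_metrics_storage_volume_name above;
# nfsnobody is the usual owner for exports consumed by OpenShift pods.
EXPORT_DIR=/ifs/data/production/openshift/metrics
chown -R nfsnobody:nfsnobody "$EXPORT_DIR"
chmod -R 0770 "$EXPORT_DIR"
```

Once ownership on the export is fixed, deleting the pod lets its replication controller recreate it against a now-writable volume.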
rahul334481 commented 7 years ago
[root@oc-master1 ~]# oc get rc -n openshift-infra
NAME                   DESIRED   CURRENT   READY     AGE
hawkular-cassandra-1   1         1         0         1d
hawkular-metrics       1         1         0         1d
heapster               1         1         0         1d
rahul334481 commented 7 years ago
[root@oc-master1 ~]# oc delete pod/hawkular-cassandra-1-4cns3 -n openshift-infra
pod "hawkular-cassandra-1-4cns3" deleted
[root@oc-master1 ~]# oc delete pod/heapster-3u9kd -n openshift-infra
pod "heapster-3u9kd" deleted
[root@oc-master1 ~]# oc delete pod/hawkular-metrics-3677c -n openshift-infra
pod "hawkular-metrics-3677c" deleted
[root@oc-master1 ~]# oc get pods -n openshift-infra
NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-r2w3h   0/1       CrashLoopBackOff   2          1m
hawkular-metrics-4cy2n       0/1       Running            0          31s
heapster-5cha3               0/1       Running            0          46s
[root@oc-master1 ~]# 
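When a recreated pod goes straight back into `CrashLoopBackOff`, the current `oc logs` call often shows nothing because the container has already exited. Two standard checks (pod name taken from the listing above; it will differ after each restart):

```shell
# Events section at the bottom shows scheduling/mount/permission failures.
oc describe pod hawkular-cassandra-1-r2w3h -n openshift-infra

# --previous (-p) prints the logs of the container run that just exited,
# which is where the Cassandra startup error actually appears.
oc logs --previous hawkular-cassandra-1-r2w3h -n openshift-infra
```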
rahul334481 commented 7 years ago
[root@oc-master1 ~]# oc logs hawkular-metrics-4cy2n -n openshift-infra
Certificate was added to keystore
[Storing hawkular-metrics.truststore]
Certificate was added to keystore
[Storing hawkular-metrics.truststore]
2017-03-31 15:04:49,259 INFO  [org.jboss.as.repository] (ServerService Thread Pool -- 9) WFLYDR0001: Content added at location /opt/jboss/wildfly/standalone/data/content/ac/88acbcbaa9a9b7cab9ab888db9fa801b63d8dd/content
2017-03-31 15:04:49,277 INFO  [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0039: Creating http management service using socket-binding (management-http)
2017-03-31 15:04:49,293 INFO  [org.xnio] (MSC service thread 1-4) XNIO version 3.4.0.Final
2017-03-31 15:04:49,301 INFO  [org.xnio.nio] (MSC service thread 1-4) XNIO NIO Implementation Version 3.4.0.Final
2017-03-31 15:04:49,353 INFO  [org.jboss.remoting] (MSC service thread 1-4) JBoss Remoting version 4.0.21.Final
2017-03-31 15:04:49,362 INFO  [org.jboss.as.clustering.jgroups] (ServerService Thread Pool -- 45) WFLYCLJG0001: Activating JGroups subsystem.
2017-03-31 15:04:49,368 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 41) WFLYCLINF0001: Activating Infinispan subsystem.
2017-03-31 15:04:49,379 INFO  [org.wildfly.extension.io] (ServerService Thread Pool -- 40) WFLYIO001: Worker 'default' has auto-configured to 16 core threads with 128 task threads based on your 8 available processors
2017-03-31 15:04:49,379 INFO  [org.jboss.as.security] (ServerService Thread Pool -- 58) WFLYSEC0002: Activating Security Subsystem
2017-03-31 15:04:49,380 INFO  [org.jboss.as.naming] (ServerService Thread Pool -- 51) WFLYNAM0001: Activating Naming Subsystem
2017-03-31 15:04:49,389 INFO  [org.jboss.as.webservices] (ServerService Thread Pool -- 62) WFLYWS0002: Activating WebServices Extension
2017-03-31 15:04:49,387 INFO  [org.jboss.as.jsf] (ServerService Thread Pool -- 48) WFLYJSF0007: Activated the following JSF Implementations: [main]
2017-03-31 15:04:49,410 INFO  [org.jboss.as.connector] (MSC service thread 1-3) WFLYJCA0009: Starting JCA Subsystem (WildFly/IronJacamar 1.3.4.Final)
2017-03-31 15:04:49,430 INFO  [org.jboss.as.security] (MSC service thread 1-5) WFLYSEC0001: Current PicketBox version=4.9.6.Final
2017-03-31 15:04:49,435 INFO  [org.jboss.as.connector.subsystems.datasources] (ServerService Thread Pool -- 36) WFLYJCA0004: Deploying JDBC-compliant driver class org.h2.Driver (version 1.3)
2017-03-31 15:04:49,440 INFO  [org.jboss.as.connector.deployers.jdbc] (MSC service thread 1-3) WFLYJCA0018: Started Driver service with driver-name = h2
2017-03-31 15:04:49,508 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-3) WFLYUT0003: Undertow 1.4.0.Final starting
2017-03-31 15:04:49,535 INFO  [org.jboss.as.naming] (MSC service thread 1-5) WFLYNAM0003: Starting Naming Service
2017-03-31 15:04:49,535 INFO  [org.jboss.as.mail.extension] (MSC service thread 1-1) WFLYMAIL0001: Bound mail session [java:jboss/mail/Default]
2017-03-31 15:04:49,702 INFO  [org.wildfly.extension.undertow] (ServerService Thread Pool -- 61) WFLYUT0014: Creating file handler for path '/opt/jboss/wildfly/welcome-content' with options [directory-listing: 'false', follow-symlink: 'false', case-sensitive: 'true', safe-symlink-paths: '[]']
2017-03-31 15:04:49,704 INFO  [org.jboss.as.ejb3] (MSC service thread 1-8) WFLYEJB0481: Strict pool slsb-strict-max-pool is using a max instance size of 128 (per class), which is derived from thread worker pool sizing.
2017-03-31 15:04:49,704 INFO  [org.jboss.as.ejb3] (MSC service thread 1-2) WFLYEJB0482: Strict pool mdb-strict-max-pool is using a max instance size of 32 (per class), which is derived from the number of CPUs on this host.
2017-03-31 15:04:49,729 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-4) WFLYUT0012: Started server default-server.
2017-03-31 15:04:49,730 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-6) WFLYUT0018: Host default-host starting
2017-03-31 15:04:49,806 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-5) WFLYUT0006: Undertow HTTP listener default listening on 0.0.0.0:8080
2017-03-31 15:04:49,812 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-4) WFLYUT0006: Undertow AJP listener ajp listening on 0.0.0.0:8009
2017-03-31 15:04:49,815 INFO  [org.jboss.modcluster] (ServerService Thread Pool -- 64) MODCLUSTER000001: Initializing mod_cluster version 1.3.3.Final
2017-03-31 15:04:49,829 INFO  [org.jboss.modcluster] (ServerService Thread Pool -- 64) MODCLUSTER000032: Listening to proxy advertisements on /224.0.1.105:23364
2017-03-31 15:04:49,870 INFO  [org.jboss.as.connector.subsystems.datasources] (MSC service thread 1-3) WFLYJCA0001: Bound data source [java:jboss/datasources/ExampleDS]
2017-03-31 15:04:50,006 INFO  [org.jboss.as.server.deployment.scanner] (MSC service thread 1-1) WFLYDS0013: Started FileSystemDeploymentService for directory /opt/jboss/wildfly/standalone/deployments
2017-03-31 15:04:50,012 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-6) WFLYSRV0027: Starting deployment of "hawkular-metrics.ear" (runtime-name: "hawkular-metrics.ear")
2017-03-31 15:04:50,116 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-3) WFLYUT0006: Undertow HTTPS listener https listening on 0.0.0.0:8443
2017-03-31 15:04:50,301 INFO  [org.jboss.ws.common.management] (MSC service thread 1-6) JBWS022052: Starting JBossWS 5.1.5.Final (Apache CXF 3.1.6) 
2017-03-31 15:04:51,192 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-2) WFLYSRV0207: Starting subdeployment (runtime-name: "hawkular-alerts-action-email.war")
2017-03-31 15:04:51,192 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-8) WFLYSRV0207: Starting subdeployment (runtime-name: "hawkular-metrics.war")
2017-03-31 15:04:51,196 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-8) WFLYSRV0207: Starting subdeployment (runtime-name: "hawkular-alerts.war")
2017-03-31 15:04:51,196 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-1) WFLYSRV0207: Starting subdeployment (runtime-name: "hawkular-metrics-alerter.war")
2017-03-31 15:04:51,196 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-6) WFLYSRV0207: Starting subdeployment (runtime-name: "hawkular-alerts-action-webhook.war")
2017-03-31 15:05:19,387 ERROR [org.jgroups.protocols.TCP] (TransferQueueBundler,ee,hawkular-metrics-4cy2n) JGRP000029: hawkular-metrics-4cy2n: failed sending message to hawkular-metrics-3677c (55 bytes): java.net.SocketTimeoutException: connect timed out, headers: FD: heartbeat, TP: [cluster_name=ee]
2017-03-31 15:05:19,407 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (thread-5,ee,hawkular-metrics-4cy2n) ISPN000094: Received new cluster view for channel server: [hawkular-metrics-4cy2n|2] (1) [hawkular-metrics-4cy2n]
2017-03-31 15:05:19,409 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (thread-5,ee,hawkular-metrics-4cy2n) ISPN000094: Received new cluster view for channel hawkular-metrics: [hawkular-metrics-4cy2n|2] (1) [hawkular-metrics-4cy2n]
2017-03-31 15:05:19,409 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (thread-5,ee,hawkular-metrics-4cy2n) ISPN000094: Received new cluster view for channel web: [hawkular-metrics-4cy2n|2] (1) [hawkular-metrics-4cy2n]
2017-03-31 15:05:19,411 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (thread-5,ee,hawkular-metrics-4cy2n) ISPN000094: Received new cluster view for channel hawkular-alerts: [hawkular-metrics-4cy2n|2] (1) [hawkular-metrics-4cy2n]
2017-03-31 15:05:19,708 ERROR [org.jgroups.protocols.TCP] (TransferQueueBundler,ee,hawkular-metrics-4cy2n) JGRP000029: hawkular-metrics-4cy2n: failed sending message to hawkular-metrics-3677c (70 bytes): java.net.SocketTimeoutException: connect timed out, headers: VERIFY_SUSPECT: [VERIFY_SUSPECT: ARE_YOU_DEAD], TP: [cluster_name=ee]
2017-03-31 15:05:22,515 WARN  [org.hawkular.alerts.engine.impl.CassCluster] (ServerService Thread Pool -- 73) Could not connect to Cassandra cluster - assuming is not up yet. Cause: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.218.252:9042 (com.datastax.driver.core.exceptions.TransportException: [hawkular-cassandra/172.30.218.252:9042] Cannot connect))
2017-03-31 15:05:22,515 WARN  [org.hawkular.alerts.engine.impl.CassCluster] (ServerService Thread Pool -- 73) [11] Retrying connecting to Cassandra cluster in [3000]ms...
2017-03-31 15:07:53,306 WARN  [org.hawkular.alerts.engine.impl.CassCluster] (ServerService Thread Pool -- 65) Could not connect to Cassandra cluster - assuming is not up yet. Cause: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.218.252:9042 (com.datastax.driver.core.exceptions.TransportException: [hawkular-cassandra/172.30.218.252:9042] Cannot connect))
2017-03-31 15:07:53,306 WARN  [org.hawkular.alerts.engine.impl.CassCluster] (ServerService Thread Pool -- 65) [2] Retrying connecting to Cassandra cluster in [3000]ms...
[root@oc-master1 ~]# 
[root@oc-master1 ~]# oc logs heapster-5cha3 -n openshift-infra
Endpoint Check in effect. Checking https://hawkular-metrics:443/hawkular/metrics/status
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible [HTTP status code: 000. Curl exit code 7]. Retrying.
Could not connect to https://hawkular-metrics:443/hawkular/metrics/status. Curl exit code: 7. Status Code 000
[... the same two lines repeat on every retry until the metrics deployer gives up ...]
[root@oc-master1 ~]# 
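Curl exit code 7 with HTTP status 000 means the TCP connection was never established (DNS failure or connection refused), so TLS and certificates are not even in play yet at this point. A minimal local reproduction of what that exit code looks like, assuming nothing is listening on port 9 of your loopback interface:

```shell
# Exit code 7 = "couldn't connect": no socket was opened, so no certificate
# was ever presented. Curl reports HTTP status 000 in that case.
curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:9; echo " exit=$?"
# prints: 000 exit=7
```

On the cluster itself, the first things worth checking are whether the service and its endpoints exist (e.g. `oc get svc,ep hawkular-metrics -n openshift-infra`) and whether the hawkular-metrics pod is actually running; only once exit code 7 turns into a TLS error does the certificate chain matter.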
rahul334481 commented 7 years ago
[root@oc-master1 ~]# oc get pv
NAME              CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                       REASON    AGE
logging-volume    500Gi      RWO           Retain          Bound     openshift-infra/metrics-1             1d
metrics-volume    100Gi      RWO           Retain          Bound     logging/logging-es-0                  1d
registry-volume   500Gi      RWX           Retain          Bound     default/registry-claim                1d
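Incidentally, the `oc get pv` output above shows crossed bindings: `logging-volume` holds the metrics claim and `metrics-volume` holds the logging claim, so the intended sizes are swapped as well. A throwaway awk pass over the pasted table (assuming the volume-name prefix is meant to match its claim) flags the mismatches:

```shell
# Flag any PV whose name prefix does not appear in the claim it is bound to.
# The sample data below is the oc get pv output pasted above.
oc_get_pv='logging-volume 500Gi RWO Retain Bound openshift-infra/metrics-1
metrics-volume 100Gi RWO Retain Bound logging/logging-es-0
registry-volume 500Gi RWX Retain Bound default/registry-claim'
echo "$oc_get_pv" | awk '{ split($1, p, "-"); if (index($6, p[1]) == 0) print $1, "is bound to", $6 }'
# prints the two crossed bindings; registry-volume passes the check
```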
rahul334481 commented 7 years ago
TASK [openshift_master : Create the ldap ca file if needed] ********************
failed: [oc-master1.dur.lulu.com] (item={u'bindDN': u'<USER>', u'kind': u'LDAPPasswordIdentityProvider', u'name': u'lulu_ldap_provider', u'bindPassword': u'<PASS>', u'url': u'ldap://rhds.dur.lulu.com:389/ou=People,dc=lulu,dc=com?uid', u'insecure': u'false', u'ca': u'ca-bundle.crt', u'attributes': {u'id': [u'dn'], u'preferredUsername': [u'uid'], u'email': [u'mail'], u'name': [u'cn']}, u'login': u'true', u'challenge': u'true'}) => {"checksum": "491c3da8ace6eb400e2380f1f7bff88e9a62bdc0", "failed": true, "gid": 0, "group": "root", "item": {"attributes": {"email": ["mail"], "id": ["dn"], "name": ["cn"], "preferredUsername": ["uid"]}, "bindDN": "<USER>", "bindPassword": "<PASS>", "ca": "ca-bundle.crt", "challenge": "true", "insecure": "false", "kind": "LDAPPasswordIdentityProvider", "login": "true", "name": "lulu_ldap_provider", "url": "ldap://rhds.dur.lulu.com:389/ou=People,dc=lulu,dc=com?uid"}, "mode": "0600", "msg": "src file does not exist, use \"force=yes\" if you really want to create the link: /etc/origin/master/tmpzXSp8Q", "owner": "root", "path": "/etc/origin/master/ca-bundle.crt", "size": 263111, "src": "tmpzXSp8Q", "state": "hard", "uid": 0}
changed: [oc-master2.dur.lulu.com] => (item={u'bindDN': u'<USER>', u'kind': u'LDAPPasswordIdentityProvider', u'name': u'lulu_ldap_provider', u'bindPassword': u'<PASS>', u'url': u'ldap://rhds.dur.lulu.com:389/ou=People,dc=lulu,dc=com?uid', u'insecure': u'false', u'ca': u'ca-bundle.crt', u'attributes': {u'id': [u'dn'], u'preferredUsername': [u'uid'], u'email': [u'mail'], u'name': [u'cn']}, u'login': u'true', u'challenge': u'true'})
changed: [oc-master3.dur.lulu.com] => (item={u'bindDN': u'<USER>', u'kind': u'LDAPPasswordIdentityProvider', u'name': u'lulu_ldap_provider', u'bindPassword': u'<PASS>', u'url': u'ldap://rhds.dur.lulu.com:389/ou=People,dc=lulu,dc=com?uid', u'insecure': u'false', u'ca': u'ca-bundle.crt', u'attributes': {u'id': [u'dn'], u'preferredUsername': [u'uid'], u'email': [u'mail'], u'name': [u'cn']}, u'login': u'true', u'challenge': u'true'})

NO MORE HOSTS LEFT *************************************************************
    to retry, use: --limit @/root/openshift-ansible/playbooks/byo/config.retry

PLAY RECAP *********************************************************************
localhost                  : ok=10   changed=0    unreachable=0    failed=0   
oc-etcd1.dur.lulu.com      : ok=95   changed=5    unreachable=0    failed=0   
oc-etcd2.dur.lulu.com      : ok=87   changed=5    unreachable=0    failed=0   
oc-etcd3.dur.lulu.com      : ok=87   changed=5    unreachable=0    failed=0   
oc-master.dur.lulu.com     : ok=75   changed=4    unreachable=0    failed=0   
oc-master1.dur.lulu.com    : ok=211  changed=17   unreachable=0    failed=1   
oc-master2.dur.lulu.com    : ok=196  changed=19   unreachable=0    failed=0   
oc-master3.dur.lulu.com    : ok=196  changed=19   unreachable=0    failed=0   
oc-node1.dur.lulu.com      : ok=83   changed=6    unreachable=0    failed=0   
oc-node2.dur.lulu.com      : ok=83   changed=6    unreachable=0    failed=0 
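The actual failure on oc-master1 is not LDAP itself: the task hard-links an Ansible temp copy of the CA (`tmpzXSp8Q`) into `/etc/origin/master/ca-bundle.crt` (`"state": "hard"`), and the temp source vanished before the link was made (`src file does not exist`). The same `ln` failure is easy to reproduce:

```shell
# ln(1) refuses to hard-link a source that no longer exists -- the same
# condition Ansible reported ("src file does not exist") on oc-master1.
tmpdir=$(mktemp -d)
ln "$tmpdir/tmp-src-gone" "$tmpdir/ca-bundle.crt" 2>/dev/null || echo "link failed: exit=$?"
# prints: link failed: exit=1
rm -rf "$tmpdir"
```

It is worth confirming that the CA file the inventory points at (e.g. via `openshift_master_ldap_ca_file`) exists and is readable on the Ansible control host before re-running the play; the dangling temp source suggests the copy step failed or raced on that one master.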
abutcher commented 7 years ago

Closing since we were able to work around the original issue. Let's create new ones for specific problems if any are encountered. Thanks!

rahul334481 commented 7 years ago

Thanks @abutcher for all the support. I will keep you updated on configuring my host with LDAP; after multiple attempts I am still getting x509: certificate signed by unknown authority.

Thanks, Rahul

moortimis commented 7 years ago

Some of the best OpenShift SSL debugging information I've seen yet. Thanks for posting, @abutcher; it helped me debug issues where internal certificates had expired and the renew playbook failed.

bogdando commented 7 years ago

Very helpful indeed, @moortimis @abutcher

jsonpang commented 5 years ago

When I use system:admin to log in, `oc` asks me for a password. I have already checked my CA files and it still does not work. When I run `oc status` I get: Unable to connect to the server: x509: certificate signed by unknown authority
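The `x509: certificate signed by unknown authority` error means the client's trust store does not contain the CA that issued the API server certificate. The effect can be reproduced with plain openssl (a throwaway self-signed cert, nothing OpenShift-specific): verification fails until the issuing CA is explicitly trusted.

```shell
# Generate a throwaway self-signed cert, then verify it twice: once with no
# trust anchor (fails, like the oc error) and once trusting its own CA (OK).
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj '/CN=demo-master' -keyout "$tmp/ca.key" -out "$tmp/ca.crt" 2>/dev/null
openssl verify "$tmp/ca.crt" 2>&1 | tail -n 1        # untrusted: verify error
openssl verify -CAfile "$tmp/ca.crt" "$tmp/ca.crt"   # prints: <path>: OK
rm -rf "$tmp"
```

On a master, the usual fix path was to make sure the kubeconfig in use embeds the current cluster CA (compare `certificate-authority-data` in `/etc/origin/master/admin.kubeconfig` against `/etc/origin/master/ca.crt`), or to pass the CA explicitly with `oc login --certificate-authority=/etc/origin/master/ca.crt`.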