projectatomic / atomic-host-tests

A collection of single-host tests for Atomic Host
GNU General Public License v3.0

openshift-ansible-test fails for fedora 25 atomic host for openshift origin 3.6 #209

Open samvarankashyap opened 7 years ago

samvarankashyap commented 7 years ago

Aim: Trying to test openshift origin 3.6 release using master branch of atomic-host-tests

Environment: Fedora 25 Atomic Host on libvirt
image_src: https://ci.centos.org/artifacts/fedora-atomic/f25/images/fedora-atomic-25.84-fe4aabcd9a1e012.qcow2

Command used:

ansible-playbook -vvvv -i /tmp/linchpintest/inventories/libvirt.inventory tests/openshift-ansible-test/main.yml

Inventory used:

[example]
192.168.124.44 hostname=192.168.124.44 ansible_ssh_user=admin ansible_ssh_private_key_file=/root/.ssh/ex ansible_become=true

[all]
192.168.124.44 hostname=192.168.124.44 ansible_ssh_user=admin ansible_ssh_private_key_file=/root/.ssh/ex ansible_become=true

Note: I have made minor changes to the cluster inventory, such as the version, image tag, and private key.

Output:

FAILED - RETRYING: HANDLER: openshift_master : Verify API Server (2 retries left).
FAILED - RETRYING: HANDLER: openshift_master : Verify API Server (1 retries left).
fatal: [192.168.124.44]: FAILED! => {
    "attempts": 120, 
    "changed": false, 
    "cmd": [
        "curl", 
        "--silent", 
        "--tlsv1.2", 
        "--cacert", 
        "/etc/origin/master/ca-bundle.crt", 
        "https://192.168.124.44:8443/healthz/ready"
    ], 
    "delta": "0:00:01.392229", 
    "end": "2017-07-19 20:17:21.601212", 
    "failed": true, 
    "rc": 0, 
    "start": "2017-07-19 20:17:20.208983", 
    "warnings": []
}

STDOUT:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "User \"system:anonymous\" cannot \"get\" on \"/healthz/ready\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

PLAY RECAP *********************************************************************
192.168.124.44             : ok=401  changed=70   unreachable=0    failed=1   
localhost                  : ok=11   changed=0    unreachable=0    failed=0   

Failure summary:

  1. Host:     192.168.124.44
     Play:     Configure masters
     Task:     openshift_master : Verify API Server
     Message:  ???
---

STDERR:
---
[DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..

This feature will be removed in version 2.4. Deprecation warnings can be 
disabled by setting deprecation_warnings=False in ansible.cfg.
---
        to retry, use: --limit @/home/srallaba/workspace/venvs/libvirt/ansos/atomic-host-tests/tests/openshift-ansible-test/main.retry
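
One possible reading of the output above: curl exits with rc 0 because the HTTPS exchange itself succeeded, but the body is a 403 Status object rather than a readiness response. The sketch below assumes (this is not confirmed anywhere in this thread) that the handler keeps retrying until the body is the plain string "ok". `classify_healthz` is a hypothetical helper for illustration, not part of openshift-ansible.

```python
import json


def classify_healthz(stdout):
    """Classify the body returned by /healthz/ready.

    Assumption: a ready API server answers with the plain text "ok".
    """
    body = stdout.strip()
    if body == "ok":
        return "ready"
    try:
        status = json.loads(body)
    except ValueError:
        # Not JSON and not "ok": nothing more can be said about it.
        return "unknown response"
    if not isinstance(status, dict):
        return "unknown response"
    # A Status object carries a reason and the HTTP code.
    return "not ready: {} ({})".format(status.get("reason"), status.get("code"))


# Trimmed copy of the Status object from the STDOUT section above.
forbidden = '{"kind": "Status", "status": "Failure", "reason": "Forbidden", "code": 403}'

print(classify_healthz("ok"))
print(classify_healthz(forbidden))
```

Under that assumption, a 403 body would never satisfy the check, which would be consistent with all 120 attempts being used up even though each curl call returned rc 0.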

Any help on why it's failing is highly appreciated. Thanks.

miabbott commented 7 years ago

@samvarankashyap TBH, I haven't run this test using OpenShift Origin 3.6 on Fedora 25, so you may be finding a bug in Origin itself.

I'll try to run the Origin installer on its own in the near future to see if I can replicate your issue outside of the tests.

samvarankashyap commented 7 years ago

@miabbott Quick question: when you tried the OpenShift installation on libvirt machines, did you configure DNS on the provisioned machines? I am almost sure the above problem is due to a DNS mismatch. If so, could you please let me know how you configured DNS? Thanks.

miabbott commented 7 years ago

@samvarankashyap I didn't do anything special with DNS configuration. Just whatever the machines were given via DHCP.

For example, a libvirt VM booted on my workstation has a resolv.conf that looks like this:

$ cat /etc/resolv.conf 
# Generated by NetworkManager
search atomichost.localdomain
nameserver 192.168.122.1
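
To compare DNS settings between hosts, a small sketch like the following could list the nameservers and search domains from a resolv.conf. `parse_resolv_conf` is a hypothetical helper, and it only handles the `nameserver` and `search` keywords shown above.

```python
def parse_resolv_conf(text):
    """Return (nameservers, search_domains) from resolv.conf-style text."""
    nameservers, search = [], []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "search":
            # A later "search" line replaces an earlier one.
            search = fields[1:]
    return nameservers, search


# The resolv.conf contents quoted above.
sample = """# Generated by NetworkManager
search atomichost.localdomain
nameserver 192.168.122.1
"""

print(parse_resolv_conf(sample))
```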

I just tried running the openshift-ansible installer against a Fedora 26 AH VM and now I'm hitting:

https://github.com/openshift/openshift-ansible/issues/4801

So I'm kind of blocked on debugging this further.

However, I'll note I was able to run the openshift-ansible installer against a RHELAH 7.4 libvirt VM on my laptop and I successfully got Origin 1.5 installed there. (Note, this was the installer itself and not the test from this repo)

samvarankashyap commented 7 years ago

@miabbott Could you please share the inventory file that you used for the installation on RHELAH 7.4?

miabbott commented 7 years ago

I think this is close to what I used:

# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes
etcd

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
ansible_user=cloud-user
ansible_become=true
deployment_type=origin
containerized=true
openshift_release=v1.5.1
openshift_master_default_subdomain=192.168.122.115.xip.io
openshift_router_selector='router=true'
openshift_registry_selector='registry=true'
openshift_hostname=192.168.122.115.xip.io

# enable htpasswd auth
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_master_htpasswd_users={'admin': '$apr1$zgSjCrLt$1KSuj66CggeWSv.D.BXOA1', 'user': '$apr1$.gw8w9i1$ln9bfTRiD6OwuNTG5LvW50'}

# host group for masters
[masters]
192.168.122.115

# host group for etcd, should run on a node that is not schedulable
[etcd]
192.168.122.115

# host group for worker nodes, we list master node here so that
# openshift-sdn gets installed. Note that openshift_schedulable=true
# below marks the master node as schedulable.
[nodes]
192.168.122.115   openshift_schedulable=true openshift_node_labels="{'router':'true','registry':'true'}"

I was using the openshift-ansible-3.5.101-1 branch of openshift-ansible. Worth noting is that I did have to add the openshift_hostname variable to the inventory file this time, whereas when I tested this a few months ago, I did not.
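
The inventory above relies on xip.io, a wildcard DNS service where a name of the form `<ip>.xip.io` resolves to `<ip>`. As a minimal sketch, the two hostname-related variables can be derived from the VM's IP address like this. `xip_hostname` is a hypothetical helper, not something provided by openshift-ansible.

```python
import ipaddress


def xip_hostname(ip):
    """Build a wildcard-DNS name such as 192.168.122.115.xip.io."""
    # IPv4Address raises ValueError if the address is malformed.
    address = ipaddress.IPv4Address(ip)
    return "{}.xip.io".format(address)


host = xip_hostname("192.168.122.115")
print("openshift_hostname={}".format(host))
print("openshift_master_default_subdomain={}".format(host))
```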

samvarankashyap commented 7 years ago

@miabbott: I ran the openshift-ansible master branch with the following inventory on Fedora Atomic 25 and it works.

openshift-ansible commit sha: f709c76f727dd8166851d01d205fe2159449c854

# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
ansible_user=fedora
ansible_become=true

openshift_master_default_subdomain=192.168.122.157.xip.io

deployment_type=origin
openshift_deployment_type=origin
openshift_release=v1.5.1
#deployment_subtype=registry
containerized=true

# enable htpasswd auth
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_master_htpasswd_users={'admin': '$apr1$zgSjCrLt$1KSuj66CggeWSv.D.BXOA1'}

# host group for masters
[masters]
192.168.122.157

# host group for worker nodes, we list master node here so that
# openshift-sdn gets installed. Note that openshift_schedulable=true
# below marks the master node as schedulable.
[nodes]
192.168.122.157 openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_schedulable=true

I will run the latest commit and get back to you on it.

samvarankashyap commented 7 years ago

Ran successfully using openshift-ansible

Just installed OpenShift 3.6 with the latest openshift-ansible repo on a local libvirt instance running Fedora Atomic 25.

ansible --version
ansible 2.2.2.0
openshift-ansible commit sha : 5b0525f509906a43071f824b02ceb8d170c300ed

byo inventory :

# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
ansible_user=fedora
ansible_become=true

openshift_master_default_subdomain=192.168.122.211.xip.io

deployment_type=origin
openshift_deployment_type=origin
openshift_release=v3.6
openshift_image_tag=v3.6.0-rc.0
#deployment_subtype=registry
containerized=true

# enable htpasswd auth
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_master_htpasswd_users={'admin': '$apr1$zgSjCrLt$1KSuj66CggeWSv.D.BXOA1'}

# host group for masters
[masters]
192.168.122.211

# host group for worker nodes, we list master node here so that
# openshift-sdn gets installed. Note that openshift_schedulable=true
# below marks the master node as schedulable.
[nodes]
192.168.122.211 openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_schedulable=true

miabbott commented 7 years ago

Thanks for the extra info @samvarankashyap. I'm glad that you were able to get the cluster running on F25.

This has pointed out that our test doesn't currently support versions of Origin that are not blessed as stable. For example, deploying the released version of Origin 1.5.1 only requires the use of the openshift_release variable. But if we want to deploy Origin 3.6 (which is currently tagged as 'pre-release'), we also have to supply the openshift_image_tag value because the container images haven't been tagged with the v3.6 tag yet.
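
The difference between the two cases can be sketched as follows. This is only an illustration of the variables discussed in this thread; `origin_version_vars` and `to_inventory_lines` are hypothetical helpers and are not part of the test or of openshift-ansible.

```python
def origin_version_vars(release, image_tag=None):
    """Return the inventory variables that select an Origin version.

    A stable release needs only openshift_release. A pre-release also
    needs openshift_image_tag, because the images do not yet carry the
    plain release tag.
    """
    variables = {"openshift_release": release}
    if image_tag is not None:
        variables["openshift_image_tag"] = image_tag
    return variables


def to_inventory_lines(variables):
    """Render variables as key=value lines for an [OSEv3:vars] section."""
    return ["{}={}".format(key, value) for key, value in variables.items()]


# Stable release: one variable is enough.
print(to_inventory_lines(origin_version_vars("v1.5.1")))

# Pre-release: values taken from the working inventory earlier in the thread.
print(to_inventory_lines(origin_version_vars("v3.6", "v3.6.0-rc.0")))
```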

I'd like to keep the test simple and not have to support this additional requirement when trying to deploy a pre-release version of Origin. The original intent of the test was to verify that stable versions of the openshift-ansible installer worked on the Atomic Host streams we track (some of which are more stable than others). My opinion is that the 'stable version' requirement should also extend to the version of Origin that the test tries to deploy.

Does that make sense to you? Do you agree or disagree?