oVirt / ovirt-openshift-extensions

Implementation of flexvolume driver and provisioner for oVirt
Apache License 2.0

automation CI to install okd fails #123

Closed thorion3006 closed 5 years ago

thorion3006 commented 5 years ago

Description: I followed the wiki to install OKD; it fails at TASK [login as system].

Steps To Reproduce

  1. curl -O "https://raw.githubusercontent.com/oVirt/ovirt-openshift-extensions/master/automation/ci/{install.sh,vars.yaml}"
  2. edit vars.yaml to add ovirt engine details
  3. bash install.sh

Expected behavior: It should deploy one all-in-one OKD pod.

Logs:

TASK [login as system] *****
fatal: [master0.argt.ae]: FAILED! => {"changed": true, "cmd": "oc login -u system:admin", "delta": "0:00:00.300472", "end": "2019-03-16 10:55:30.174786", "msg": "non-zero return code", "rc": 1, "start": "2019-03-16 10:55:29.874314", "stderr": "error: dial tcp 10.10.10.90:8443: connect: connection refused - verify you have provided the correct host and port and that the server is currently running.", "stderr_lines": ["error: dial tcp 10.10.10.90:8443: connect: connection refused - verify you have provided the correct host and port and that the server is currently running."], "stdout": "", "stdout_lines": []}

PLAY RECAP *****
localhost : ok=30 changed=1 unreachable=0 failed=0
master0.argt.ae : ok=3 changed=2 unreachable=0 failed=1

thorion3006 commented 5 years ago

If I run the same script with INSTALL_EXTENSIONS=0 bash install.sh, I get the following error:

Logs:
failed: [master0.argt.ae] (item=etcd) => {"attempts": 60, "changed": false, "item": "etcd", "msg": {"cmd": "/usr/bin/oc get pod master-etcd-master0.argt.ae -o json -n kube-system", "results": [{}], "returncode": 1, "stderr": "The connection to the server openshift-master.argt.ae:8443 was refused - did you specify the right host or port?\n", "stdout": ""}}

rgolangh commented 5 years ago

Hi @thorion3006 Did setup_dns.yaml run without problems? For all-in-one, the setup-dns step should put static records in /etc/hosts of the designated master - do you have those in /etc/hosts on your master? Can you please share your journalctl --since=-1h from your failing master?
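
A minimal sketch of how both could be collected on the failing master (the hostname and the one-hour window come from the request above; the grep pattern is only an assumption about which records setup_dns.yaml writes):

    # run on the failing master (master0.argt.ae in this report)
    grep -E 'openshift-master|master0' /etc/hosts    # assumed pattern: the static records setup_dns.yaml should have added
    journalctl --since=-1h > journalctl_master.txt   # last hour of logs, as requested above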

thorion3006 commented 5 years ago

Hi @thorion3006 Did setup_dns.yaml run without problems? For all-in-one, the setup-dns step should put static records in /etc/hosts of the designated master - do you have those in /etc/hosts on your master? Can you please share your journalctl --since=-1h from your failing master?

journalctl_master.txt

As for the hosts file: This is the output of ip addr:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 56:6f:e9:a2:00:46 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.95/23 brd 10.10.11.255 scope global noprefixroute dynamic eth0
       valid_lft 5330sec preferred_lft 5330sec
    inet 10.10.10.94/23 brd 10.10.11.255 scope global secondary noprefixroute dynamic eth0
       valid_lft 5008sec preferred_lft 5008sec
    inet6 fe80::546f:e9ff:fea2:46/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:3b:1a:d9:52 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever

And this is the entry in /etc/hosts:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

10.10.10.94 master0.argt.ae etcd.argt.ae openshift-master.argt.ae openshift-public-master.argt.ae docker-registry-default.apps.argt.ae webconsole.openshift-web-console.svc registry-console-default.apps.argt.ae

So the problem is that it's adding the wrong IP address to the hosts file.
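
A quick way to cross-check this on the master (a sketch only; the hostname and interface names are taken from the output above):

    grep openshift-master /etc/hosts   # which address the master name resolves to
    ip -4 addr show eth0               # which addresses the host actually holds (primary vs. secondary DHCP lease)
    ss -tlnp | grep 8443               # whether the API server is listening, and on which address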

thorion3006 commented 5 years ago

Also, if I try this config:

openshift_ovirt_vm_manifest:
  - name: 'master'
    count: 2
    profile: 'master_vm'
  - name: 'node'
    count: 2
    profile: 'node_vm'
  - name: 'etcd'
    count: 1
    profile: 'node_vm'
  - name: 'lb'
    count: 1
    profile: 'node_vm'

openshift_ovirt_all_in_one: false
openshift_ovirt_cluster: Default
openshift_ovirt_data_store: hosted_storage
openshift_ovirt_ssh_key: "{{ lookup('file', 'id_rsa.pub') }}"

I get the following error:

Failure summary:

  1. Hosts:    master1.argt.ae
     Play:     Configure masters
     Task:     create node config template
     Message:  Destination directory /etc/origin/node does not exist

This is the setup I'm looking to deploy, once I finally get this script working :-)

rgolangh commented 5 years ago

If this is not going to be all-in-one, then you have to set up your DNS + DHCP separately - oVirt doesn't have DNS service capabilities. If you want, you can fix the MAC addresses of your created VMs and take care of DHCP records for them, or statically set everything - see this:

openshift_ovirt_vm_manifest:
#######################################
# Multiple Node Static Ip addresses
#######################################
- name: 'master'
  count: 3
  profile: 'master'
  nic_mode:
      # This must match the name of this kind of VM, e.g. if the name is test, this must be test0
      master0:
        nic_ip_address: '192.168.123.160'
        nic_netmask: '255.255.255.0'
        nic_gateway: '192.168.123.1'
        nic_on_boot: True
        nic_name: 'eth0'
        dns_servers: "192.168.1.100"
      master1:
        nic_ip_address: '192.168.123.161'
        nic_netmask: '255.255.255.0'
        nic_gateway: '192.168.123.1'
        nic_on_boot: True
        nic_name: 'nic0'
        dns_servers: "192.168.1.100"

thorion3006 commented 5 years ago

If this is not going to be all-in-one, then you have to set up your DNS + DHCP separately - oVirt doesn't have DNS service capabilities. If you want, you can fix the MAC addresses of your created VMs and take care of DHCP records for them, or statically set everything - see this:

openshift_ovirt_vm_manifest:
#######################################
# Multiple Node Static Ip addresses
#######################################
- name: 'master'
  count: 3
  profile: 'master'
  nic_mode:
      # This must match the name of this kind of VM, e.g. if the name is test, this must be test0
      master0:
        nic_ip_address: '192.168.123.160'
        nic_netmask: '255.255.255.0'
        nic_gateway: '192.168.123.1'
        nic_on_boot: True
        nic_name: 'eth0'
        dns_servers: "192.168.1.100"
      master1:
        nic_ip_address: '192.168.123.161'
        nic_netmask: '255.255.255.0'
        nic_gateway: '192.168.123.1'
        nic_on_boot: True
        nic_name: 'nic0'
        dns_servers: "192.168.1.100"

Can you show me how to fix the MAC address and hostname in the manifest? I would like to assign a fixed IP from my DHCP server instead of a static IP.

Also, I tried it with a static IP and it gives the same error: Destination directory /etc/origin/node does not exist

Log:

TASK [openshift_node_group : create node config template] **********************
Wednesday 20 March 2019  15:59:55 +0000 (0:00:00.491)       4:23:01.314 *******
fatal: [master1.argt.ae]: FAILED! => {"changed": false, "checksum": "71c149a63f34cbc41ce36eab3427e9d6383840b0", "msg": "Destination directory /etc/origin/node does not exist"}

rgolangh commented 5 years ago

The pull request below [1] will help, but I'm not sure when it will be merged. What we can do to hack around this is create a small playbook that will set the MAC addresses of your VMs:

# set_macs.yaml playbook
- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    vms:
      - name: your-vm0
        nics:
          - name: nic1
            mac_address: 56:6f:21:a5:00:0d

        profile: {}
      - name: your-vm1
        nics:
          - name: nic1
            mac_address: 56:6f:21:a5:00:0e
        profile: {}
  tasks:
    - import_role:
        name: oVirt.vm-infra
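
A possible way to run it (an assumption on my part: the oVirt.vm-infra role also needs the engine connection vars - engine_url, engine_user, engine_password - so here they are passed in from the same vars.yaml used by install.sh):

    # run on the ovirt-engine machine, where the oVirt Ansible roles are installed
    ansible-playbook set_macs.yaml -e @vars.yaml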

About the directory not existing - which VM template did you use?

[1] https://github.com/openshift/openshift-ansible/pull/11442

thorion3006 commented 5 years ago

I used the CentOS 7 template given in the vars.yaml file.

rgolangh commented 5 years ago

  • I updated the comment with the playbook.
  • Can you open the console of one of the VMs to check if /etc/origin exists?

thorion3006 commented 5 years ago

  1. Where should this playbook be initialized from in the vars.yaml?
  2. Can you clarify this please? If you're asking me to ssh into a VM and locate the folder, I already did that. It's not present.

rgolangh commented 5 years ago

You can run the playbook directly on your ovirt-engine machine. I think your VMs are already up, so all you need is to reset their MAC addresses and then reboot them, or restart networking, so they'll get the hostnames set up. Then you can continue with your installation.
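
On each re-MAC'ed VM, something like the following should pick up the new DHCP lease and the hostname that comes with it (a sketch; whether NetworkManager or the legacy network service is in use depends on the template):

    systemctl restart NetworkManager   # or: systemctl restart network on templates using the legacy service
    hostnamectl                        # verify the hostname is no longer "localhost"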

Regarding the missing directory, I'm able to reproduce it now, I'll update you

rgolangh commented 5 years ago

Maybe I messed up the vars.yaml - the compute section says name: node while it should be name: compute. Can you try?

openshift_ovirt_vm_manifest:
  - name: master
    count: 3
    profile: master_vm
  - name: compute
  # here   ^^^ was the mistake
   ...

thorion3006 commented 5 years ago

Maybe I messed up the vars.yaml - the compute section says name: node while it should be name: compute. Can you try?

openshift_ovirt_vm_manifest:
  - name: master
    count: 3
    profile: master_vm
  - name: compute
  # here   ^^^ was the mistake
   ...

I still get the same error:

TASK [openshift_node_group : create node config template] **********************
Tuesday 02 April 2019  16:08:46 +0000 (0:00:00.285)       0:18:39.344 *********
fatal: [master1.argt.ae]: FAILED! => {"changed": false, "checksum": "71c149a63f34cbc41ce36eab3427e9d6383840b0", "msg": "Destination directory /etc/origin/node does not exist"}

When I ssh into master1, /etc/origin contains only master; node is missing.

Also, if I do a fresh install of all the VMs, when should I invoke the set_macs.yaml playbook?

Finally, how do I add hostnames to my VMs? They're all set to localhost. I tried setting it like:

openshift_ovirt_vm_manifest:
  - name: master
    count: 2
    profile: master_vm
    hostname:
        master0:
            name: 'master0'
        master1:
            name: 'master1'

but it has no effect.

rgolangh commented 5 years ago

OK, now I see the bug - I don't generate the inventory correctly when installing multiple nodes. I reported it and sent a fix [1]. What I'll do is apply the patch to the ovirt-openshift-installer container; this issue will be used to track that. I'll consider applying the mac_address patch as well.

[1] https://github.com/openshift/openshift-ansible/issues/11457

thorion3006 commented 5 years ago

Hey, any idea when this will be merged?

rgolangh commented 5 years ago

It has been fixed already; I closed the openshift-ansible issue. Now I need to make sure you get that as well by rebuilding the ovirt-openshift-installer container, and after that you need to pull that container locally again.
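
Something along these lines should fetch the rebuilt installer once it is published (the image reference here is only an assumption - use whatever image install.sh actually points at):

    docker pull rgolan/ovirt-openshift-installer:latest   # assumed image name; check install.sh for the real reference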

rgolangh commented 5 years ago

@thorion3006 did you pull the latest image yet?

thorion3006 commented 5 years ago

@thorion3006 did you pull the latest image yet?

Hey, sorry for the late reply - my server's motherboard died, so I'm waiting for the replacement... I'll let you know once I get my server back up and running.

mzamot commented 5 years ago

I am getting the same error with the latest version:

Failure summary:

  1. Hosts:    master1.zamot.io
     Play:     Configure masters
     Task:     create node config template
     Message:  Destination directory /etc/origin/node does not exist

This only happens when I try to deploy a multi-node setup with infra or compute nodes. Also, sometimes when I try to redeploy on existing VMs, it fails with:

PLAY [Fail openshift_kubelet_name_override for new hosts] ****************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************************************************************
Friday 23 August 2019  15:44:03 +0000 (0:00:00.115)       0:01:20.833 ********* 
fatal: [infra1.zamot.io]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"172.17.0.1\". Make sure this host can be reached over ssh", "unreachable": true}

It seems it's trying to use the Docker bridge IP (172.17.0.1) when generating the inventory.
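
A couple of checks that might narrow this down (a sketch; the inventory path is an assumption - use wherever the installer writes its generated inventory):

    ip -4 addr show docker0                          # confirms 172.17.0.1 is the local docker bridge, not a VM address
    grep -R '172.17.0.1' /etc/ansible/ 2>/dev/null   # assumed inventory location; look for the bridge IP leaking into host entries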