Closed thorion3006 closed 5 years ago
If I run the same script with: INSTALL_EXTENSIONS=0 bash install.sh
I get the following error:
Logs:
failed: [master0.argt.ae] (item=etcd) => {"attempts": 60, "changed": false, "item": "etcd", "msg": {"cmd": "/usr/bin/oc get pod master-etcd-master0.argt.ae -o json -n kube-system", "results": [{}], "returncode": 1, "stderr": "The connection to the server openshift-master.argt.ae:8443 was refused - did you specify the right host or port?\n", "stdout": ""}}
Hi @thorion3006
Did setup_dns.yaml run without problems? For all-in-one, the DNS setup should put static records in /etc/hosts of the designated master - do you have those entries in /etc/hosts on your master?
Can you please share your journalctl --since=-1h
from your failing master?
As for the hosts file:
This is the output of ip addr:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 56:6f:e9:a2:00:46 brd ff:ff:ff:ff:ff:ff
inet 10.10.10.95/23 brd 10.10.11.255 scope global noprefixroute dynamic eth0
valid_lft 5330sec preferred_lft 5330sec
inet 10.10.10.94/23 brd 10.10.11.255 scope global secondary noprefixroute dynamic eth0
valid_lft 5008sec preferred_lft 5008sec
inet6 fe80::546f:e9ff:fea2:46/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:3b:1a:d9:52 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
And this is the entry in /etc/hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.10.94 master0.argt.ae etcd.argt.ae openshift-master.argt.ae openshift-public-master.argt.ae docker-registry-default.apps.argt.ae webconsole.openshift-web-console.svc registry-console-default.apps.argt.ae
So the problem is that it's adding the wrong IP address to the hosts file - the secondary address on eth0 (10.10.10.94) instead of the primary one (10.10.10.95).
Also, if I try this config:
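A quick way to confirm the mismatch is to compare the address /etc/hosts maps the master name to against the primary address on eth0. A minimal sketch, using the hosts line from this thread as inline sample data (the path /tmp/hosts.sample is only for the demo):

```shell
# Write the hosts entry from this issue to a scratch file for the demo;
# on the real master you would read /etc/hosts directly.
cat > /tmp/hosts.sample <<'EOF'
10.10.10.94 master0.argt.ae etcd.argt.ae openshift-master.argt.ae
EOF

# First field of the line that carries the openshift-master alias:
hosts_ip=$(awk '/openshift-master/ {print $1; exit}' /tmp/hosts.sample)
echo "hosts file maps openshift-master to: $hosts_ip"

# On the master itself, compare against the primary NIC address, e.g.:
#   ip -4 addr show eth0
```

Here the hosts file points at the secondary address (10.10.10.94) rather than the primary one (10.10.10.95) shown in the ip addr output above.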
openshift_ovirt_vm_manifest:
  - name: 'master'
    count: 2
    profile: 'master_vm'
  - name: 'node'
    count: 2
    profile: 'node_vm'
  - name: 'etcd'
    count: 1
    profile: 'node_vm'
  - name: 'lb'
    count: 1
    profile: 'node_vm'
openshift_ovirt_all_in_one: false
openshift_ovirt_cluster: Default
openshift_ovirt_data_store: hosted_storage
openshift_ovirt_ssh_key: "{{ lookup('file', 'id_rsa.pub') }}"
I get the following error:
Failure summary:
1. Hosts: master1.argt.ae
Play: Configure masters
Task: create node config template
Message: Destination directory /etc/origin/node does not exist
This is the setup I'm looking to deploy, once I finally get this script working :-)
If this is not going to be all-in-one, then you have to set up your DNS + DHCP separately - oVirt doesn't provide DNS services. If you want, you can fix the MAC addresses of your created VMs and create DHCP reservations for them, or set everything statically - see this:
openshift_ovirt_vm_manifest:
  #######################################
  # Multiple Node Static Ip addresses
  #######################################
  - name: 'master'
    count: 3
    profile: 'master'
    nic_mode:
      # The key must match the base name of this kind of VMs plus its index,
      # e.g. if the name is 'test', this must be 'test0'
      master0:
        nic_ip_address: '192.168.123.160'
        nic_netmask: '255.255.255.0'
        nic_gateway: '192.168.123.1'
        nic_on_boot: True
        nic_name: 'eth0'
        dns_servers: "192.168.1.100"
      master1:
        nic_ip_address: '192.168.123.161'
        nic_netmask: '255.255.255.0'
        nic_gateway: '192.168.123.1'
        nic_on_boot: True
        nic_name: 'nic0'
        dns_servers: "192.168.1.100"
Can you show me how to fix the MAC address and hostname in the manifest? I would like to assign a fixed IP from my DHCP server instead of a static IP.
Also, I tried it with a static IP and it gives the same error: Destination directory /etc/origin/node does not exist
Log:
TASK [openshift_node_group : create node config template] **********************
Wednesday 20 March 2019 15:59:55 +0000 (0:00:00.491) 4:23:01.314 *******
fatal: [master1.argt.ae]: FAILED! => {"changed": false, "checksum": "71c149a63f34cbc41ce36eab3427e9d6383840b0", "msg": "Destination directory /etc/origin/node does not exist"}
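A possible manual workaround (a sketch, not an official fix; the assumption is that Ansible's template module refuses to write into a destination directory that doesn't exist rather than creating it) is to pre-create the path on each failing master before re-running the deploy:

```shell
# Workaround sketch: create the directory the 'create node config template'
# task writes into. On the real master the path is /etc/origin/node; the demo
# uses a scratch root so the sketch can run anywhere unprivileged.
ROOT="${ROOT:-/tmp/origin-demo}"   # on the master, set ROOT= (empty) to target /etc directly
mkdir -p "$ROOT/etc/origin/node"
ls -ld "$ROOT/etc/origin/node"
```

This only papers over the symptom on that host; it does not address why the installer never created the directory in the first place.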
The pull request below [1] will help, but I'm not sure when it will land. What we can do to hack around this is create a small playbook that sets the MAC addresses of your VMs:
# set_macs.yaml playbook
# Note: the oVirt.vm-infra role also needs engine connection variables
# (engine_url, engine_user, engine_password, and optionally engine_cafile);
# pass them with -e or an extra vars file when running the playbook.
- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    vms:
      - name: your-vm0
        nics:
          - name: nic1
            mac_address: 56:6f:21:a5:00:0d
        profile: {}
      - name: your-vm1
        nics:
          - name: nic1
            mac_address: 56:6f:21:a5:00:0e
        profile: {}
  tasks:
    - import_role:
        name: oVirt.vm-infra
About the directory that doesn't exist - which VM template did you use?
[1] https://github.com/openshift/openshift-ansible/pull/11442
I used the CentOS 7 template given in the vars.yaml file
- I updated the comment with the playbook.
- Can you open the console of one of the VMs to check whether /etc/origin exists?
You can run the playbook directly on your ovirt-engine machine. I think your VMs are already up, so all you need is to reset their MAC addresses and then reboot them (or restart networking) so they'll get their hostnames set up. Then you can continue with your installation.
Regarding the missing directory, I'm able to reproduce it now, I'll update you
Maybe I messed up the vars.yaml - the compute section says name: node
while it should be name: compute.
Can you try?
openshift_ovirt_vm_manifest:
  - name: master
    count: 3
    profile: master_vm
  - name: compute
    # here ^^^ was the mistake
...
I still get the same error:
TASK [openshift_node_group : create node config template] **********************
Tuesday 02 April 2019 16:08:46 +0000 (0:00:00.285) 0:18:39.344 *********
fatal: [master1.argt.ae]: FAILED! => {"changed": false, "checksum": "71c149a63f34cbc41ce36eab3427e9d6383840b0", "msg": "Destination directory /etc/origin/node does not exist"}
When I SSH into master1, /etc/origin contains only the master directory; node is missing.
Also, if I do a fresh install of all the VMs, when should I invoke the set_macs.yaml playbook?
Finally, how do I add hostnames to my VMs? They're all set to localhost. I tried setting it like:
openshift_ovirt_vm_manifest:
  - name: master
    count: 2
    profile: master_vm
    hostname:
      master0:
        name: 'master0'
      master1:
        name: 'master1'
but it has no effect.
OK, now I see the bug - I don't generate the inventory correctly when installing multiple nodes. I reported it and sent a fix [1]. What I'll do is apply the patch to the ovirt-openshift-installer container; this issue will be used to track that. I'll consider applying the mac_address patch as well.
[1] https://github.com/openshift/openshift-ansible/issues/11457
Hey, any idea when this will be merged?
It has been fixed already; I closed the openshift-ansible issue. Now I need to make sure you get it as well by rebuilding the ovirt-openshift-installer container, and after that you need to pull that container locally again.
@thorion3006 did you pull the latest image yet?
Hey, sorry for the late reply - my server motherboard died, so I'm waiting for the replacement... I'll let you know once I get my server back up and running.
I am getting the same error with the latest version:
Failure summary:
1. Hosts: master1.zamot.io
Play: Configure masters
Task: create node config template
Message: Destination directory /etc/origin/node does not exist
This only happens when I try to deploy multi-node plus infra or compute. Also, sometimes when I try to redeploy on existing VMs, it fails with:
PLAY [Fail openshift_kubelet_name_override for new hosts] ****************************************************************************************************
TASK [Gathering Facts] ***************************************************************************************************************************************
Friday 23 August 2019 15:44:03 +0000 (0:00:00.115) 0:01:20.833 *********
fatal: [infra1.zamot.io]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"172.17.0.1\". Make sure this host can be reached over ssh", "unreachable": true}
It seems it's trying to use the Docker bridge IP when generating the inventory.
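For context, 172.17.0.1 is the docker0 bridge address shown in the ip addr output earlier in this thread, so an inventory generator running inside the installer container (or falling back to the first local interface) can pick it up instead of the VM's real address. A minimal sketch of how such a naive fallback yields that address, using the docker0 line from this issue as sample data:

```shell
# Sample 'ip addr' line for docker0 (copied from the output earlier in this issue):
sample='    inet 172.17.0.1/16 scope global docker0'

# A naive "first local IPv4" fallback just takes the address and strips the
# prefix length - which is exactly how the bridge IP ends up in the inventory:
bridge_ip=$(echo "$sample" | awk '$1 == "inet" {split($2, a, "/"); print a[1]}')
echo "picked address: $bridge_ip"
```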
Description
I followed the wiki to install OKD; it fails in TASK [login as system].
Steps To Reproduce
Expected behavior
It should deploy 1 all-in-one OKD pod.
Logs:
TASK [login as system] *****
fatal: [master0.argt.ae]: FAILED! => {"changed": true, "cmd": "oc login -u system:admin", "delta": "0:00:00.300472", "end": "2019-03-16 10:55:30.174786", "msg": "non-zero return code", "rc": 1, "start": "2019-03-16 10:55:29.874314", "stderr": "error: dial tcp 10.10.10.90:8443: connect: connection refused - verify you have provided the correct host and port and that the server is currently running.", "stderr_lines": ["error: dial tcp 10.10.10.90:8443: connect: connection refused - verify you have provided the correct host and port and that the server is currently running."], "stdout": "", "stdout_lines": []}

PLAY RECAP *****
localhost : ok=30 changed=1 unreachable=0 failed=0
master0.argt.ae : ok=3 changed=2 unreachable=0 failed=1