KubeInit External access for OpenShift/OKD deployments with Libvirt

git4liluo commented 4 years ago

I first followed https://www.anstack.com/blog/2020/07/31/the-fastest-and-simplest-way-to-deploy-okd-openshift-4-5.html to install one cluster on KVM using KubeInit, which worked. I then followed https://www.anstack.com/blog/2020/10/04/Multiple-KubeInit-clusters-in-the-same-hypervisor.html to install two clusters on KVM using KubeInit, which also worked. After the second set of instructions, I see two related lines when running nmcli con show:

  kimgtbr0 633f6586-ca9f-4488-9d88-5e8abc81844f bridge kimgtbr0
  kimgtbr2 36ed494d-a0df-4aaa-a983-a848cdfab7ec bridge kimgtbr2

Do I need to customize the instructions in this article if I want to not only install multiple clusters like these, but also make them accessible externally? My current problem is that I followed the instructions in this article using the okd_multiclusters repo and got the error below:

  TASK [../../roles/kubeinit_libvirt : check if bridge is created] ***
  [DEPRECATION WARNING]: evaluating 'kubeinit_libvirt_external_service_interface_enabled' as a bare variable, this behaviour will go away and you might need to add |bool to the expression in the future. Also see CONDITIONAL_BARE_VARS configuration toggle. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
  fatal: [hypervisor-01]: FAILED! => {"changed": true, "cmd": "nmcli con show | grep kiextbr0\n", "delta": "0:00:00.078851", "end": "2020-10-05 17:35:52.483528", "msg": "non-zero return code", "rc": 1, "start": "2020-10-05 17:35:52.404677", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
  ...ignoring

TASK [../../roles/kubeinit_libvirt : Fail if bridge is not created but included] *** fatal: [hypervisor-01]: FAILED! => {"changed": false, "msg": "The bridge kiextbr0 to provide external\nconnectivity is enabled but not created. This is a requirement that needs to be\ncreated before running the playbook.\n"}

I didn't use Cockpit to create the bridge, because I couldn't find the user name and password to log in to the server. I only used the CLI commands from the instructions, and I did not provide kiextbr0 there.

Could you please advise? Thanks very much for your support so far. Very useful tool!

ccamacho commented 4 years ago

You enable Cockpit with: systemctl enable --now cockpit.socket. To access it, log in as the root user with its password.

The error means you did not create the bridge with the name defined in the [inventory](https://github.com/Kubeinit/kubeinit/blob/master/kubeinit/hosts/okd/inventory#L15).

It's a validation: when you run nmcli con show, the bridge must be listed.
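The check above can be reproduced by hand before running the playbook. The sketch below is not the role's own task (which, per the error output, runs nmcli con show | grep kiextbr0); it is a stricter variant that assumes nmcli's terse output (-t -f NAME,TYPE) and matches the bridge name exactly. The helper name bridge_exists is made up for this example.

```shell
# Hypothetical helper: reads `nmcli -t -f NAME,TYPE con show` output on stdin
# and succeeds only if a connection of type "bridge" with exactly the given
# name is listed.
bridge_exists() {
    awk -F: -v name="$1" '
        $1 == name && $2 == "bridge" { found = 1 }
        END { exit !found }
    '
}

# Usage on the hypervisor (kiextbr0 is the bridge name from the error above):
# nmcli -t -f NAME,TYPE con show | bridge_exists kiextbr0 || echo "kiextbr0 is missing"
```

With only kimgtbr0 and kimgtbr2 present, as in the listing earlier in this thread, the check for kiextbr0 fails, which matches the playbook error.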

@git4liluo, it would be awesome if you could star https://github.com/kubeinit/kubeinit to keep up with updates and new features.

git4liluo commented 4 years ago

Thanks, got it. Step 2: The variable is kubeinit_libvirt_external_service_interface_enabled in the code, not the kubeinit_bind_external_service_interface_enabled shown in the screenshot. I will use the former in both the main.yml file and the command line in Step 3. By the way, why do we provide the same information twice, once in the main.yml file and again on the command line in Step 3?

Also, how do I choose kubeinit_libvirt_external_service_interface.dev and kubeinit_libvirt_external_service_interface.ip? You used eth1 and 10.19.41.157 as their values. How can I decide which values to use? I am new to this kind of networking, so I appreciate your patience. This series of posts is really useful to me. Cheers.

ccamacho commented 4 years ago

Hi @git4liluo

Thanks for pointing that out. I have fixed the post with the new variable names and a correct screenshot. In step 2, what you see in the defaults file are the values that need to be adjusted to your environment. In step 3, we set those values according to our environment. eth0 is the interface inside the VM, so you should not change it; you only need to adjust ip/gateway/mask.
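One way to find candidate ip/gateway/mask values is to look at the network the hypervisor's external interface already sits on. This is only a sketch: it assumes the standard iproute2 commands ip -br addr show and ip route show default, the helper name default_route_info is made up, and the gateway address in the comments is a placeholder rather than a value from the post.

```shell
# Hypothetical helper: reads `ip route show default` output on stdin and
# prints the gateway address and the outgoing interface of the first
# default route, separated by a space.
default_route_info() {
    awk '
        $1 == "default" {
            for (i = 2; i < NF; i++) {
                if ($i == "via") gw = $(i + 1)
                if ($i == "dev") dev = $(i + 1)
            }
            print gw, dev
            exit
        }
    '
}

# Usage on the hypervisor:
# ip -br addr show                            # interfaces with address/prefix (the mask)
# ip route show default | default_route_info  # e.g. a line such as
#   "default via 10.19.41.254 dev eth1 ..." (placeholder gateway) yields the
#   gateway and the device that carries external traffic
```

The IP itself still has to be a free address on that subnet, which usually means asking whoever manages the network or checking the DHCP range.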

Bilal-io commented 4 years ago

I am having a weird issue here. The first time I ran the playbook it worked fine, but since then it fails at the "install services requirements" task.

Error:

Failed to connect to the host via ssh: OpenSSH_8.0p1, OpenSSL 1.1.1c FIPS  28 May 2019
  debug1: Reading configuration data /etc/ssh/ssh_config
  debug3: /etc/ssh/ssh_config line 51: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0
  debug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf
  debug2: checking match for 'final all' host 10.0.0.100 originally 10.0.0.100
  debug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: not matched 'final'
  debug2: match not found
  debug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1 (parse only)
  debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
  debug3: gss kex names ok: [gss-gex-sha1-,gss-group14-sha1-]
  debug3: kex names ok: [curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1]
  debug1: configuration requests final Match pass
  debug2: resolve_canonicalize: hostname 10.0.0.100 is address
  debug1: re-parsing configuration
  debug1: Reading configuration data /etc/ssh/ssh_config
  debug3: /etc/ssh/ssh_config line 51: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0
  debug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf
  debug2: checking match for 'final all' host 10.0.0.100 originally 10.0.0.100
  debug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: matched 'final'
  debug2: match found
  debug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1
  debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
  debug3: gss kex names ok: [gss-gex-sha1-,gss-group14-sha1-]
  debug3: kex names ok: [curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1]
  debug1: auto-mux: Trying existing master
  debug1: Control socket \"/root/.ansible/cp/dafc7f009a\" does not exist
  debug1: Executing proxy command: exec ssh -W 10.0.0.100:22 root@homeserver
  debug3: timeout: 120000 ms remain after connect
  debug1: identity file /root/.ssh/id_rsa type 0
  debug1: identity file /root/.ssh/id_rsa-cert type -1
  debug1: identity file /root/.ssh/id_dsa type -1
  debug1: identity file /root/.ssh/id_dsa-cert type -1
  debug1: identity file /root/.ssh/id_ecdsa type -1
  debug1: identity file /root/.ssh/id_ecdsa-cert type -1
  debug1: identity file /root/.ssh/id_ed25519 type -1
  debug1: identity file /root/.ssh/id_ed25519-cert type -1
  debug1: identity file /root/.ssh/id_xmss type -1
  debug1: identity file /root/.ssh/id_xmss-cert type -1
  debug1: Local version string SSH-2.0-OpenSSH_8.0
  Connection timed out during banner exchange

ccamacho commented 4 years ago

Hi @Bilal-io, it looks like you made some changes to the inventory. With the defaults, can you confirm that you can ssh to root@nyctea without a password from the machine you are running the playbook from? There might be a small mistake somewhere. Also, can you share the changes you made to the inventory? I ask because I added more variables and parameters, so if you are not using the latest version it might break.

It would be awesome if you could star https://github.com/kubeinit/kubeinit to keep up with updates and new features.

Bilal-io commented 4 years ago

Hey @ccamacho, good point. I tested passwordless ssh during the initial setup and it worked. After your comment I tested again and it didn't work. I fixed it by adding the hostname to /etc/hosts.
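The two checks discussed here (does the hypervisor hostname resolve locally, and does passwordless ssh work) can be scripted roughly as follows. This is a sketch, not part of KubeInit: the helper name hosts_has_entry is made up, and homeserver is the hypervisor hostname taken from the proxy command in the log above.

```shell
# Hypothetical helper: succeeds if the hosts file (default /etc/hosts)
# maps some address to the given hostname. Comments are ignored.
hosts_has_entry() {
    awk -v host="$1" '
        { sub(/#.*/, "") }                                   # strip comments
        { for (i = 2; i <= NF; i++) if ($i == host) found = 1 }
        END { exit !found }
    ' "${2:-/etc/hosts}"
}

# Usage from the machine that runs the playbook:
# hosts_has_entry homeserver || echo "homeserver is not in /etc/hosts"
# ssh -o BatchMode=yes -o ConnectTimeout=5 root@homeserver true \
#     || echo "passwordless ssh to homeserver failed"
```

BatchMode=yes makes ssh fail instead of prompting for a password, so a non-zero exit means key-based login is not working.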

Starred!

tlhconsulting commented 3 years ago

First of all, thank you for such a simple and easy OpenShift deployment mechanism, truly a terrific job. Without going into too much detail: I have an OpenShift cluster that I instantiated and updated to 4.6, and I recently had a power outage that took down my server. The server is back up and running, and I am wondering whether there is a way to bring up the current cluster without redeploying and then updating again. I will of course continue to look at the Ansible code, but if you have a shortcut or an existing method, please share.

tlhconsulting commented 3 years ago

So it was just easier to update the repo for the updated openshift-cluster and openshift-install and rebuild from scratch. But if anyone comes up with shutdown and startup procedures, please share them.

ccamacho commented 3 years ago

Hi @tlhconsulting, sorry for the delayed answer. Can you explain your specific issue when shutting down the cluster? I assume that the hypervisor was turned off and you turned it on again, but the VMs/cluster didn't start correctly?

Can you create an issue at https://github.com/Kubeinit/kubeinit/issues explaining the behavior you actually saw and what you expected?
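To check the libvirt side of that assumption, a sketch like the one below lists the guests that are not running and prints the commands that would start them. It only covers the VMs, not the health of the cluster inside them, and it is not a KubeInit procedure. The helper name plan_guest_start is made up; virsh list --inactive --name, virsh start, and virsh autostart are standard virsh commands.

```shell
# Hypothetical helper: reads guest names (one per line, as printed by
# `virsh list --inactive --name`) on stdin and prints the virsh commands
# that would start each guest and enable autostart. Nothing is executed.
plan_guest_start() {
    while IFS= read -r guest; do
        [ -n "$guest" ] || continue      # skip blank lines in the virsh output
        printf 'virsh start %s\n' "$guest"
        printf 'virsh autostart %s\n' "$guest"
    done
}

# Usage on the hypervisor:
# virsh list --all                                  # see which guests are shut off
# virsh list --inactive --name | plan_guest_start   # review, then run the printed commands
```

Printing the commands instead of running them leaves the start order up to you.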

It would be awesome if you could star https://github.com/kubeinit/kubeinit to keep up with updates and new features.
