openshift / installer

Install an OpenShift 4.x cluster
https://try.openshift.com
Apache License 2.0

libvirt: storage volume 'coreos_base' already exists #642

Closed · jianzhangbjz closed this issue 5 years ago

jianzhangbjz commented 5 years ago

Version

[jzhang@dhcp-140-18 ~]$ openshift-install version
openshift-install v0.3.0-161-ge64a43d293f594c1d51a317822aa6b4783295ecb
Terraform v0.11.8

Your version of Terraform is out of date! The latest version
is 0.11.10. You can update by downloading from www.terraform.io/downloads.html

Platform (aws|libvirt|openstack):

libvirt

What happened?

Got the errors below while building an OCP 4.0 cluster.

libvirt_network.tectonic_net: Creating...
  addresses.#:            "" => "1"
  addresses.0:            "" => "192.168.126.0/24"
  autostart:              "" => "true"
  bridge:                 "" => "tt0"
  dns.#:                  "" => "1"
  dns.0.hosts.#:          "" => "4"
  dns.0.hosts.0.hostname: "" => "demo3-api"
  dns.0.hosts.0.ip:       "" => "192.168.126.10"
  dns.0.hosts.1.hostname: "" => "demo3-api"
  dns.0.hosts.1.ip:       "" => "192.168.126.11"
  dns.0.hosts.2.hostname: "" => "demo3-etcd-0"
  dns.0.hosts.2.ip:       "" => "192.168.126.11"
  dns.0.hosts.3.hostname: "" => "demo3"
  dns.0.hosts.3.ip:       "" => "192.168.126.50"
  dns.0.local_only:       "" => "true"
  dns.0.srvs.#:           "" => "1"
  dns.0.srvs.0.domain:    "" => "demo3.tt.testing"
  dns.0.srvs.0.port:      "" => "2380"
  dns.0.srvs.0.protocol:  "" => "tcp"
  dns.0.srvs.0.service:   "" => "etcd-server-ssl"
  dns.0.srvs.0.target:    "" => "demo3-etcd-0.tt.testing"
  dns.0.srvs.0.weight:    "" => "10"
  domain:                 "" => "tt.testing"
  mode:                   "" => "nat"
  name:                   "" => "demo3"

Error: Error applying plan:

5 error(s) occurred:

* module.libvirt_base_volume.libvirt_volume.coreos_base: 1 error(s) occurred:

* libvirt_volume.coreos_base: storage volume 'coreos_base' already exists
* module.bootstrap.libvirt_ignition.bootstrap: 1 error(s) occurred:

* libvirt_ignition.bootstrap: Error creating libvirt volume for Ignition bootstrap.ign: virError(Code=90, Domain=18, Message='storage volume 'bootstrap.ign' exists already')
* libvirt_ignition.worker: 1 error(s) occurred:

* libvirt_ignition.worker: Error creating libvirt volume for Ignition worker.ign: virError(Code=90, Domain=18, Message='storage volume 'worker.ign' exists already')
* libvirt_ignition.master: 1 error(s) occurred:

* libvirt_ignition.master: Error creating libvirt volume for Ignition master.ign: virError(Code=90, Domain=18, Message='storage volume 'master.ign' exists already')
* libvirt_network.tectonic_net: 1 error(s) occurred:

* libvirt_network.tectonic_net: Error defining libvirt network: virError(Code=1, Domain=19, Message='internal error: bridge name 'tt0' already in use.') -   <network>
      <name>demo3</name>
      <forward mode="nat"></forward>
      <bridge name="tt0" stp="on"></bridge>
      <domain name="tt.testing" localOnly="yes"></domain>
      <dns>
          <host ip="192.168.126.11">
              <hostname>demo3-api</hostname>
              <hostname>demo3-etcd-0</hostname>
          </host>
          <host ip="192.168.126.50">
              <hostname>demo3</hostname>
          </host>
          <host ip="192.168.126.10">
              <hostname>demo3-api</hostname>
          </host>
          <srv service="etcd-server-ssl" protocol="tcp" target="demo3-etcd-0.tt.testing" port="2380" weight="10" domain="demo3.tt.testing"></srv>
      </dns>
      <ip address="192.168.126.1" family="ipv4" prefix="24"></ip>
  </network>

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

FATAL Error executing openshift-install: failed to fetch Cluster: failed to generate asset "Cluster": failed to run terraform: failed to execute Terraform: exit status 1 

What did you expect to happen?

The OCP 4.0 cluster is created successfully.

How to reproduce it (as minimally and precisely as possible)?

1. Creating the OCP 4.0 cluster failed, so destroy it as below:

[jzhang@dhcp-140-18 ~]$ openshift-install destroy cluster --dir=1108 --log-level=debug

2. Rebuild in another directory, like below:

[jzhang@dhcp-140-18 installer]$ openshift-install create cluster --dir 09 --log-level=debug|tee ./09/install.log

Anything else we need to know?

[jzhang@dhcp-140-18 ~]$ uname -a
Linux dhcp-140-18.nay.redhat.com 4.18.16-200.fc28.x86_64 #1 SMP Sat Oct 20 23:53:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[jzhang@dhcp-140-18 ~]$ cat /etc/redhat-release 
Fedora release 28 (Twenty Eight)
[jzhang@dhcp-140-18 ~]$ cat /proc/cpuinfo |grep vmx
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d

[jzhang@dhcp-140-18 ~]$ rpm -qa| grep libvirt
libvirt-4.1.0-5.fc28.x86_64


jianzhangbjz commented 5 years ago

I think I found the root cause, shown below: leftover shut-off domains. So I started them and then deleted them.

[jzhang@dhcp-140-18 installer]$ sudo virsh list --all
setlocale: No such file or directory
 Id    Name                           State
----------------------------------------------------
 -     bootstrap                      shut off
 -     master0                        shut off
[jzhang@dhcp-140-18 installer]$ sudo virsh start bootstrap
[jzhang@dhcp-140-18 installer]$ sudo virsh start master0

[jzhang@dhcp-140-18 installer]$ openshift-install destroy cluster --dir 09 --log-level=debug
DEBUG Deleting libvirt network                     
DEBUG Deleting libvirt domains                     
DEBUG Deleting libvirt volumes                     
INFO Deleted volume                                volume=master.ign
INFO Deleted volume                                volume=coreos_base
INFO Deleted volume                                volume=master0
INFO Deleted domain                                domain=master0
INFO Deleted network                               network=demo
DEBUG Exiting deleting libvirt network             
DEBUG goroutine deleteNetwork complete             
INFO Deleted volume                                volume=worker.ign
INFO Deleted volume                                volume=bootstrap.ign
INFO Deleted domain                                domain=bootstrap
DEBUG Exiting deleting libvirt domains             
DEBUG goroutine deleteDomains complete             
INFO Deleted volume                                volume=bootstrap
DEBUG Exiting deleting libvirt volumes             
DEBUG goroutine deleteVolumes complete             
DEBUG Purging asset "Terraform Variables" from disk 
DEBUG Purging asset "Kubeconfig Admin" from disk   

[jzhang@dhcp-140-18 installer]$ sudo virsh list --all
setlocale: No such file or directory
 Id    Name                           State
----------------------------------------------------

Then I reset the cluster name to demo4 and rebuilt the OCP 4.0 cluster.

[jzhang@dhcp-140-18 installer]$ env | grep -i cluster_name
OPENSHIFT_INSTALL_CLUSTER_NAME=demo4

But I got the errors below. I don't understand why it still connects to demo3; it seems OPENSHIFT_INSTALL_CLUSTER_NAME=demo4 did not take effect. How can I solve this?

module.bootstrap.libvirt_domain.bootstrap: Creating...
  arch:                             "" => "<computed>"
  console.#:                        "" => "1"
  console.0.target_port:            "" => "0"
  console.0.type:                   "" => "pty"
  coreos_ignition:                  "" => "/var/lib/libvirt/images/bootstrap.ign;5be44d97-6836-10e1-c0f7-a00d51877a43"
  disk.#:                           "" => "1"
  disk.0.scsi:                      "" => "false"
  disk.0.volume_id:                 "" => "/var/lib/libvirt/images/bootstrap"
  emulator:                         "" => "<computed>"
  machine:                          "" => "<computed>"
  memory:                           "" => "2048"
  name:                             "" => "bootstrap"
  network_interface.#:              "" => "1"
  network_interface.0.addresses.#:  "" => "1"
  network_interface.0.addresses.0:  "" => "192.168.126.10"
  network_interface.0.hostname:     "" => "demo3-bootstrap"
  network_interface.0.mac:          "" => "<computed>"
  network_interface.0.network_id:   "" => "2040b681-cc05-4bf6-8f23-a564c922e43e"
  network_interface.0.network_name: "" => "<computed>"
  qemu_agent:                       "" => "false"
  running:                          "" => "true"
  vcpu:                             "" => "2"
INFO Waiting for bootstrap completion...          
DEBUG API not up yet: Get https://demo3-api.tt.testing:6443/version?timeout=32s: dial tcp 192.168.126.10:6443: connect: no route to host
crawford commented 5 years ago

I don't understand why it still connects to demo3; it seems OPENSHIFT_INSTALL_CLUSTER_NAME=demo4 did not take effect. How can I solve this?

If you are using the same directory as the previous invocation, the installer will reuse all of the state from that run. You need to destroy the cluster before you create a new one, or move to a different directory.
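The workflow crawford describes can be sketched as follows. This is a hedged example, not installer-provided tooling: the directory names (`09` from this issue, `demo4-assets` as a hypothetical new name) are illustrative, and the `openshift-install` invocations are left as comments so the sketch only sets up the directories.

```shell
#!/bin/sh
# Sketch: destroy using the SAME asset directory that created the cluster,
# then create into a brand-new, empty directory so no stale state is reused.
OLD_DIR=09               # asset dir from the failed run (example from this issue)
NEW_DIR=demo4-assets     # hypothetical fresh dir for the next attempt

# Destroy consumes the Terraform state stored in OLD_DIR by the failed run:
#   openshift-install destroy cluster --dir "$OLD_DIR" --log-level=debug

# Never reuse OLD_DIR for the next create; start from an empty directory:
mkdir -p "$NEW_DIR"
if [ -n "$(ls -A "$NEW_DIR")" ]; then
  echo "refusing to reuse non-empty $NEW_DIR"
else
  echo "ok: $NEW_DIR is empty"
  #   openshift-install create cluster --dir "$NEW_DIR" --log-level=debug
fi
```

The key point is that the asset directory *is* the installer's memory of the cluster: destroying with a different `--dir` than the one used for creation (as in step 1 above, `--dir=1108` vs. `--dir 09`) leaves the old resources untouched.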

jianzhangbjz commented 5 years ago

@crawford Thanks! But, as I mentioned above, I already destroyed it before rebuilding.

wking commented 5 years ago

But, as I mentioned above, I already destroyed it before rebuilding.

If you have leftovers in libvirt from a previous run, you need to clean those up as well. If you don't have anything in libvirt that you need to keep, you can use scripts/maintenance/virsh-cleanup.sh for that. Otherwise you'll need to clean up the cluster domains, volumes, and networks on your own. Do you still see this issue with a clean libvirt environment and a fresh installer --dir?
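The manual cleanup wking mentions might look like the sketch below. The resource names are taken from this issue's logs (domains `bootstrap`/`master0`, the `*.ign` and `coreos_base` volumes, the `demo3` network holding bridge `tt0`); the `default` storage pool is an assumption, and `scripts/maintenance/virsh-cleanup.sh` in the repo remains the supported route. To stay safe to run anywhere, the sketch only prints the `virsh` commands when `virsh` is not installed.

```shell
#!/bin/sh
# Hedged sketch of manually removing libvirt leftovers from a failed run.
cleanup_libvirt_leftovers() {
  if command -v virsh >/dev/null 2>&1; then
    VIRSH="virsh"        # real cleanup; run the script as root or via sudo
  else
    VIRSH="echo virsh"   # virsh missing: just print what would be run
  fi

  # Stop and undefine the leftover cluster domains.
  for dom in bootstrap master0; do
    $VIRSH destroy  "$dom" || true
    $VIRSH undefine "$dom" || true
  done

  # Delete leftover volumes: Ignition configs, base image, domain disks.
  # (--pool default is an assumption; adjust to your storage pool.)
  for vol in coreos_base bootstrap.ign master.ign worker.ign bootstrap master0; do
    $VIRSH vol-delete --pool default "$vol" || true
  done

  # Tear down the cluster network, which holds the tt0 bridge.
  $VIRSH net-destroy  demo3 || true
  $VIRSH net-undefine demo3 || true
}

cleanup_libvirt_leftovers
```

Note that `virsh destroy` on a shut-off domain and `vol-delete` on a missing volume return errors; the `|| true` lets the sketch continue past resources that were already cleaned up.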

jianzhangbjz commented 5 years ago

No, I don't see it anymore, thanks! It works well with a clean libvirt environment and a fresh directory. Closing.