Closed: jhutar closed this issue 5 years ago.
Same here.
2019/03/28 19:03:25 [ERROR] root: eval: *terraform.EvalSequence, err: 1 error(s) occurred:
* ironic_node_v1.openshift-master-1: Internal Server Error
2019/03/28 19:03:25 [TRACE] [walkApply] Exiting eval tree: ironic_node_v1.openshift-master-1
ironic_node_v1.openshift-master-0: Still creating... (2m30s elapsed)
ironic_node_v1.openshift-master-0: Still creating... (2m40s elapsed)
2019-03-28T19:03:48.580Z [DEBUG] plugin.terraform-provider-ironic: 2019/03/28 19:03:48 [DEBUG] Node current state is 'deploying'
2019-03-28T19:03:48.580Z [DEBUG] plugin.terraform-provider-ironic: 2019/03/28 19:03:48 [DEBUG] Node 8eb912b3-a9de-407e-ac76-d05e283e065e is 'deploying', waiting for Ironic to finish.
ironic_node_v1.openshift-master-0: Still creating... (2m50s elapsed)
ironic_node_v1.openshift-master-0: Still creating... (3m0s elapsed)
ironic_node_v1.openshift-master-0: Still creating... (3m10s elapsed)
2019-03-28T19:04:18.604Z [DEBUG] plugin.terraform-provider-ironic: 2019/03/28 19:04:18 [DEBUG] Node current state is 'deploying'
2019-03-28T19:04:18.604Z [DEBUG] plugin.terraform-provider-ironic: 2019/03/28 19:04:18 [DEBUG] Node 8eb912b3-a9de-407e-ac76-d05e283e065e is 'deploying', waiting for Ironic to finish.
ironic_node_v1.openshift-master-0: Still creating... (3m20s elapsed)
ironic_node_v1.openshift-master-0: Still creating... (3m30s elapsed)
ironic_node_v1.openshift-master-0: Still creating... (3m40s elapsed)
2019-03-28T19:04:48.632Z [DEBUG] plugin.terraform-provider-ironic: 2019/03/28 19:04:48 [DEBUG] Node current state is 'active'
2019-03-28T19:04:48.632Z [DEBUG] plugin.terraform-provider-ironic: 2019/03/28 19:04:48 [DEBUG] Node 8eb912b3-a9de-407e-ac76-d05e283e065e is 'active', we are done.
ironic_node_v1.openshift-master-0: Creation complete after 3m49s (ID: 8eb912b3-a9de-407e-ac76-d05e283e065e)
2019/03/28 19:04:48 [DEBUG] plugin: waiting for all plugin processes to complete...
Error: Error applying plan:
2 error(s) occurred:
* ironic_node_v1.openshift-master-2: 1 error(s) occurred:
* ironic_node_v1.openshift-master-2: Internal Server Error
* ironic_node_v1.openshift-master-1: 1 error(s) occurred:
* ironic_node_v1.openshift-master-1: Internal Server Error
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
2019-03-28T19:04:48.743Z [DEBUG] plugin.terraform-provider-ironic: 2019/03/28 19:04:48 [ERR] plugin: plugin server: accept unix /tmp/plugin271256051: use of closed network connection
2019-03-28T19:04:48.744Z [DEBUG] plugin: plugin process exited: path=/home/dgalloway/.terraform.d/plugins/terraform-provider-ironic
make: *** [ocp_run] Error 1
[...]
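The provider log above shows terraform-provider-ironic polling the node's provision state roughly every 30 seconds, reporting "is 'deploying', waiting for Ironic to finish" until the node reaches 'active'. A minimal Python sketch of that wait loop (a simplification for illustration, not the provider's actual Go code; the names `wait_for_provision_state` and `get_state` are hypothetical):

```python
import time

def wait_for_provision_state(get_state, target="active",
                             failure=("deploy failed", "error"),
                             interval=30, timeout=1800):
    """Poll a node's provision state until it reaches `target`.

    `get_state` is any callable returning the current provision state
    (e.g. a wrapper around GET /v1/nodes/<uuid> against the Ironic API).
    Raises on a failure state or when `timeout` seconds elapse.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state == target:
            return state  # e.g. 'active': we are done
        if state in failure:
            raise RuntimeError("node entered state %r" % state)
        if time.monotonic() > deadline:
            raise TimeoutError("node stuck in %r" % state)
        time.sleep(interval)  # matches the ~30s cadence in the log
```

With this shape, master-0 above simply took ~3m49s of polling to reach 'active', while masters 1 and 2 failed server-side before their nodes ever got there.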
[dgalloway@smithi153 dev-scripts]$ sudo virsh list --all
 Id    Name                     State
----------------------------------------------------
 3     ostest-w2hk6-bootstrap   running
 7     openshift_master_0       running
 8     openshift_master_2       running
 9     openshift_master_1       running
 -     openshift_worker_0       shut off
 -     openshift_worker_1       shut off
 -     openshift_worker_2       shut off
Retried and, looking at the state of the machines, one of the master VMs did not boot:
When rebooted, it boots fine.
I have deleted the VMs and first tried to run ./07_deploy_masters.sh, then I ran directly: cd /home/kni/dev-scripts/ocp/tf-master; terraform apply -auto-approve
Noted this Terraform failure:
Error: Error applying plan:
3 error(s) occurred:
* ironic_node_v1.openshift-master-0: 1 error(s) occurred:
* ironic_node_v1.openshift-master-0: Expected HTTP response code [] when accessing [POST http://localhost:6385/v1/nodes], but got 409 instead
{"error_message": "{\"debuginfo\": null, \"faultcode\": \"Client\", \"faultstring\": \"A node with name openshift-master-0 already exists.\"}"}
* ironic_node_v1.openshift-master-1: 1 error(s) occurred:
* ironic_node_v1.openshift-master-1: Expected HTTP response code [] when accessing [POST http://localhost:6385/v1/nodes], but got 409 instead
{"error_message": "{\"debuginfo\": null, \"faultcode\": \"Client\", \"faultstring\": \"A node with name openshift-master-1 already exists.\"}"}
* ironic_node_v1.openshift-master-2: 1 error(s) occurred:
* ironic_node_v1.openshift-master-2: Expected HTTP response code [] when accessing [POST http://localhost:6385/v1/nodes], but got 409 instead
{"error_message": "{\"debuginfo\": null, \"faultcode\": \"Client\", \"faultstring\": \"A node with name openshift-master-2 already exists.\"}"}
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
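The 409 responses here suggest the nodes registered during the earlier run were still present in Ironic, so the re-apply (with a fresh Terraform state) tried to POST duplicates. Note also that the body is doubly JSON-encoded: the outer object's error_message field is itself a JSON string. A small sketch decoding one of the bodies logged above:

```python
import json

# Raw 409 body as logged above: outer JSON whose error_message field
# is itself a JSON-encoded string (hence all the escaped quotes).
body = ('{"error_message": "{\\"debuginfo\\": null, \\"faultcode\\": '
        '\\"Client\\", \\"faultstring\\": '
        '\\"A node with name openshift-master-0 already exists.\\"}"}')

# Two decode passes: first the outer object, then the nested string.
fault = json.loads(json.loads(body)["error_message"])
print(fault["faultstring"])  # A node with name openshift-master-0 already exists.
```

In this situation, deleting the stale registrations (e.g. with openstack baremetal node delete) before re-applying should let the provider re-create them cleanly.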
And on a host machine, there is this in the journal:
Mar 28 16:19:47 <hostname> sshd[53899]: Address 192.168.111.1 maps to <hostname>, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
Mar 28 16:19:47 <hostname> sshd[53899]: Accepted publickey for root from 192.168.111.1 port 39232 ssh2: RSA SHA256:OvKACMSqIKEZn/gPl+jaFHiVEufD+kURHa5N/Xzi6mo
Mar 28 16:19:47 <hostname> systemd-logind[6040]: New session 1874 of user root.
Mar 28 16:19:47 <hostname> systemd[1]: Started Session 1874 of user root.
Mar 28 16:19:47 <hostname> sshd[53899]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 28 16:19:47 <hostname> vbmcd[42576]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 'openshift_master_1'
Mar 28 16:19:47 <hostname> sshd[53899]: Received disconnect from 192.168.111.1 port 39232:11: disconnected by user
Mar 28 16:19:47 <hostname> sshd[53899]: Disconnected from 192.168.111.1 port 39232
Mar 28 16:19:47 <hostname> sshd[53899]: pam_unix(sshd:session): session closed for user root
Mar 28 16:19:47 <hostname> vbmcd[42576]: Traceback (most recent call last):
Mar 28 16:19:47 <hostname> vbmcd[42576]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/bmc.py", line 175, in handle_raw_request
Mar 28 16:19:47 <hostname> vbmcd[42576]: return self.get_chassis_status(session)
Mar 28 16:19:47 <hostname> vbmcd[42576]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/bmc.py", line 91, in get_chassis_status
Mar 28 16:19:47 <hostname> vbmcd[42576]: powerstate = self.get_power_state()
Mar 28 16:19:47 <hostname> vbmcd[42576]: File "/usr/lib/python2.7/site-packages/virtualbmc/vbmc.py", line 123, in get_power_state
Mar 28 16:19:47 <hostname> vbmcd[42576]: domain = utils.get_libvirt_domain(conn, self.domain_name)
Mar 28 16:19:47 <hostname> vbmcd[42576]: File "/usr/lib/python2.7/site-packages/virtualbmc/utils.py", line 62, in get_libvirt_domain
Mar 28 16:19:47 <hostname> vbmcd[42576]: raise exception.DomainNotFound(domain=domain)
Mar 28 16:19:47 <hostname> vbmcd[42576]: DomainNotFound: No domain with matching name openshift_master_1 was found
Mar 28 16:19:47 <hostname> systemd-logind[6040]: Removed session 1874.
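The vbmcd traceback makes sense given the VM deletion: vbmcd still had the BMC for openshift_master_1 registered, so when Ironic polled its chassis status over IPMI, virtualbmc's utils.get_libvirt_domain could no longer find the libvirt domain and raised DomainNotFound. A simplified stand-in for that guard (here a plain set of domain names replaces the real libvirt connection):

```python
class DomainNotFound(Exception):
    """Mirrors virtualbmc's exception text from the journal above."""
    def __init__(self, domain):
        super().__init__(
            "No domain with matching name %s was found" % domain)

def get_libvirt_domain(defined_domains, name):
    """Simplified sketch of virtualbmc's utils.get_libvirt_domain:
    look up a domain by name and raise DomainNotFound when libvirt
    no longer knows it (as happens after the VM is deleted while the
    vbmc entry still exists)."""
    if name not in defined_domains:
        raise DomainNotFound(name)
    return name
```

So the journal noise is a symptom, not the cause: the BMC entries outlived the VMs they pointed at.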
Hmm, could it be caused by insufficient disk space? I had only 50GB in / (the rest was hidden in /home). Retrying with changed partitioning.
Hmm, could it be caused by insufficient disk space?
Definitely not the case in my scenario. I have 800GB of free space.
I have not seen this in the last two attempts to install (where I have made sure to remove the /home partition and extend / - but that might not be connected).
I think this was probably fixed via https://github.com/openshift-metalkube/dev-scripts/pull/237, but it's hard to be sure without more information. @jhutar, since you report it's now working for you, I'll close this, and we can work through any remaining issues via follow-up issues. Thanks!
Installation failed:
Some random commands, as I basically do not know what I'm doing :-)