openshift-metal3 / dev-scripts

Scripts to automate development/test setup for openshift integration with https://github.com/metal3-io/
Apache License 2.0

plugin.terraform-provider-ironic: 2019/03/27 18:19:30 [ERR] plugin: plugin server: accept unix /tmp/plugin204986381: use of closed network connection #234

Closed jhutar closed 5 years ago

jhutar commented 5 years ago

Installation failed:

$ make
[...]
ironic_node_v1.openshift-master-2: Still creating... (6m20s elapsed)
ironic_node_v1.openshift-master-0: Still creating... (6m20s elapsed)
2019-03-27T18:19:29.662-0400 [DEBUG] plugin.terraform-provider-ironic: 2019/03/27 18:19:29 [DEBUG] Node current state is 'active'
2019-03-27T18:19:29.662-0400 [DEBUG] plugin.terraform-provider-ironic: 2019/03/27 18:19:29 [DEBUG] Node 8f0943d0-839c-45da-a548-c57e69501fdf is 'active', we are done.
ironic_node_v1.openshift-master-2: Creation complete after 6m25s (ID: 8f0943d0-839c-45da-a548-c57e69501fdf)
2019-03-27T18:19:29.878-0400 [DEBUG] plugin.terraform-provider-ironic: 2019/03/27 18:19:29 [DEBUG] Node current state is 'active'
2019-03-27T18:19:29.878-0400 [DEBUG] plugin.terraform-provider-ironic: 2019/03/27 18:19:29 [DEBUG] Node b8e484a7-bee9-49f3-a203-a3f06a474b74 is 'active', we are done.
ironic_node_v1.openshift-master-0: Creation complete after 6m25s (ID: b8e484a7-bee9-49f3-a203-a3f06a474b74)
2019/03/27 18:19:30 [DEBUG] plugin: waiting for all plugin processes to complete...

Error: Error applying plan:

1 error(s) occurred:

* ironic_node_v1.openshift-master-1: 1 error(s) occurred:

* ironic_node_v1.openshift-master-1: Internal Server Error

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

2019-03-27T18:19:30.101-0400 [DEBUG] plugin.terraform-provider-ironic: 2019/03/27 18:19:30 [ERR] plugin: plugin server: accept unix /tmp/plugin204986381: use of closed network connection
2019-03-27T18:19:30.102-0400 [DEBUG] plugin: plugin process exited: path=/home/kni/.terraform.d/plugins/terraform-provider-ironic
make: *** [ocp_run] Error 1

Some random commands as I basically do not know what I'm doing :-)

[kni@intel-canoepass-09 dev-scripts]$ oc --config /home/kni/dev-scripts/ocp/auth/kubeconfig get nodes
error: the server doesn't have a resource type "nodes"
[kni@intel-canoepass-09 dev-scripts]$ export OS_TOKEN=fake-token
[kni@intel-canoepass-09 dev-scripts]$ export OS_URL=http://localhost:6385/
[kni@intel-canoepass-09 dev-scripts]$ openstack baremetal node list
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name               | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+
| c9c0a601-86a7-41ad-bf84-96c429b93fd9 | openshift-master-1 | None          | power on    | active             | False       |
| b8e484a7-bee9-49f3-a203-a3f06a474b74 | openshift-master-0 | None          | power on    | active             | False       |
| 8f0943d0-839c-45da-a548-c57e69501fdf | openshift-master-2 | None          | power on    | active             | False       |
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+
[kni@intel-canoepass-09 dev-scripts]$ openstack baremetal node show c9c0a601-86a7-41ad-bf84-96c429b93fd9
+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                  | Value                                                                                                                                                                                                                                                                           |
+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| allocation_uuid        | None                                                                                                                                                                                                                                                                            |
| automated_clean        | None                                                                                                                                                                                                                                                                            |
| bios_interface         | no-bios                                                                                                                                                                                                                                                                         |
| boot_interface         | ipxe                                                                                                                                                                                                                                                                            |
| chassis_uuid           | None                                                                                                                                                                                                                                                                            |
| clean_step             | {}                                                                                                                                                                                                                                                                              |
| conductor              | intel-canoepass-09.khw1.lab.eng.bos.redhat.com                                                                                                                                                                                                                                  |
| conductor_group        |                                                                                                                                                                                                                                                                                 |
| console_enabled        | False                                                                                                                                                                                                                                                                           |
| console_interface      | no-console                                                                                                                                                                                                                                                                      |
| created_at             | 2019-03-27T22:13:04.839047+00:00                                                                                                                                                                                                                                                |
| deploy_interface       | direct                                                                                                                                                                                                                                                                          |
| deploy_step            | {}                                                                                                                                                                                                                                                                              |
| description            | None                                                                                                                                                                                                                                                                            |
| driver                 | ipmi                                                                                                                                                                                                                                                                            |
| driver_info            | {u'ipmi_port': u'6231', u'ipmi_username': u'admin', u'deploy_kernel': u'http://172.22.0.1/images/ironic-python-agent.kernel', u'ipmi_address': u'192.168.111.1', u'deploy_ramdisk': u'http://172.22.0.1/images/ironic-python-agent.initramfs', u'ipmi_password': u'******'}     |
| driver_internal_info   | {u'deploy_boot_mode': u'bios', u'is_whole_disk_image': True, u'root_uuid_or_disk_id': u'0xbaef8236', u'agent_url': u'http://172.22.0.78:9999', u'deploy_steps': None, u'agent_version': u'3.7.0.dev3', u'agent_last_heartbeat': u'2019-03-27T22:18:08.408711'}                  |
| extra                  | {}                                                                                                                                                                                                                                                                              |
| fault                  | None                                                                                                                                                                                                                                                                            |
| inspect_interface      | inspector                                                                                                                                                                                                                                                                       |
| inspection_finished_at | None                                                                                                                                                                                                                                                                            |
| inspection_started_at  | None                                                                                                                                                                                                                                                                            |
| instance_info          | {u'root_gb': u'25', u'image_source': u'http://172.22.0.1/images/redhat-coreos-maipo-latest.qcow2', u'image_type': u'whole-disk-image', u'root_device': u'/dev/vda', u'image_checksum': u'308f00a5cb04c5aaf0f15073dabe335f', u'image_url': u'******', u'configdrive': u'******'} |
| instance_uuid          | None                                                                                                                                                                                                                                                                            |
| last_error             | None                                                                                                                                                                                                                                                                            |
| maintenance            | False                                                                                                                                                                                                                                                                           |
| maintenance_reason     | None                                                                                                                                                                                                                                                                            |
| management_interface   | ipmitool                                                                                                                                                                                                                                                                        |
| name                   | openshift-master-1                                                                                                                                                                                                                                                              |
| network_interface      | noop                                                                                                                                                                                                                                                                            |
| owner                  | None                                                                                                                                                                                                                                                                            |
| power_interface        | ipmitool                                                                                                                                                                                                                                                                        |
| power_state            | power on                                                                                                                                                                                                                                                                        |
| properties             | {}                                                                                                                                                                                                                                                                              |
| protected              | False                                                                                                                                                                                                                                                                           |
| protected_reason       | None                                                                                                                                                                                                                                                                            |
| provision_state        | active                                                                                                                                                                                                                                                                          |
| provision_updated_at   | 2019-03-27T22:19:20.972264+00:00                                                                                                                                                                                                                                                |
| raid_config            | {}                                                                                                                                                                                                                                                                              |
| raid_interface         | no-raid                                                                                                                                                                                                                                                                         |
| rescue_interface       | no-rescue                                                                                                                                                                                                                                                                       |
| reservation            | None                                                                                                                                                                                                                                                                            |
| resource_class         | None                                                                                                                                                                                                                                                                            |
| storage_interface      | noop                                                                                                                                                                                                                                                                            |
| target_power_state     | None                                                                                                                                                                                                                                                                            |
| target_provision_state | None                                                                                                                                                                                                                                                                            |
| target_raid_config     | {}                                                                                                                                                                                                                                                                              |
| traits                 | []                                                                                                                                                                                                                                                                              |
| updated_at             | 2019-03-27T22:19:21.011055+00:00                                                                                                                                                                                                                                                |
| uuid                   | c9c0a601-86a7-41ad-bf84-96c429b93fd9                                                                                                                                                                                                                                            |
| vendor_interface       | ipmitool                                                                                                                                                                                                                                                                        |
+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[kni@intel-canoepass-09 dev-scripts]$ openstack baremetal node show c9c0a601-86a7-41ad-bf84-96c429b93fd9 -f value -c last_error
None
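
The per-node check above can be repeated for every node in one pass. This is a minimal sketch, not part of dev-scripts: the function name `nodes_not_active` is hypothetical, and it assumes `OS_TOKEN` and `OS_URL` are exported as in the session above. It prints each node whose provisioning state is anything other than `active`.

```shell
# Hypothetical helper: read "UUID Name Provisioning-State" lines on stdin and
# print every node whose provisioning state is not exactly "active".
nodes_not_active() {
  awk 'NF >= 3 {
    state = $3
    # States such as "deploy failed" contain a space, so rejoin the tail.
    for (i = 4; i <= NF; i++) state = state " " $i
    if (state != "active") print $2 ": " state
  }'
}

# Usage against a live Ironic (assumption: same environment as above):
#   openstack baremetal node list -f value -c UUID -c Name -c "Provisioning State" | nodes_not_active
```

In the run reported here this would print nothing, since all three masters were `active` with `last_error` set to `None`, which suggests the Internal Server Error was transient rather than a persistent node fault.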
djgalloway commented 5 years ago

Same here.

2019/03/28 19:03:25 [ERROR] root: eval: *terraform.EvalSequence, err: 1 error(s) occurred:

* ironic_node_v1.openshift-master-1: Internal Server Error
2019/03/28 19:03:25 [TRACE] [walkApply] Exiting eval tree: ironic_node_v1.openshift-master-1
ironic_node_v1.openshift-master-0: Still creating... (2m30s elapsed)
ironic_node_v1.openshift-master-0: Still creating... (2m40s elapsed)
2019-03-28T19:03:48.580Z [DEBUG] plugin.terraform-provider-ironic: 2019/03/28 19:03:48 [DEBUG] Node current state is 'deploying'
2019-03-28T19:03:48.580Z [DEBUG] plugin.terraform-provider-ironic: 2019/03/28 19:03:48 [DEBUG] Node 8eb912b3-a9de-407e-ac76-d05e283e065e is 'deploying', waiting for Ironic to finish.
ironic_node_v1.openshift-master-0: Still creating... (2m50s elapsed)
ironic_node_v1.openshift-master-0: Still creating... (3m0s elapsed)
ironic_node_v1.openshift-master-0: Still creating... (3m10s elapsed)
2019-03-28T19:04:18.604Z [DEBUG] plugin.terraform-provider-ironic: 2019/03/28 19:04:18 [DEBUG] Node current state is 'deploying'
2019-03-28T19:04:18.604Z [DEBUG] plugin.terraform-provider-ironic: 2019/03/28 19:04:18 [DEBUG] Node 8eb912b3-a9de-407e-ac76-d05e283e065e is 'deploying', waiting for Ironic to finish.
ironic_node_v1.openshift-master-0: Still creating... (3m20s elapsed)
ironic_node_v1.openshift-master-0: Still creating... (3m30s elapsed)
ironic_node_v1.openshift-master-0: Still creating... (3m40s elapsed)
2019-03-28T19:04:48.632Z [DEBUG] plugin.terraform-provider-ironic: 2019/03/28 19:04:48 [DEBUG] Node current state is 'active'
2019-03-28T19:04:48.632Z [DEBUG] plugin.terraform-provider-ironic: 2019/03/28 19:04:48 [DEBUG] Node 8eb912b3-a9de-407e-ac76-d05e283e065e is 'active', we are done.
ironic_node_v1.openshift-master-0: Creation complete after 3m49s (ID: 8eb912b3-a9de-407e-ac76-d05e283e065e)

2019/03/28 19:04:48 [DEBUG] plugin: waiting for all plugin processes to complete...
Error: Error applying plan:

2 error(s) occurred:

* ironic_node_v1.openshift-master-2: 1 error(s) occurred:

* ironic_node_v1.openshift-master-2: Internal Server Error
* ironic_node_v1.openshift-master-1: 1 error(s) occurred:

* ironic_node_v1.openshift-master-1: Internal Server Error

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
2019-03-28T19:04:48.743Z [DEBUG] plugin.terraform-provider-ironic: 2019/03/28 19:04:48 [ERR] plugin: plugin server: accept unix /tmp/plugin271256051: use of closed network connection
above and apply again to incrementally change your infrastructure.

2019-03-28T19:04:48.744Z [DEBUG] plugin: plugin process exited: path=/home/dgalloway/.terraform.d/plugins/terraform-provider-ironic
make: *** [ocp_run] Error 1

[...]

[dgalloway@smithi153 dev-scripts]$ sudo virsh list --all
 Id    Name                           State
----------------------------------------------------
 3     ostest-w2hk6-bootstrap         running
 7     openshift_master_0             running
 8     openshift_master_2             running
 9     openshift_master_1             running
 -     openshift_worker_0             shut off
 -     openshift_worker_1             shut off
 -     openshift_worker_2             shut off
jhutar commented 5 years ago

I retried, and looking at the state of the machines, one of the master VMs did not boot:

[Screenshot: openshift_master_1 console, 2019-03-28 20:55:03]

When rebooted, it boots fine.

jhutar commented 5 years ago

I deleted the VMs and first tried running ./07_deploy_masters.sh, then ran cd /home/kni/dev-scripts/ocp/tf-master; terraform apply -auto-approve directly.

Noted this Terraform failure:

Error: Error applying plan:

3 error(s) occurred:

* ironic_node_v1.openshift-master-0: 1 error(s) occurred:

* ironic_node_v1.openshift-master-0: Expected HTTP response code [] when accessing [POST http://localhost:6385/v1/nodes], but got 409 instead
{"error_message": "{\"debuginfo\": null, \"faultcode\": \"Client\", \"faultstring\": \"A node with name openshift-master-0 already exists.\"}"}
* ironic_node_v1.openshift-master-1: 1 error(s) occurred:

* ironic_node_v1.openshift-master-1: Expected HTTP response code [] when accessing [POST http://localhost:6385/v1/nodes], but got 409 instead
{"error_message": "{\"debuginfo\": null, \"faultcode\": \"Client\", \"faultstring\": \"A node with name openshift-master-1 already exists.\"}"}
* ironic_node_v1.openshift-master-2: 1 error(s) occurred:

* ironic_node_v1.openshift-master-2: Expected HTTP response code [] when accessing [POST http://localhost:6385/v1/nodes], but got 409 instead
{"error_message": "{\"debuginfo\": null, \"faultcode\": \"Client\", \"faultstring\": \"A node with name openshift-master-2 already exists.\"}"}

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
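
The 409 responses above mean that Ironic still has nodes registered under those names from the earlier run, so the POST to create them is rejected. A minimal sketch for pulling the conflicting names out of the Terraform output follows. The function name `conflicting_nodes` is hypothetical, and the cleanup loop in the comment is an assumption about one possible remedy, not something dev-scripts is known to do.

```shell
# Hypothetical helper: read Terraform output on stdin and print the names of
# Ironic nodes that the API rejected as already existing (HTTP 409).
conflicting_nodes() {
  sed -n 's/.*A node with name \([^ ]*\) already exists.*/\1/p' | sort -u
}

# Possible cleanup before re-applying (assumption: the stale nodes are safe to
# remove; Ironic may refuse to delete an active node unless it is in maintenance):
#   terraform apply -auto-approve 2>&1 | conflicting_nodes | while read -r node; do
#     openstack baremetal node maintenance set "$node"
#     openstack baremetal node delete "$node"
#   done
```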
jhutar commented 5 years ago

And on the host machine, the journal shows this:

Mar 28 16:19:47 <hostname> sshd[53899]: Address 192.168.111.1 maps to <hostname>, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
Mar 28 16:19:47 <hostname> sshd[53899]: Accepted publickey for root from 192.168.111.1 port 39232 ssh2: RSA SHA256:OvKACMSqIKEZn/gPl+jaFHiVEufD+kURHa5N/Xzi6mo
Mar 28 16:19:47 <hostname> systemd-logind[6040]: New session 1874 of user root.
Mar 28 16:19:47 <hostname> systemd[1]: Started Session 1874 of user root.
Mar 28 16:19:47 <hostname> sshd[53899]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 28 16:19:47 <hostname> vbmcd[42576]: libvirt: QEMU Driver error : Domain not found: no domain with matching name 'openshift_master_1'
Mar 28 16:19:47 <hostname> sshd[53899]: Received disconnect from 192.168.111.1 port 39232:11: disconnected by user
Mar 28 16:19:47 <hostname> sshd[53899]: Disconnected from 192.168.111.1 port 39232
Mar 28 16:19:47 <hostname> sshd[53899]: pam_unix(sshd:session): session closed for user root
Mar 28 16:19:47 <hostname> vbmcd[42576]: Traceback (most recent call last):
Mar 28 16:19:47 <hostname> vbmcd[42576]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/bmc.py", line 175, in handle_raw_request
Mar 28 16:19:47 <hostname> vbmcd[42576]: return self.get_chassis_status(session)
Mar 28 16:19:47 <hostname> vbmcd[42576]: File "/usr/lib/python2.7/site-packages/pyghmi/ipmi/bmc.py", line 91, in get_chassis_status
Mar 28 16:19:47 <hostname> vbmcd[42576]: powerstate = self.get_power_state()
Mar 28 16:19:47 <hostname> vbmcd[42576]: File "/usr/lib/python2.7/site-packages/virtualbmc/vbmc.py", line 123, in get_power_state
Mar 28 16:19:47 <hostname> vbmcd[42576]: domain = utils.get_libvirt_domain(conn, self.domain_name)
Mar 28 16:19:47 <hostname> vbmcd[42576]: File "/usr/lib/python2.7/site-packages/virtualbmc/utils.py", line 62, in get_libvirt_domain
Mar 28 16:19:47 <hostname> vbmcd[42576]: raise exception.DomainNotFound(domain=domain)
Mar 28 16:19:47 <hostname> vbmcd[42576]: DomainNotFound: No domain with matching name openshift_master_1 was found
Mar 28 16:19:47 <hostname> systemd-logind[6040]: Removed session 1874.
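
The traceback shows vbmcd answering an IPMI power-state request for a libvirt domain that no longer exists, which is consistent with the VMs having been deleted while their VirtualBMC entries remained. A minimal sketch for spotting such stale entries follows. The function name is hypothetical, and the `vbmc list -f value -c "Domain name"` invocation in the comment assumes a VirtualBMC version whose CLI supports those output options.

```shell
# Hypothetical helper: print domain names listed in file $1 (names registered
# with VirtualBMC) that do not appear in file $2 (names known to libvirt).
vbmc_domains_missing_from_libvirt() {
  grep -vxF -f "$2" "$1"
}

# Usage (assumption: run on the virtualization host):
#   vbmc list -f value -c "Domain name" | sort > /tmp/vbmc-domains
#   sudo virsh list --all --name | sort > /tmp/libvirt-domains
#   vbmc_domains_missing_from_libvirt /tmp/vbmc-domains /tmp/libvirt-domains
```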
jhutar commented 5 years ago

Hmm, could it be caused by insufficient disk space? I had only 50GB in / (rest was hidden in /home). Retrying with changed partitioning.

djgalloway commented 5 years ago

> Hmm, could it be caused by insufficient disk space?

Definitely not the case in my scenario. I have 800GB of free space.

jhutar commented 5 years ago

I have not seen this in my last two install attempts (where I made sure to remove the /home partition and extend /, though that might not be related).

hardys commented 5 years ago

I think this was probably fixed via https://github.com/openshift-metalkube/dev-scripts/pull/237, but it's hard to be sure without more information. @jhutar, since you report it's now working for you, I'll close this, and we can work through any remaining problems in follow-up issues. Thanks!