okd-project / okd

The self-managing, auto-upgrading, Kubernetes distribution for everyone
https://okd.io
Apache License 2.0

Installation on vSphere - ERROR Error: rpc error: code = Unavailable desc = transport is closing #309

Closed zombiefish closed 3 years ago

zombiefish commented 4 years ago

Describe the bug Installation fails with 'Unavailable description'

Version openshift-install 4.5.0-0.okd-2020-08-12-020541
built from commit 699277bb61706731d687b9e40700ebf4630b0851
release image quay.io/openshift/okd@sha256:6974c414be62aee4fde24fe47ccfff97c2854ddc37eb196f3f3bcda2fdec17b4

How reproducible 100% reproducible

Command Line
$ openshift-install create cluster --dir=/export/Projects/OKD/VMware --log-level=info
? SSH Public Key /home/XXX/.ssh/id_rsa.pub
? Platform vsphere
? vCenter vcenter.XXX.com
? Username XXX@XXX.com
? Password [? for help] ****
INFO Connecting to vCenter vcenter.XXX.com
INFO Defaulting to only available datacenter: XXX
? Cluster Containers
? Default Datastore Lenovo1_ISCSI_RAID5_16TB
? Network OKD (10.10.22%2f24)
? Virtual IP Address for API 10.10.22.100
? Virtual IP Address for Ingress 10.10.22.101
? Base Domain XXX.com
? Cluster Name okd
? Pull Secret [? for help] *****
INFO Obtaining RHCOS image file from 'https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/32.20200629.3.0/x86_64/fedora-coreos-32.20200629.3.0-vmware.x86_64.ova?sha256=172f299a3e28be360740ff437a5ea9bfc246f52ea8f313d4138c5d16fd4b11e1'
INFO The file was found in cache: /home/XXX /.cache/openshift-installer/image_cache/062bfe3785d26fa220e2e6e72d1b3562. Reusing...
INFO Creating infrastructure resources...
ERROR
ERROR Error: rpc error: code = Unavailable desc = transport is closing
ERROR
ERROR
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change

openshift_install.log openshift_install_state.json.txt

vrutkovs commented 4 years ago

Last log lines:

time="2020-08-20T11:23:00-04:00" level=debug msg="vsphere_tag_category.category: Creating..."
time="2020-08-20T11:23:00-04:00" level=error
time="2020-08-20T11:23:00-04:00" level=error msg="Error: could not create category: 400 Bad Request: {\"type\":\"com.vmware.vapi.std.errors.already_exists\",\"value\":{\"messages\":[]}}"
time="2020-08-20T11:23:00-04:00" level=error
time="2020-08-20T11:23:00-04:00" level=error msg="  on ../../../../../tmp/openshift-install-732522033/main.tf line 54, in resource \"vsphere_tag_category\" \"category\":"
time="2020-08-20T11:23:00-04:00" level=error msg="  54: resource \"vsphere_tag_category\" \"category\" {"
time="2020-08-20T11:23:00-04:00" level=error
time="2020-08-20T11:23:00-04:00" level=error
time="2020-08-20T11:23:00-04:00" level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply Terraform: failed to complete the change"

It appears the tag category already exists?
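The diagnosis above comes from reading the last error lines of the installer log. As a rough illustration only, the following Python sketch scans log lines in the format quoted above and reports the first `Error:` message, the Terraform resource type named in the trace, and whether the message contains `already_exists`. The function name `summarize_failure` and the regular expressions are assumptions made for this example; they are not part of `openshift-install`.

```python
import re

# Matches the key=value log lines quoted above, e.g.
#   time="..." level=error msg="Error: could not create category: ..."
# Lines without a msg field (bare level=error) are also accepted.
LINE_RE = re.compile(r'level=(?P<level>\w+)(?: msg="(?P<msg>.*)")?\s*$')

# Matches: in resource \"vsphere_tag_category\" \"category\"
# (quotes may or may not be backslash-escaped in the log).
RESOURCE_RE = re.compile(
    r'in resource \\?"(?P<type>[^"\\]+)\\?" \\?"(?P<name>[^"\\]+)\\?"'
)


def summarize_failure(log_lines):
    """Return (first_error_message, resource_type, already_exists)."""
    error_msg = None
    resource_type = None
    for line in log_lines:
        match = LINE_RE.search(line)
        if not match or match.group("level") != "error":
            continue
        msg = match.group("msg")
        if not msg:
            continue  # bare "level=error" separator line
        if error_msg is None and msg.startswith("Error:"):
            error_msg = msg
        found = RESOURCE_RE.search(msg)
        if found and resource_type is None:
            resource_type = found.group("type")
    already_exists = bool(error_msg and "already_exists" in error_msg)
    return error_msg, resource_type, already_exists


# Sample lines copied from the log excerpt in this comment.
SAMPLE_LOG = [
    r'time="2020-08-20T11:23:00-04:00" level=debug msg="vsphere_tag_category.category: Creating..."',
    r'time="2020-08-20T11:23:00-04:00" level=error',
    r'time="2020-08-20T11:23:00-04:00" level=error msg="Error: could not create category: 400 Bad Request: {\"type\":\"com.vmware.vapi.std.errors.already_exists\",\"value\":{\"messages\":[]}}"',
    r'time="2020-08-20T11:23:00-04:00" level=error msg="  on ../../../../../tmp/openshift-install-732522033/main.tf line 54, in resource \"vsphere_tag_category\" \"category\":"',
]

if __name__ == "__main__":
    print(summarize_failure(SAMPLE_LOG))
```

If the third value is true and the resource type is `vsphere_tag_category`, the failure is probably the leftover tag category rather than the `transport is closing` problem reported at the top of this issue.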

yukselao commented 3 years ago

any solution for this problem?

yukselao commented 3 years ago

I got a similar error message. I am trying to install OpenShift 4.5 on the vSphere platform.

TownGeekAus commented 3 years ago

Same here
INFO Creating infrastructure resources...
ERROR
ERROR Error: rpc error: code = Unavailable desc = transport is closing
ERROR
ERROR
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change

TownGeekAus commented 3 years ago

openshift 4.6
DEBUG vsphere_tag.tag: Creating...
DEBUG vsphere_tag.tag: Creation complete after 0s [id=urn:vmomi:InventoryServiceTag:32f172ad-d760-4c1d-822d-79c054140f24:GLOBAL]
DEBUG vsphereprivate_import_ova.import: Creating...
ERROR
ERROR Error: rpc error: code = Unavailable desc = transport is closing
ERROR
ERROR
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change

jlacasse commented 3 years ago

Same thing for me:

DEBUG vsphere_folder.folder[0]: Creation complete after 0s [id=group-v308]
DEBUG vsphereprivate_import_ova.import: Creating...
ERROR
ERROR Error: rpc error: code = Unavailable desc = transport is closing
ERROR
ERROR
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change

killergoalie commented 3 years ago

Are any of you using ESXi 6.0? I'm seeing the same error with vcenter 6.7 and esxi hosts 6.0 (can't upgrade hosts due to hardware limitation)

AndrewProwse commented 3 years ago

Unfortunately, I never got this to work using the included tools. I was on ESXi 6.0. I ended up writing my own scripts and using my own haproxy, tftpd server (to automate the grub command-line needs for bootstrap, manager and worker), and web server (to hold the ignition files) to get this mess automated.

leewx95 commented 3 years ago

In my scenario, the first run of the script went without any issue until I hit a network error when POSTing the VMDK to the ESXi host inside my vCenter cluster. As the ESXi host connection was not part of the test and was not mentioned in the OKD documentation, I didn't check that connection.

Error from the first attempt: [screenshot: Untitled]
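The first failure above suggests that the machine running the installer has to reach the ESXi hosts directly, not only vCenter, because the disk image is uploaded to a host. A minimal pre-flight sketch in Python, assuming the upload goes over HTTPS on TCP port 443 (an assumption, adjust it if your environment differs); the host names in the usage note are hypothetical:

```python
import socket


def can_connect(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_hosts(hosts, port=443, timeout=3.0):
    """Return the subset of hosts that could not be reached on the given port."""
    return [h for h in hosts if not can_connect(h, port, timeout)]
```

For example, `unreachable_hosts(["esxi-01.example.com", "esxi-02.example.com"])` returns the hosts the installer machine cannot open a connection to. This only checks TCP reachability; it does not validate certificates or credentials.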

On my second attempt, something seems to have already been created in vCenter, which aborts the installer-provisioned script. The error from the second attempt is below.


root@okd-dhcp:# ./openshift-install create cluster --dir=/opt/openshift/ --log-level=info
INFO Creating infrastructure resources...
ERROR
ERROR Error: could not create category: 400 Bad Request: {"type":"com.vmware.vapi.std.errors.already_exists","value":{"messages":[]}}
ERROR
ERROR   on ../tmp/openshift-install-549359176/main.tf line 54, in resource "vsphere_tag_category" "category":
ERROR   54: resource "vsphere_tag_category" "category" {
ERROR
ERROR
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change

root@okd-dhcp:# pwd

JaimeMagiera commented 3 years ago

If on previous attempts the installer got so far as to create the category, it would need to be deleted before running the installer again.

https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.vcenterhost.doc/GUID-BA3D1794-28F2-43F3-BCE9-3964CB207FB6.html

https://registry.terraform.io/providers/hashicorp/vsphere/latest/docs/resources/tag_category

sysThematic commented 3 years ago

Hi all, I have the same problem on vSphere 7.0

DEBUG vsphere_folder.folder[0]: Creating...
DEBUG vsphere_folder.folder[0]: Creation complete after 0s [id=group-v3180]
DEBUG vsphereprivate_import_ova.import: Creating...
ERROR
ERROR Error: rpc error: code = Unavailable desc = transport is closing
ERROR
ERROR
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change

openshift-bot commented 3 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

evertmulder commented 3 years ago

I also got the error while the template was being created. Note that the tags were created without issues, so the tag category error seems to be a separate, unrelated issue.

DEBUG vsphereprivate_import_ova.import: Creating...
ERROR
ERROR Error: rpc error: code = Unavailable desc = transport is closing
ERROR
ERROR

It turned out I had used a datastore that was not available to the cluster I was installing to. After I changed to a datastore that is available to the cluster, the problem was fixed.

Perhaps this will help someone; the cause is a bit unclear from the logs.
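The datastore check described above can be expressed as a small pre-flight test. The Python sketch below assumes you have already collected, by whatever means (vSphere client, an inventory export), a mapping from each ESXi host in the target cluster to the datastores mounted on it. The function name and the data are hypothetical and only illustrate the check; nothing here queries vCenter.

```python
def hosts_missing_datastore(host_datastores, datastore):
    """Return the hosts (sorted by name) that do not have `datastore` mounted.

    host_datastores: dict mapping host name -> iterable of datastore names.
    An empty result means every host in the cluster can see the datastore.
    """
    return sorted(
        host
        for host, mounted in host_datastores.items()
        if datastore not in set(mounted)
    )


# Hypothetical inventory for a two-host cluster.
inventory = {
    "esxi-a": ["shared-ds", "local-a"],
    "esxi-b": ["local-b"],
}

if __name__ == "__main__":
    print(hosts_missing_datastore(inventory, "shared-ds"))
```

A non-empty result for the datastore chosen at the installer's "Default Datastore" prompt would point to the situation described in this comment.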

openshift-bot commented 3 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot commented 3 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci[bot] commented 3 years ago

@openshift-bot: Closing this issue.