rancher / os

Tiny Linux distro that runs the entire OS as Docker containers
https://rancher.com/docs/os/v1.x/en/
Apache License 2.0

v.1.5.6 hangs at Waiting for VMware Tools to come online... #3016

Open lynic opened 4 years ago

lynic commented 4 years ago

RancherOS Version: (ros os version) v1.5.6 rancheros-vmware.iso

Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.) vSphere

When adding a new node, it gets stuck at (kube-node4) Waiting for VMware Tools to come online...

After logging into the console, I didn't see any message about starting VMware Tools.

It worked in v1.5.5.

sabur7 commented 4 years ago

I'm having the same issue.

jwrascoe commented 4 years ago

Yeah, I hit the same issue yesterday, shortly after it was released. I then tried to go back to v1.5.5, but that doesn't work for VMware either. Does anyone have a workaround until 1.5.6 is fixed?

sabur7 commented 4 years ago

I was able to use https://github.com/rancher/os/releases/download/v1.5.5/rancheros-vmware-autoformat.iso successfully.


marcobucci commented 4 years ago

I have the same issue, I'm using Rancher version 2.4.3 and the error received on RancherOS is:

ros-sysinit:error: Failed to load service(open-vm-tools): Failed to parse YAML configuration for open-vm-tools: Expected document start at line 0, column 0

senk commented 4 years ago

Problem is that there is no tag/branch "v1.5.6" in rancher/os-services

abarry-gn commented 4 years ago

I have the same issue. I'm using Rancher 2.4.4 to deploy nodes with RancherOS 1.5.6 and it gets stuck at "Waiting for VMware Tools to come online". Inside RancherOS I see these errors:

ros-sysinit:error: Failed to parse YAML configuration for open-vm-tools : Expected document start at line 0, column 0

ros-sysinit:error: Failed to load service(open-vm-tools): Failed to parse YAML configuration for open-vm-tools: Expected document start at line 0, column 0

jwrascoe commented 4 years ago

@senk is there a way to make it work, or do we have to wait for Rancher to fix it? I tried running 1.5.5 from this link: https://github.com/rancher/os/releases/download/v1.5.5/rancheros-vmware.iso but now that one breaks when trying to start Kube (it was fine a few days ago). I'm sort of stuck right now and not able to deploy Rancher 2 / VMware RKE clusters.

senk commented 4 years ago

Yes, the Rancher team needs to fix that. Meanwhile, you can use https://releases.rancher.com/os/v1.5.5/rancheros-vmware.iso in your node templates.

cjellick commented 4 years ago

We are working on this now. Thanks for the report

1.5.5 has not changed and should still function. Note that you likely want rancheros-vmware-autoformat.iso, not rancheros-vmware.iso.

cjellick commented 4 years ago

@barryboubakar are there any more details about your environment or setup that you can share? Is your environment airgapped? Can you post the contents of your nodeTemplate (minus any sensitive info)?

senk commented 4 years ago

1.5.6 tries to fetch os-services from the v1.5.6 branch, which is not available, and fails for that reason. I guess we all use a non-air-gapped environment with the default config and the latest Rancher, and try to provision nodes on VMware vSphere. I don't have the exact error message because I got the workaround working, but it was about being unable to fetch "https://raw.githubusercontent.com/rancher/os-services/v1.5.6/index.yml".
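For anyone who wants to confirm whether the index file for a given tag or branch is reachable, here is a minimal Python sketch. It only assumes the URL pattern quoted above; the helper names `index_url` and `http_status` are made up for illustration and are not part of RancherOS or any Rancher tooling.

```python
import urllib.error
import urllib.request

# Base URL taken from the index.yml URL quoted in this thread.
RAW_BASE = "https://raw.githubusercontent.com/rancher/os-services"


def index_url(version: str) -> str:
    """Build the raw index.yml URL for an os-services tag or branch (hypothetical helper)."""
    return f"{RAW_BASE}/{version}/index.yml"


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request, including error codes such as 404."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return err.code
```

Calling `http_status(index_url("v1.5.6"))` from a machine with internet access shows whether the file can be fetched. Network failures other than HTTP errors (DNS, timeouts) are deliberately left to propagate as exceptions.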

senk commented 4 years ago

@cjellick now that the branch is available, provisioning 1.5.6 works and VMware Tools starts as expected. Thanks for the fix!

kdjsfgodsfg commented 4 years ago

Hello,

I had the same error trying to deploy nodes with RancherOS 1.5.6.

It seems that during sysinit, the OS tries to load open-vm-tools from https://raw.githubusercontent.com/rancher/os-services/v1.5.6/open-vm-tools.yml which returns a 404 error both from my browser and with a curl command.

However, the actual path of the raw file is https://raw.githubusercontent.com/rancher/os-services/v1.5.6/o/open-vm-tools.yml and that URL leads to the correct file.

Thanks for your help.
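To make the difference between the two URLs concrete, here is a small Python sketch that builds both forms. The first-letter subdirectory rule is inferred only from the single `o/open-vm-tools.yml` example in this comment, so treat it as an assumption; the function names are hypothetical.

```python
# Base URL taken from the two URLs quoted in this comment.
RAW_BASE = "https://raw.githubusercontent.com/rancher/os-services"


def flat_service_url(version: str, service: str) -> str:
    """URL form that returned 404 in this report: <version>/<service>.yml."""
    return f"{RAW_BASE}/{version}/{service}.yml"


def nested_service_url(version: str, service: str) -> str:
    """URL form that resolved: <version>/<first letter>/<service>.yml.

    Assumption: the subdirectory is the first letter of the service name,
    generalized from the single open-vm-tools example.
    """
    return f"{RAW_BASE}/{version}/{service[0]}/{service}.yml"


print(flat_service_url("v1.5.6", "open-vm-tools"))
print(nested_service_url("v1.5.6", "open-vm-tools"))
```

The two printed lines correspond to the failing and working URLs above, which makes it easy to test either one with curl or a browser.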

senk commented 4 years ago

@kdjsfgodsfg can you try again? For me it's working now!

cjellick commented 4 years ago

yes, we rebuilt and pushed updated artifacts. cc @dweomer

sabur7 commented 4 years ago

Curious about the difference between rancheros-vmware-autoformat.iso and rancheros-vmware.iso?


jwrascoe commented 4 years ago

@sabur7 I would like to know as well. autoformat says it's supposed to be for VMware "Workstation". We run regular ESXi 6.7 Enterprise Plus.

Could someone weigh in on the correct version to use?

We run the latest stable Rancher 2 on the latest k3s in HA mode with an external Postgres DB. We use this method: https://rancher.com/docs/rancher/v2.x/en/installation/k8s-install/kubernetes-rke/

Then we deploy RKE clusters through VMware.

Looking for the best practice for running this in production.

cjellick commented 4 years ago

@sabur7 - @niusmallnan explained this to me just the other day (I'm no expert here, so this will just be high level): if you use rancheros-vmware.iso, it will just run in memory; rancheros-vmware-autoformat.iso will install to disk properly.

An extra bit of confusion is that this URL (which Rancher server uses by default when provisioning RancherOS to vSphere): https://releases.rancher.com/os/latest/rancheros-vmware.iso is actually a soft link to the autoformat ISO.

@jwrascoe since you asked about production best practices, I feel obligated to point out that RancherOS will hit end of maintenance at the end of this year and end of life in June 2021. See https://rancher.com/support-maintenance-terms/. Make sure you take that into consideration for production use cases.

If you are seeking to provision RKE clusters in vsphere through Rancher, the actual best practice would be to deploy your nodes using a vsphere VM template or content library rather than straight from the iso, which was a feature added in v2.3.3.

jwrascoe commented 4 years ago

@cjellick Thanks for the info... very helpful.

sabur7 commented 4 years ago

Ditto


kdjsfgodsfg commented 4 years ago

@senk Hello, I had to download rancheros-vmware-autoformat.iso again, but the deployment worked totally fine this morning. Thank you.

AntonSmolkov commented 4 years ago

@cjellick

> If you are seeking to provision RKE clusters in vsphere through Rancher, the actual best practice would be to deploy your nodes using a vsphere VM template or content library rather than straight from the iso, which was a feature added in v2.3.3.

Unfortunately, RancherOS does not support provisioning as a VM template. Issue

> RancherOS will hit end of maintenance end of this year and end of life June 2021.

Does that mean that RancherOS 1.5 is the last version and the whole project is going to be retired? If so, I'm going to switch to Ubuntu 18.04 before it is too late...

gillarda commented 4 years ago

There is also an open issue in rancher/os regarding provisioning via templates and the cloud-init issue: https://github.com/rancher/os/issues/2559

lynic commented 4 years ago

Verified it's fixed on vsphere 6.7, thanks.

ehginanjar commented 4 years ago

I've been debugging this issue all day. In my case the root cause was a lack of memory. Previously I had set 4 GB of memory for the controlplane and etcd roles, and it seems that some containers were not able to run.

So in my case it needs more than 6 GB of memory. It works fine now.

ilanh commented 1 year ago

I had the same issue. In my case, one of the networks added to the template didn't have a DHCP server. Hope this helps.