kinvolk / lokomotive

🪦 DISCONTINUED Further Lokomotive development has been discontinued. Lokomotive is a 100% open-source, easy-to-use, and secure Kubernetes distribution from the volks at Kinvolk
https://kinvolk.io/lokomotive-kubernetes/
Apache License 2.0

Add VMWare VSphere as supported platform #1300

Open mazzy89 opened 3 years ago

mazzy89 commented 3 years ago

Current situation

Support VMware as a provider.

Ideal future situation

Attract a high number of customers/users.

Additional information

I would be interested in having VMware implemented as a provider. I have access to the VMware platform, so it would be easy for me.

invidian commented 3 years ago

Hey @mazzy89, thanks for opening the issue. Could this be implemented using https://registry.terraform.io/providers/hashicorp/vsphere/latest/docs ?

mazzy89 commented 3 years ago

Hey Mateusz. Yes, the idea is to use the vSphere provider to create VMs.
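
As a rough illustration of that idea, a minimal VM definition with the hashicorp/vsphere provider could look like the sketch below. The resource and data-source names come from the provider docs; all concrete values (datacenter, datastore, variable names) are placeholders, not anything Lokomotive ships today.

```hcl
# Hypothetical sketch only: one Flatcar controller VM via the vSphere provider.
provider "vsphere" {
  user           = var.vsphere_user
  password       = var.vsphere_password
  vsphere_server = var.vcenter_address
}

data "vsphere_datacenter" "dc" {
  name = "dc1" # placeholder datacenter name
}

data "vsphere_datastore" "datastore" {
  name          = "datastore1" # placeholder datastore name
  datacenter_id = data.vsphere_datacenter.dc.id
}

resource "vsphere_virtual_machine" "controller" {
  name             = "lokomotive-controller-0"
  resource_pool_id = var.resource_pool_id
  datastore_id     = data.vsphere_datastore.datastore.id
  num_cpus         = 2
  memory           = 4096

  network_interface {
    network_id = var.network_id
  }

  disk {
    label = "disk0"
    size  = 40
  }
}
```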

mazzy89 commented 3 years ago

Mateusz, could you please clarify one thing? I've seen around that you would like to use the controller and worker TF modules for reusability. When creating a new platform like this one, do you recommend using these two modules?

invidian commented 3 years ago

> Mateusz, could you please clarify one thing? I've seen around that you would like to use the controller and worker TF modules for reusability. When creating a new platform like this one, do you recommend using these two modules?

Yes, using those modules would be the preferred approach. Given that they are only used for one platform so far, they may require some changes to make them reusable, but using them should save some hassle when implementing a new platform.

mazzy89 commented 3 years ago

OK, I will start putting together the PR then.

mazzy89 commented 3 years ago

@invidian any recommendations before I put together a PR along the lines discussed above?

invidian commented 3 years ago

Hmm, perhaps you can look at 84821fd4e50aa3fe55e7e17b192affda8d25aa10 to get an idea of what is required to add a full platform: Terraform, Go code, and some tests. The documentation and quickstart guides can be done separately, I think.

Also if you could document how to test this PR, that would simplify things a lot, as we will have to set up some CI for it later. The CI configs are also optional, though appreciated.

If you need to make some changes unrelated to the new platform, for example in common modules, please commit them separately or even open a separate PR with those changes, so we keep PRs reasonably small and focused.

Hope this helps. Let me know if you have more questions :)

mazzy89 commented 3 years ago

> Also if you could document how to test this PR, that would simplify things a lot, as we will have to set up some CI for it later. The CI configs are also optional, though appreciated.

Do you guys have a VMware account in which to run integration tests? Or do you mean unit tests?

invidian commented 3 years ago

> Do you guys have a VMware account in which to run integration tests? Or do you mean unit tests?

We don't at the moment, but after a quick check it seems Equinix Metal offers VMware machines, so we might be able to utilize that if there is no simpler way (e.g. via some sort of nested virtualization).

mazzy89 commented 3 years ago

Yeah, Equinix offers ESXi. All right: for implementation I will run it in my VMware account, and then the CI will run on Equinix.

mazzy89 commented 3 years ago

@invidian What kind of assumptions do we want to make about DNS for VMware? Do we want to go down the same path as for the bare-metal platform, or do it differently?
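
To make the question concrete, if we followed the bare-metal path, a user-facing cluster block might look roughly like this. This is purely a hypothetical `.lokocfg` sketch: the `vsphere` platform and every field name shown are invented for illustration, mirroring only the bare-metal platform's assumption that DNS for the controllers is managed outside of Lokomotive.

```hcl
# Hypothetical .lokocfg sketch for a "vsphere" platform (invented fields).
cluster "vsphere" {
  asset_dir    = "./assets"
  cluster_name = "vsphere-demo"

  # As on bare metal, the API server name would have to resolve to the
  # controller VMs via externally managed DNS records.
  k8s_domain_name = "cluster.example.com"

  controller_count = 1

  worker_pool "pool-1" {
    count = 2
  }
}
```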

mazzy89 commented 3 years ago

I've been dealing now with the boot of VMware instances. There are many ways to boot a VMware instance, but only a few of them are actually supported by Terraform.

Although OVA might look like the most straightforward one, it does not offer a way to customize properties like CPUs, memory, and disk without playing with VMware primitives.

Another idea would be to fetch the VMDK file you guys ship per release, bake it into a VMware disk, and then attach the disk to the VMware instance. This would be a convenient approach, though I haven't tested it yet. My only question is where to fetch the file: I haven't seen any pre-apply step in the CLI logic where assets are fetched.

Perhaps we should list it as a requirement during the bootstrap of the cluster? Or fetch it via Terraform leveraging null_resource? Any other suggestions here?
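
The null_resource variant could be sketched like this. The Flatcar release URL pattern and the variable name are assumptions on my part, and the local-exec shells out to curl/bunzip2, which is exactly the kind of extra custom logic being discussed.

```hcl
# Sketch: download and decompress the Flatcar VMDK before VMs are created.
resource "null_resource" "flatcar_image" {
  # Re-run the download when the requested version changes (assumed variable).
  triggers = {
    version = var.flatcar_version
  }

  provisioner "local-exec" {
    command = <<EOT
curl -fLO https://stable.release.flatcar-linux.net/amd64-usr/${var.flatcar_version}/flatcar_production_vmware_image.vmdk.bz2
bunzip2 -f flatcar_production_vmware_image.vmdk.bz2
EOT
  }
}
```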

Another way would be to have a template already available that could be used for the disk. I recall this was one of the ways for Tectonic to spin up CoreOS VMware instances: https://github.com/coreos/tectonic-installer/blob/0ec6b27c6d4ba56f03eef6425f52292aec20cb1c/examples/terraform.tfvars.vmware#L358. ovftool allows creating a template from an OVA using the --importAsTemplate flag, so we could list this as a requirement for spinning up a VMware environment.

i.e.

```shell
ovftool --name="Flatcar-stable-2605.10.0" --datastore='datastore' --skipManifestCheck --noSSLVerify --allowAllExtraConfig --importAsTemplate ./flatcar_production_vmware_ova.ova 'vi://<username>@<vcsa-address>/<datacenter>/host/<hostname>/'
```

invidian commented 3 years ago

> Perhaps we should list it as a requirement during the bootstrap of the cluster? Or fetch it via Terraform leveraging null_resource? Any other suggestions here?

With the Tinkerbell sandbox, the user needs to download the image to be used by the sandbox machine. If there is some one-off task to do before provisioning the clusters, then I think we can require the user to do it manually.

The example with ovftool looks like something along those lines. I assume the template can then be re-used across clusters? So the user would have to push the template, then refer to it (by name, I guess?) in the cluster configuration. That sounds like a reasonable approach to me, at least initially.

Maybe someone from @kinvolk/flatcar-maintainers has better knowledge of how to do that on VMware :)

mazzy89 commented 3 years ago

> I assume the template can then be re-used across clusters?

Yes, correct. You create the template once and then reference it in all the VMs, as shown here: https://registry.terraform.io/providers/hashicorp/vsphere/latest/docs/resources/virtual_machine#cloning-and-customization-example
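
Condensed from that documentation page, the per-VM side would look roughly like this: the template imported by ovftool is looked up by name and referenced from a `clone` block. Names and variables are placeholders.

```hcl
# Sketch: clone each VM from the Flatcar template imported with ovftool.
data "vsphere_virtual_machine" "template" {
  name          = "Flatcar-stable-2605.10.0" # name given to ovftool --name
  datacenter_id = data.vsphere_datacenter.dc.id
}

resource "vsphere_virtual_machine" "worker" {
  name             = "lokomotive-worker-0"
  resource_pool_id = var.resource_pool_id
  datastore_id     = data.vsphere_datastore.datastore.id
  num_cpus         = 4
  memory           = 8192

  network_interface {
    network_id = var.network_id
  }

  disk {
    label = "disk0"
    # Inherit the disk size from the template.
    size = data.vsphere_virtual_machine.template.disks[0].size
  }

  clone {
    template_uuid = data.vsphere_virtual_machine.template.id
  }
}
```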

pothos commented 3 years ago

I think it's better not to rely on cloning, because the free ESXi version does not support it ("Cloning requires vCenter and is not supported on direct ESXi connections."). You can, for each VM, upload an OVA disk image file to create it, and in a second step set any VM machine configs like RAM size. It's a bit less elegant but more general this way.

mazzy89 commented 3 years ago

> You can, for each VM, upload an OVA disk image file to create it, and in a second step set any VM machine configs like RAM size.

I see your point, and I guess you mean to leverage the provider's ovf_deploy block. However, I can't imagine doing all of this in Terraform; we would need to introduce more custom logic.
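
For reference, a per-VM OVA deployment with the provider's ovf_deploy block would be shaped roughly like this sketch (paths, names, and variables are placeholders):

```hcl
# Sketch: deploy each VM directly from the Flatcar OVA via ovf_deploy.
resource "vsphere_virtual_machine" "node" {
  name             = "lokomotive-node-0"
  resource_pool_id = var.resource_pool_id
  datastore_id     = data.vsphere_datastore.datastore.id
  datacenter_id    = data.vsphere_datacenter.dc.id
  host_system_id   = var.host_system_id

  ovf_deploy {
    local_ovf_path    = "./flatcar_production_vmware_ova.ova"
    disk_provisioning = "thin"
  }
}
```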

mazzy89 commented 3 years ago

Actually, I've read more carefully, and ovf_deploy also requires vCenter access (https://registry.terraform.io/providers/hashicorp/vsphere/latest/docs/resources/virtual_machine#deploying-vm-from-an-ovfova-template). At this point, @pothos, I'm not sure what you meant. Could you be more specific?

pothos commented 3 years ago

Not sure about the details, but the free ESXi web UI supports that as far as I know. Another option is to use VMDK images if the Terraform provider is too limited.

mazzy89 commented 3 years ago

Oh, I think then that if running this project on the free ESXi tier is a hard requirement for you guys, we can't support this provider. As stated here, the Terraform vSphere provider requires write access, which is not available on the free ESXi tier: https://registry.terraform.io/providers/hashicorp/vsphere/latest/docs#vmware-vsphere-provider Last word to you, @pothos and @invidian 🙂

pothos commented 3 years ago

Aha, didn't know that! I found https://github.com/kube-cloud/terraform-provider-esxi if that helps. Another hint I can give is that this is what works well in mantle: https://github.com/kinvolk/mantle/blob/flatcar-master/platform/api/esx/api.go (it probably has some hacks/legacy things).

I don't have a strong opinion on what to implement and require; I just wanted to share that it would be nice to keep it compatible with ESXi if possible, but given the current state it's difficult…

mazzy89 commented 3 years ago

> Aha, didn't know that! I found https://github.com/kube-cloud/terraform-provider-esxi if that helps.

I did not know about that community ESXi provider. This would keep things free and nicely maintained within Terraform. It seems well maintained and supported. Not sure if there are any caveats or limitations, but it's worth taking a look.
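
From a quick look at that provider's docs (attribute names here follow the upstream josenk/terraform-provider-esxi project, which the linked fork appears to track; all values are placeholders), a VM could be declared against a bare ESXi host like this:

```hcl
# Sketch: one VM via the community ESXi provider, no vCenter required.
provider "esxi" {
  esxi_hostname = var.esxi_host
  esxi_username = "root"
  esxi_password = var.esxi_password
}

resource "esxi_guest" "controller" {
  guest_name = "lokomotive-controller-0"
  disk_store = "datastore1" # placeholder datastore name

  # Deploy straight from the Flatcar OVA, no template or cloning needed.
  ovf_source = "./flatcar_production_vmware_ova.ova"

  memsize  = "4096"
  numvcpus = "2"

  network_interfaces {
    virtual_network = "VM Network"
  }
}
```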

However, I would like to draw attention to a real business scenario. If a VMware customer comes along, it is highly likely that they run vCenter to manage their ESXi fleet. Furthermore, even for this project it would not make much sense to deploy controllers/workers on the same ESXi host. It would defeat the entire reason to run Kubernetes, because the controller(s) and workers would end up on the same ESXi host and thus share a single point of failure. It would be fine for a development/testing scenario, but we would need to mark this implementation as non-production-ready.

On the other hand, with multiple ESXi hosts it would be harder, or rather more convoluted, to properly plan the creation of VMs. How would we spread the controllers/workers across ESXi hosts? Which mechanism should we follow? Should the user choose onto which ESXi host a controller or worker node ends up? I would say this is not a real problem but more of a design concern.

However, I still see a real use case for the ESXi Terraform provider, and it is CI. As discussed above with @invidian, we would like to run tests against Equinix, which provides ESXi hosts and not vCenter, so it would be impossible to run tests there using the vSphere provider, while it would be very simple if we leveraged the Terraform ESXi provider.

Ultimately, I would say that for a production-ready scenario it is more appropriate to use the official vSphere provider, while for a small testing scenario or for running CI (non-HA, a single ESXi host, free license, etc.) the ESXi provider is more suitable.

How does this sound to you guys?