rancher / rancher

Complete container management platform
http://rancher.com
Apache License 2.0

Request: Add support provisioning to Proxmox/QEMU #22757

Closed aleksandrmetik closed 5 years ago

aleksandrmetik commented 5 years ago

Hello, could you please implement auto-provisioning of a new cluster and changing the cluster size in Proxmox VE (QEMU, not LXC)? Maybe something similar to AWS EC2 provisioning.

Instance = QEMU VM
Node = Physical node with Proxmox VE hypervisor

Proxmox VE has an API: https://pve.proxmox.com/pve-docs/api-viewer/index.html The relevant command is "POST on path /nodes//qemu". There are a couple of approaches you could use:

1. Create a QEMU instance from an ISO (example: ROS) and remotely run (over SSH from the master node) the command that adds the node to a cluster in RKE.
2. Create a QEMU instance from an ISO (example: ROS) and use cloud-init to run the command that adds the node to a cluster.

Some dialogues are required:

Configuration table of Proxmox VE: which nodes should be used (global).
Template of a new instance:
-- count of CPU, RAM, required disk size, ISO image to deploy ROS.
-- manual IP range to use, or DHCP.
-- exclude/include Proxmox VE nodes for the deployment of this template: GET /api2/json/nodes
-- select which local Proxmox VE storages should be used: GET /api2/json/nodes/{node}/storage

Please note that some simple logic should be implemented on the RKE side:
a. Check whether there is enough available RAM to place a new instance on the node.
b. Check whether there is enough available disk space on the storage of the target node.
c. Check whether the ISO image with ROS is available (if it is required).
d. Check for the preferred location (for example, to avoid placing all RKE instances on one big Proxmox VE node when other Proxmox VE nodes are available).
e. Purge the instance and stop provisioning when provisioning has failed, to avoid a retry cycle and resource exhaustion on the Proxmox side.
f. Check which nodes are available before any provisioning or changes.
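Checks a, b, d and f could be sketched roughly as follows. This is only an illustration of the placement logic, not an existing driver: `NodeInfo` and `pick_node` are hypothetical names, and filling `NodeInfo` from the Proxmox VE API responses is left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class NodeInfo:
    # Hypothetical summary of one Proxmox VE node (not an API type).
    name: str
    online: bool          # check f: is the node available at all?
    free_ram_mb: int      # check a
    free_disk_gb: int     # check b (free space on the chosen storage)
    rke_instances: int    # RKE VMs already placed here, used for check d


def pick_node(nodes: Sequence[NodeInfo], ram_mb: int, disk_gb: int) -> Optional[str]:
    """Return the name of the preferred node for a new instance, or None."""
    candidates = [
        n for n in nodes
        if n.online and n.free_ram_mb >= ram_mb and n.free_disk_gb >= disk_gb
    ]
    if not candidates:
        # Nothing fits: the caller should stop provisioning instead of retrying (check e).
        return None
    # Prefer the node with the fewest RKE instances, then the most free RAM.
    best = min(candidates, key=lambda n: (n.rke_instances, -n.free_ram_mb))
    return best.name
```

Sorting by instance count first spreads the cluster across hypervisors instead of filling the largest one.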

A nice addition would be to display the monitoring metrics of the Proxmox VE cluster in the cluster dashboard: GET /api2/json/cluster/resources and GET /api2/json/nodes/{node}/netstat
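As a rough idea of what a dashboard could derive from the cluster resources call, here is a small sketch that works on the already-parsed JSON list. The field names `type`, `node`, `mem` and `maxmem` are assumptions about the response shape and should be checked against the API viewer; `summarize_nodes` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable


def summarize_nodes(resources: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Map node name -> fraction of memory in use, for entries of type 'node'."""
    usage: Dict[str, float] = {}
    for entry in resources:
        if entry.get("type") != "node":
            continue  # skip VMs, containers, storages
        maxmem = entry.get("maxmem") or 0
        if maxmem <= 0:
            continue  # offline nodes may report no memory figures
        usage[entry["node"]] = entry.get("mem", 0) / maxmem
    return usage
```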

Many thanks!

vincent99 commented 5 years ago

Individual providers are docker-machine drivers and new ones can be loaded by providing the URL to the binary in the Drivers screen.

We do some maintenance and customization of the more popular drivers, but are not going to be writing and supporting a new driver from scratch for a niche product that doesn't really want it.

This exists and may work: https://github.com/lnxbil/docker-machine-driver-proxmox-ve

wioxjk commented 10 months ago

I would like a node driver for Proxmox, but as @vincent99 pointed out, the Proxmox guys do not really seem that keen on supporting modern things like real containerization and the dynamic workloads that Rancher (and Kubernetes) offers.

hweidner commented 10 months ago

I think this is a misunderstanding. What the Proxmox staff declined in the linked thread is to support Docker containers directly on the Proxmox host, as a third virtualization technology besides KVM and LXC.

I can't see any hint that they have anything against a machine driver that creates and deletes VMs over the API. Any external tooling can consume the API. This is not different from other VM technologies like VMware or oVirt.

wioxjk commented 9 months ago

@hweidner You are most likely correct here. The thread mentioned is old, so it is probably worth bringing up this discussion again.

Proxmox offers a cheap and reliable way to host workloads on-prem. It has an API, it has cloud-init, and it has everything else that you want from a modern hypervisor solution.

Hopefully this will be a reality someday, especially now that Broadcom is making the future uncertain for VMware customers with a massive increase in pricing and the axing of products in their portfolio.

Termibg22 commented 7 months ago

It would be great to have a Rancher driver capable of creating VMs and automating the RKE2 installation. As you mentioned, this would be a highly beneficial addition for new clients transitioning from VMware.

thesuperzapper commented 7 months ago

@Termibg22 Rancher does have a project called Harvester, which is effectively a KVM management system (like Proxmox/VMware), but built around KubeVirt.

So I think if you set up RKE2 on Harvester, and configure cluster-autoscaler RKE2 integration, you can get a very "cloud like" autoscaling RKE2 cluster, with node-types that you define.

Termibg22 commented 7 months ago

@thesuperzapper You're right, and I've tested it. It seemed to get a bit stuck in the creation and deletion of machines, but it has a lot of potential. However, many clients prefer to keep their hypervisor infrastructure on Proxmox (and might believe that Harvester is still not ready for production). They could also deploy a virtual machine with Harvester, but there is an associated resource and management cost just to get this driver functionality.

thesuperzapper commented 7 months ago

@Termibg22 I'm interested to know what you mean by "getting stuck", so I can see if I have the same issue.

Termibg22 commented 7 months ago

@thesuperzapper I tested it some time ago so I'm doing it again to check the newest version. The first thing I saw is that when I create a new cluster the nodes from the pools show with the following message:

Deleting server [fleet-default/pro-master-cac01797-pg2ps] of kind (HarvesterMachine) for machine pro-master-55654c5bcdxz8bdn-2scrn in infrastructure provider

I don't know why they are not being created in Harvester but I think the information from Rancher is not very useful. Also I don't know why it says it's deleting the server when it should be trying to create it.

Edit: After some time it seems to have created the first machine and shows:

Creating server [fleet-default/pro-master-cac01797-m2fdh] of kind (HarvesterMachine) for machine pro-master-55654c5bcdxz8bdn-2lxf4 in infrastructure provider

The second machine is still showing the "deleting server" state.

Edit2:

I changed the image of the pool as a test, and the old machines are not being removed. They are stuck in the Deleting and Reconciling states.

Termibg22 commented 7 months ago

@thesuperzapper I know I'm tinkering around and in a production environment this would be done with much greater care and order. However, with this behavior I am trying to demonstrate a certain lack of resilience. There are situations in which, due to some variable or another, the machines are not able to self-terminate and proceed with the deployment.

Edit: I noticed one message that disappears very quickly but may point to a DNS problem:

Failure detected from referenced resource rke-machine.cattle.io/v1, Kind=HarvesterMachine with name "pro-master-c4961803-btbmk":
Downloading driver from https://rancher-url/assets/docker-machine-driver-harvester
Doing /etc/rancher/ssl
ls: cannot access 'docker-machine-driver-*': No such file or directory
downloaded file failed sha256 checksum
download of driver from https://rancher-url/assets/docker-machine-driver-harvester failed
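For context, "failed sha256 checksum" means the downloaded bytes did not hash to the expected value, which can also happen when the URL returns an error page instead of the binary. A minimal sketch of that kind of check is below; it is not Rancher's actual code, and `sha256_matches` is a hypothetical helper.

```python
import hashlib


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of data equals the expected hex string."""
    actual = hashlib.sha256(data).hexdigest()
    # Normalize the expected value so case or stray whitespace does not matter.
    return actual == expected_hex.strip().lower()
```

Running it against a manually downloaded copy of the driver would show whether the file itself or the expected checksum is wrong.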

Termibg22 commented 7 months ago

Hi, I'm still facing the same issues. I found this post that looks like the same problem: https://slack-archive.rancher.com/t/8233110/playing-around-with-this-spaghetti-mess-of-rancher-manager-a

cuza commented 6 months ago

I know this is an old thread but:

> This exists and may work: https://github.com/lnxbil/docker-machine-driver-proxmox-ve

This works amazingly as a node driver for RKE1 and you can use this as a UI for it.

I'm currently using it with cloud-init VM images. The driver will clone a VM image in Proxmox and create VMs for the nodes.

Termibg22 commented 6 months ago

@cuza Does that driver work with RKE2? I'm fully committed to RKE2 and haven't used RKE1 at all.

cuza commented 6 months ago

> @cuza Does that driver work with RKE2? I'm fully committed to RKE2 and haven't used RKE1 at all.

@Termibg22 It is a docker-machine driver, which is what RKE1 uses as its node driver. It will work in Rancher 2.x, but not with RKE2.