Closed: daanbosch closed this issue 1 month ago
Hey @daanbosch,
thanks for this feature request. I will check how many changes are required to implement this and whether it is doable for release 1.0.0 or 1.1.0. I will update this request soon with more information.
Thanks, gyptazy
Hey @daanbosch
With the new param mode, which can be defined in the config file, you can now choose whether rebalancing should be done by used (default) or total resources.
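For example, assuming the ini-style config format shown later in this thread, switching to balancing by assigned (total) resources might look like this (a minimal sketch, only the relevant keys shown):

```
[balancing]
method: memory
mode: total
```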
This is currently available in PR #19 and should be merged soon; it will ship with release 1.0.0.
@daanbosch Can you please give it a try and let me know if I fully understood your request for this feature? Thanks!
Cheers, gyptazy
Oh amazing! Going to test this right away!
Hmm, the numbers I'm getting are pretty odd:
<6> ProxLB: Info: [logger]: Logger verbosity got updated to: INFO.
<4> ProxLB: Warning: [api-connection]: API connection does not verify SSL certificate.
<6> ProxLB: Info: [api-connection]: API connection succeeded to host: <redacted>.
<6> ProxLB: Info: [node-statistics]: Added node node2.
<6> ProxLB: Info: [node-statistics]: Added node node1.
<6> ProxLB: Info: [node-statistics]: Added node node3.
<6> ProxLB: Info: [node-statistics]: Created node statistics.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [vm-statistics]: Added vm testproxlb2.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [vm-statistics]: Added vm testproxlb3.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [vm-statistics]: Added vm testproxlb.
<6> ProxLB: Info: [vm-statistics]: Created VM statistics.
<6> ProxLB: Info: [rebalancing-calculator]: Rebalancing will be done for method: memory.
<6> ProxLB: Info: [rebalancing-calculator]: Rebalancing will be done by: total resources.
<6> ProxLB: Info: [rebalancing-calculator]: Balanciness is set to: 1.
<6> ProxLB: Info: [balancing-method-validation]]: Valid balancing method: memory
<6> ProxLB: Info: [balanciness-validation]: Rebalancing is for memory is not needed. Highest usage: 98% | Lowest usage: 98
<6> ProxLB: Info: [rebalancing-calculator]: Balancing calculations done.
<6> ProxLB: Info: [rebalancing-executor]: Starting dry-run to rebalance vms to their new nodes.
<6> ProxLB: Info: [rebalancing-executor]: No rebalancing needed according to the defined balanciness.
No rebalancing needed according to the defined balanciness.
<6> ProxLB: Info: [post-validations]: All post-validations succeeded.
<6> ProxLB: Info: [daemon]: Not running in daemon mode. Quitting.
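From the log lines above, the balanciness validation appears to compare the highest and lowest node usage against the configured threshold. A hypothetical Python sketch of that check (function name and structure are assumptions, not ProxLB's actual code):

```python
def rebalancing_needed(node_usage_percent, balanciness):
    """Return True when the usage spread across nodes exceeds the
    configured balanciness threshold (in percentage points)."""
    highest = max(node_usage_percent)
    lowest = min(node_usage_percent)
    return (highest - lowest) > balanciness

# With the values from the log (highest 98%, lowest 98%) and a
# balanciness of 1, no rebalancing is triggered:
print(rebalancing_needed([98, 98, 98], 1))  # False
```

This would explain why nothing moves here: all three nodes report an identical 98% usage, so the spread is 0 and stays below the threshold of 1.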
Settings:
[proxmox]
api_host: <redacted>
api_user: <redacted>
api_pass: <redacted>
verify_ssl: 0
[balancing]
method: memory
ignore_nodes: none
ignore_vms: none
balanciness: 1
mode: total
[service]
daemon: 0
schedule: 24
log_verbosity: INFO
Also tried it with CPU:
<6> ProxLB: Info: [logger]: Logger verbosity got updated to: INFO.
<4> ProxLB: Warning: [api-connection]: API connection does not verify SSL certificate.
<6> ProxLB: Info: [api-connection]: API connection succeeded to host: <redacted>.
<6> ProxLB: Info: [node-statistics]: Added node node3.
<6> ProxLB: Info: [node-statistics]: Added node node1.
<6> ProxLB: Info: [node-statistics]: Added node node2.
<6> ProxLB: Info: [node-statistics]: Created node statistics.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [vm-statistics]: Added vm testproxlb.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [vm-statistics]: Added vm testproxlb3.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [vm-statistics]: Added vm testproxlb2.
<6> ProxLB: Info: [vm-statistics]: Created VM statistics.
<6> ProxLB: Info: [rebalancing-calculator]: Rebalancing will be done for method: cpu.
<6> ProxLB: Info: [rebalancing-calculator]: Rebalancing will be done by: total resources.
<6> ProxLB: Info: [rebalancing-calculator]: Balanciness is set to: 1.
<6> ProxLB: Info: [balancing-method-validation]]: Valid balancing method: cpu
<6> ProxLB: Info: [balanciness-validation]: Rebalancing is for cpu is not needed. Highest usage: 100% | Lowest usage: 100
<6> ProxLB: Info: [rebalancing-calculator]: Balancing calculations done.
<6> ProxLB: Info: [rebalancing-executor]: Starting dry-run to rebalance vms to their new nodes.
<6> ProxLB: Info: [rebalancing-executor]: No rebalancing needed according to the defined balanciness.
No rebalancing needed according to the defined balanciness.
<6> ProxLB: Info: [post-validations]: All post-validations succeeded.
<6> ProxLB: Info: [daemon]: Not running in daemon mode. Quitting.
VMs:
+-----------+------+-------------+---------+-------+--------+---------+-------+--------+-----------+------------+------------+----------------+--------------+------------+------+---------+---------+-------------+------+
| id | type | cgroup-mode | content | cpu | disk | hastate | level | maxcpu | maxdisk | maxmem | mem | name | node | plugintype | pool | status | storage | uptime | vmid |
+===========+======+=============+=========+=======+========+=========+=======+========+===========+============+============+================+==============+============+======+=========+=========+=============+======+
| qemu/100 | qemu | | | 0.04% | 0.00 B | | | 10 | 2.20 GiB | 195.78 GiB | 819.39 MiB | testproxlb | node1 | | | running | | 22h 44m 37s | 100 |
+-----------+------+-------------+---------+-------+--------+---------+-------+--------+-----------+------------+------------+----------------+--------------+------------+------+---------+---------+-------------+------+
| qemu/101 | qemu | | | 0.12% | 0.00 B | | | 5 | 50.00 GiB | 195.78 GiB | 772.17 MiB | testproxlb2 | node1 | | | running | | 4m 38s | 101 |
+-----------+------+-------------+---------+-------+--------+---------+-------+--------+-----------+------------+------------+----------------+--------------+------------+------+---------+---------+-------------+------+
| qemu/102 | qemu | | | 0.03% | 0.00 B | | | 12 | 2.20 GiB | 195.78 GiB | 836.94 MiB | testproxlb3 | node1 | | | running | | 22h 44m 30s | 102 |
+-----------+------+-------------+---------+-------+--------+---------+-------+--------+-----------+------------+------------+----------------+--------------+------------+------+---------+---------+-------------+------+
| qemu/9000 | qemu | | | 0.00% | 0.00 B | | | 1 | 2.20 GiB | 2.00 GiB | 0.00 B | focal-template | node1 | | | stopped | | 0s | 9000 |
+-----------+------+-------------+---------+-------+--------+---------+-------+--------+-----------+------------+------------+----------------+--------------+------------+------+---------+---------+-------------+------+
Thanks, I just pushed a fix. Can you give it a try, please?
It does not make any sense to validate the current resources for balanciness when using total values: https://github.com/gyptazy/ProxLB/compare/ef60124c286d9e346690b45650700677d79a5b31..f14b94f7584377675022d740a98279a4e777d42f
However, while this should work, it still requires additional changes. The current disadvantage is that it will almost always rebalance the VMs.
I need to adjust the test cluster and integrate further changes.
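The difference between the two modes can be illustrated with a small sketch (hypothetical Python with assumed field names, not ProxLB's actual code): in "used" mode a node full of idle VMs looks almost empty, while in "total" mode the same node can already be overcommitted.

```python
def node_usage(vms, node_total_cpu, mode="used"):
    """Compute a node's CPU usage in percent, either from the VMs'
    current load ('used') or from their assigned vCPUs ('total')."""
    if mode == "used":
        # Sum of each VM's current CPU load, relative to node capacity.
        return 100 * sum(vm["cpu_used"] for vm in vms) / node_total_cpu
    # 'total' mode: count assigned vCPUs regardless of actual load.
    return 100 * sum(vm["maxcpu"] for vm in vms) / node_total_cpu

# Mirrors the table above: three mostly idle VMs with 10, 5 and 12
# assigned vCPUs, all on a (hypothetical) 24-core node1.
vms = [{"maxcpu": 10, "cpu_used": 0.004},
       {"maxcpu": 5,  "cpu_used": 0.006},
       {"maxcpu": 12, "cpu_used": 0.004}]
print(node_usage(vms, 24, mode="used"))   # well under 1%
print(node_usage(vms, 24, mode="total"))  # 112.5: overcommitted
```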
Hmm, now it wants to move every VM to node2 based on CPU (testproxlb2 is already on node2) in this scenario.
VM            Current Node    Rebalanced Node
testproxlb    node1           node2
testproxlb3   node1           node2
For the memory run:
VM            Current Node    Rebalanced Node
testproxlb2   node2           node1
testproxlb3   node1           node2
testproxlb    node1           node3
This would be correct; however, it does not really make sense to swap testproxlb2 and testproxlb3.
However, it seems to be going in the right direction! Thanks!
This would be correct; however, it does not really make sense to swap testproxlb2 and testproxlb3. However, it seems to be going in the right direction! Thanks!
Yeah, that was what I meant with:
However, while this should work, it still requires additional changes. The current disadvantage is that it will almost always rebalance the VMs.
I'll probably have a look at this on Monday.
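One way to avoid the pointless swaps discussed above would be to accept a migration only when it actually shrinks the usage spread between nodes. A purely hypothetical sketch of such a guard (not ProxLB's implementation):

```python
def spread(node_usage):
    """Gap between the most and least loaded node, in percentage points."""
    return max(node_usage.values()) - min(node_usage.values())

def migration_helps(node_usage, vm_load, src, dst):
    """Accept a VM migration only when the projected placement reduces
    the spread; otherwise the move (or swap) is pointless."""
    before = spread(node_usage)
    projected = dict(node_usage)  # copy so the input stays unchanged
    projected[src] -= vm_load
    projected[dst] += vm_load
    return spread(projected) < before

usage = {"node1": 80.0, "node2": 40.0, "node3": 60.0}
print(migration_helps(usage, 15.0, "node1", "node2"))  # True: evens things out
print(migration_helps(usage, 15.0, "node2", "node3"))  # False: drains the emptiest node
```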
Just had a look at it this morning and decided to integrate this properly, which requires more restructuring of the code than previously assumed, along with more validations, because it also killed a node in my test cluster ;)
I'm already working on that and will push it when it is ready in a usable way.
Hey @daanbosch,
maybe you can give https://github.com/gyptazy/ProxLB/pull/23 a try when you find the time. Currently, there's still a small issue where it might need an initial rebalance and only works right away in the second run. This is something I'm still looking into...
Thanks, gyptazy
Hi @gyptazy,
I just tested #23 and it works fine for me. There are indeed some small things that keep it from taking the quickest path to the desired balance. However, it's already a great tool in its current state!
Hey @daanbosch,
I just tested #23 and it works fine for me. There are indeed some small things that keep it from taking the quickest path to the desired balance. However, it's already a great tool in its current state!
Happy to hear! I'll add some more improvements ASAP so that this also works immediately in the first run. I encountered additional issues with the API: I can only rely on the (updated) information from the API to recalculate the best placement for VMs. You might also see a race condition when retriggering the command too quickly, where you get inconsistent/outdated data from the API. Since ProxLB works stateless, this is an issue (maybe solvable by writing some state files to the filesystem, because I would really like to avoid using any database for this small service).
Cheers, gyptazy
Overview
For my use case, virtual machines (VMs) often exhibit bursty behavior, and moving them is not always feasible due to business constraints. Therefore, I request the ability to balance load based on the assigned CPU and memory resources instead of the current usage metrics.
Task
Implement functionality in ProxLB that allows load balancing to consider the assigned CPU and memory resources of VMs, rather than relying solely on current usage values.
Modify the load balancing algorithm to incorporate the assigned CPU and memory resources of VMs. Ensure the algorithm can dynamically allocate VMs to hosts based on these assigned resource values.
Configuration Options:
Provide configuration settings to toggle between using current usage and assigned resource values for load balancing.