Closed xmonader closed 1 year ago
Well I was going to create the issue here, but I'll add comments here
The energy cost of a node is always something to take into account, but nowadays, and especially in Europe, it is of vital importance for the viability of the grid itself.
The cost of running a node, apart from networking overhead, is now basically the highest investment over 5 years, more so than the investment in the hardware itself. While it doesn't seem like much when calculated per month, future prices, even when 'the new normal' sets in with a new equilibrium, will be nowhere near the prices of a year ago.
The pursuit of dramatically lowering the energy consumption of the Grid in order to be green(er) now has another incentive: money.
While we all wish all nodes have their cpus, memory and storage maxed out and the world IT community is knocking on all our doors for more, we're not there yet.
Some farms are 'just' online. No workloads. Just generating tokens for the farmer. In principle, these nodes represent the farmer's investment in the size of the Grid, and while some costs are involved, like housing and networking, there should be no need for nodes that have no workloads running to be powered on in the first place.
Powering off a node should be straightforward: When a node has no workload at all, it can be shut down.
Question:
Powering on a node will need to be handled by a smarter provisioning scheme, depending on the size of the farm.
We only support powering off nodes in farms that have 1+n nodes in the same network.
One node in a farm will always be on, hosting the poweron service.
For powered-off nodes to stay off as long as possible, finding a node to deploy a workload on becomes a bit more hairy, and we'll need a lot of verification that a recently powered-on node has started properly and is capable of hosting workloads.
Powered-off nodes will generate the same amount of tokens as if they were powered on, but need to be regularly (randomly?) powered on to formally acknowledge their existence.
Given the sheer number of interfaces that PDU brands have, it would be virtually impossible to support them all, so powering off/on should not be done with a PDU. Moreover, we can't assume that a farmer always has a PDU for powering his nodes. So no PDU.
acpid
in ZOS (sorely missing since forever)nopoweroff
).
So to be able to WoL a node, we'll need:
I added 'Technical' that people can fill in implementation details and issues in the concerned repos
Just checked: acpid is not needed, we can handle that via zinit directly :)
Suggestion on how this can work on ZOS.
If a node is woken up and finds out that its target state is Down, it can simply send the uptime report and go back to sleep automatically. This will make it easier for the power manager to randomly nudge nodes to prove their existence.
The idea behind having the power manager send the power-off decision to the node, although the node could just check its own target state, is that this validates that the target node is reachable by the power manager, and hence can be woken up again.
This solution also works if you have multiple LANs that join the same farm: a farmer can then have a power manager selected in each LAN with no issue.
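The boot-time behaviour suggested above can be sketched as a small decision function. This is only an illustration: `PowerTarget`, `BootAction` and `on_boot` are hypothetical names, not existing ZOS code, and the sketch assumes the node can read its own target state from the grid when it starts.

```rust
/// Desired power state of a node, as set on the grid (assumed representation).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PowerTarget {
    Up,
    Down,
}

/// What a freshly booted node should do next (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BootAction {
    /// Send the uptime report, then power off again.
    ReportUptimeAndSleep,
    /// Stay up and continue normal operation.
    StayUp,
}

/// Decide what to do after boot based on the node's target state.
/// A node woken up while its target is still Down was only nudged to
/// prove its existence, so it reports uptime and goes back to sleep.
pub fn on_boot(target: PowerTarget) -> BootAction {
    match target {
        PowerTarget::Down => BootAction::ReportUptimeAndSleep,
        PowerTarget::Up => BootAction::StayUp,
    }
}
```

Keeping the decision in one pure function would make it easy to test independently of how the uptime report and the actual power-off are implemented.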
I think we also need to rework the way deployments are created. I think the user needs to have an agreement with a farm rather than a node, since a user can only use online nodes in a farm, and the user doesn't really know in advance which nodes these are. If we keep supporting NodeContract(nodeID), a user can create a node contract for a sleeping node in a farm and never have its workload deployed.
I think the managing node should also act as provisioning manager. The user should be able to create a contract with a Farm, and the managing node should see this contract being created and redirect the contract to a node that is able to accept the workload.
Maybe if we keep the NodeContract, the chain can actually check if the node is up or down and return an error to the user in case the node is down.
The user can know the state of the node from the TargetState of the node (Up or Down), so he is free to choose a node that is Up from the start.
If a user chooses a node that is down, the chain can then set the state of the node to up. It will of course take some time for the node to come up fully, hence the user needs to know he has to wait until the node is up before talking to the node directly.
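The two options discussed above (reject a contract on a sleeping node, or flip the node's target to Up and make the user wait) can be compared in a small sketch. Everything here is hypothetical: the types, the `wake_on_demand` flag and this `create_node_contract` signature are illustrative only and do not reflect the actual chain code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TargetState {
    Up,
    Down,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ContractOutcome {
    /// The node's target is already Up; the user can talk to it directly.
    Ready,
    /// The node was asleep; its target was set to Up and the user must wait for boot.
    WaitForBoot,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ContractError {
    NodeDown,
}

pub struct Node {
    pub id: u32,
    pub target: TargetState,
}

/// Hypothetical contract creation check.
/// `wake_on_demand = false` models "return an error if the node is down";
/// `wake_on_demand = true` models "the chain brings the node up".
pub fn create_node_contract(
    node: &mut Node,
    wake_on_demand: bool,
) -> Result<ContractOutcome, ContractError> {
    match (node.target, wake_on_demand) {
        (TargetState::Up, _) => Ok(ContractOutcome::Ready),
        (TargetState::Down, false) => Err(ContractError::NodeDown),
        (TargetState::Down, true) => {
            node.target = TargetState::Up;
            Ok(ContractOutcome::WaitForBoot)
        }
    }
}
```

Returning a distinct `WaitForBoot` outcome is one way to tell the client it has to poll until the node is really up before deploying.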
So per discussion with azmy:
We can extend the code on the chain where create_node_contract takes into account the following things:
This brings up the question if we actually need to specify a node id on contract creation or actually a farm ID. If we provide a farm ID and Resources to contract creation the chain can select the node for the user.
This also is in contrast with the proposed solution for capacity planning here: https://github.com/threefoldtech/home/issues/1304#issuecomment-1245225199
The main problem I have with a farmer-elected manager is that this introduces a single point of failure in the design. On the contrary, if the logic to select nodes to power off is idempotent, no single central manager is required. Depending on farm size, multiple nodes can be left operational, which can then use a slot-based leadership system to decide who will handle which events.
@LeeSmet what about the capacity planning? What are your thoughts on above comment?
There are two types of events:
Another thing: naming is important: we already have down as not reachable, or not available in any way. Shouldn't we call it 'sleeping' or something like that ?
Contracts per farm, indeed, that way no-one can generate workloads like Network Resources just to start all nodes in a farm
@delandtj
If we enforce the rule that a single farm needs to exist on the same LAN, then indeed we can drop most of the complexity: nodes can listen for their own power-off signal, making the grid the sole manager of power management. Bringing a node up should then generate an event that can be picked up by all nodes in the same farm, hence they can all generate the magic packet to wake up their sleeping friend.
We then still need to discuss how the grid is going to decide which nodes need to go to sleep, and under what conditions it can bring them up again.
Also, regarding having the deployment contract with the farm itself and not the node: the grid then still needs to select a node and assign it to the contract (and possibly bring it up), which means capacity planning has to happen entirely on the chain (which I don't mind if we already have all the data). Once a node is selected and assigned to a contract, the user then needs to "wait" until the node status is fully up before he can contact the node to actually deploy his stuff.
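For reference, the magic packet mentioned above has a fixed layout: 6 bytes of 0xFF followed by the target MAC address repeated 16 times. A minimal sketch using only the Rust standard library could look like this; the function names are illustrative, and sending to the limited broadcast address on UDP port 9 is a common convention rather than something decided in this thread.

```rust
use std::net::UdpSocket;

/// Build a Wake-on-LAN magic packet: 6 bytes of 0xFF followed by the
/// target MAC address repeated 16 times (102 bytes in total).
pub fn magic_packet(mac: [u8; 6]) -> [u8; 102] {
    let mut packet = [0xFFu8; 102];
    // Overwrite everything after the 6-byte 0xFF header with 16 copies of the MAC.
    for chunk in packet[6..].chunks_exact_mut(6) {
        chunk.copy_from_slice(&mac);
    }
    packet
}

/// Broadcast the magic packet on the local segment.
/// Assumption: UDP broadcast to 255.255.255.255:9 reaches the sleeping node.
pub fn send_wol(mac: [u8; 6]) -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    socket.set_broadcast(true)?;
    socket.send_to(&magic_packet(mac), "255.255.255.255:9")?;
    Ok(())
}
```

Since the packet is a layer-2 broadcast in practice, the sender has to sit on the same LAN segment as the sleeping node, which is exactly why the same-LAN rule simplifies things.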
Those changes combined (imho) are a major change to the grid (hence a new major version?)
Only nodes that support TPM will be able to be powered off.
What's the thinking behind this requirement @delandtj?
@delandtj @DylanVerstraete and @LeeSmet we really need to agree on the final approach to be able to create the related (technical) issues. Could you please read my previous comment, and comment if this (technical wise) is good?
Looks good yes. I only think the user experience will get worse with this power management feature. If the user wants to deploy on a farm that needs to boot a node in order to host his workload, then he will possibly have to wait for 5-10 minutes.
So in a nutshell: what do we convene over this ? I mean, we need to set in stone also what the implementation details will be.
Looks good yes. I only think the user experience will get worse with this power management feature. If the user wants to deploy on a farm that needs to boot a node in order to host his workload, then he will possibly have to wait for 5-10 minutes.
This can be messaged
Okay, I will try to write down a dump of all changes that are required based on our meeting regarding capacity planning with power management. Since nodes will be sleeping, a user cannot choose a node to deploy on; it's up to the grid to find the most suitable node, with the option of bringing nodes up if needed.
Once the contract node id is set, the user is ready to contact the node to deploy his contract as usual.
zos.
Note: those changes are related to capacity planning only and not the entire power management story.
After a little discussion with @DylanVerstraete we agreed on the following: to improve events processing, we will also keep a map of contracts that are created (per farm) and still need a node-id. This means that if the events stream is interrupted, the node can still check that the state of the map has not changed. Contracts that get their node id are removed from the map.
On the node assignment: IMNSHO we will need to be able to deploy on different nodes for something like Kubernetes clusters (it shouldn't be deployed on the same node).
@AhmedHanafy725 yes, you are right. @rkhamis brought this up during the meeting and I forgot to document it here in the issue. My suggestion is to create a special type of contract, which could be called ClusterContract, which is basically a set of contracts + a policy. Once created, the capacity planning process will know (based on the policy) that those contracts cannot be deployed on the same node, so each sub-contract is assigned a new node.
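A rough sketch of such a per-farm map of contracts that still wait for a node id is shown below. The type and method names are made up for illustration, and the farm and contract IDs are assumed to be plain integers.

```rust
use std::collections::{BTreeSet, HashMap};

/// Contracts that were created for a farm but have no node id yet
/// (hypothetical structure, IDs assumed to be integers).
#[derive(Default)]
pub struct PendingContracts {
    by_farm: HashMap<u32, BTreeSet<u64>>,
}

impl PendingContracts {
    /// Record a newly created contract that still needs a node.
    pub fn contract_created(&mut self, farm_id: u32, contract_id: u64) {
        self.by_farm.entry(farm_id).or_default().insert(contract_id);
    }

    /// Remove a contract once it got its node id.
    pub fn node_assigned(&mut self, farm_id: u32, contract_id: u64) {
        let now_empty = match self.by_farm.get_mut(&farm_id) {
            Some(set) => {
                set.remove(&contract_id);
                set.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.by_farm.remove(&farm_id);
        }
    }

    /// Contracts of a farm that still wait for a node, in ascending ID order.
    pub fn pending(&self, farm_id: u32) -> Vec<u64> {
        self.by_farm
            .get(&farm_id)
            .map(|set| set.iter().copied().collect::<Vec<u64>>())
            .unwrap_or_default()
    }
}
```

After an interrupted event stream, a consumer could call `pending(farm_id)` and compare the result with what it last saw instead of replaying every missed event.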
The process can go like this
We had discussions regarding real-life use cases (k8s cluster, and separate network workloads). A contract object will have this new attribute: policy.
This policy is an enum of the following values:
any: it's up to the capacity planner to find a suitable node; no restrictions except the required resources capacity.
join(contract-id): means this contract must use the same node id as the given contract id. In other words, they have to be deployed on the same node. If the requested capacity can't be satisfied by the given node, contract creation fails with an error.
exclusive(group-id): where group-id is the id of a group object. A group object only has an id and an owner (twin-id); the id is used to group contracts. When using this policy, contracts that use the same group cannot share a node-id, so each contract in the same group needs to have a different node-id.
On deploying k8s start by:
join policy to the corresponding VM contract (you need a network next to each VM). Note that no deployments are done yet.
The booting time according to Jan can be between 2-10 mins, which is .. bad. I guess that means the power manager will be the main node to provision resources on, and it needs to automatically boot other nodes when it reaches a specific threshold, but that's also quite cumbersome, e.g. someone wants a node with a GPU and there's no reference to a GPU on the power manager, meaning the user may end up waiting 2-10 mins for the VM to boot.
Also, not all nodes are created equal, some could be specialized for cpu, ram, storage, or gpu, some sort of tagging notation might be needed to wakeup the right node(s)
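To make the any / join / exclusive policies described earlier more concrete, here is a minimal first-fit sketch of how a capacity planner could honour them. It is an illustration under simplifying assumptions: capacity is reduced to a single abstract unit count per node, all IDs are integers, and `Planner` with its methods is a hypothetical name, not chain code.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Policy {
    /// Any node with enough free capacity.
    Any,
    /// Same node as the given contract ID.
    Join(u64),
    /// A node not yet used by the given group ID.
    Exclusive(u64),
}

pub struct Planner {
    /// (node ID, free capacity units) per node.
    free: Vec<(u32, u64)>,
    /// contract ID -> assigned node ID.
    assigned: HashMap<u64, u32>,
    /// group ID -> node IDs already used by that group.
    groups: HashMap<u64, Vec<u32>>,
}

impl Planner {
    pub fn new(free: Vec<(u32, u64)>) -> Self {
        Planner { free, assigned: HashMap::new(), groups: HashMap::new() }
    }

    /// Pick the first node that satisfies the policy and has room; None if no node fits.
    pub fn assign(&mut self, contract: u64, policy: Policy, required: u64) -> Option<u32> {
        // For Join, the node is pinned to the one of the referenced contract.
        let pinned = match policy {
            Policy::Join(other) => Some(*self.assigned.get(&other)?),
            _ => None,
        };
        // For Exclusive, nodes already used by the group are off limits.
        let excluded: Vec<u32> = match policy {
            Policy::Exclusive(group) => self.groups.get(&group).cloned().unwrap_or_default(),
            _ => Vec::new(),
        };
        let slot = self.free.iter_mut().find(|n| {
            n.1 >= required && pinned.map_or(true, |p| p == n.0) && !excluded.contains(&n.0)
        })?;
        slot.1 -= required;
        let node = slot.0;
        self.assigned.insert(contract, node);
        if let Policy::Exclusive(group) = policy {
            self.groups.entry(group).or_default().push(node);
        }
        Some(node)
    }
}
```

A real planner would also have to consider sleeping nodes and resource types (CPU, RAM, storage, GPU tags); this sketch only shows how the three policies constrain node selection.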
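We need to assume that a single farm can span multiple LANs. This is an iteration over this comment.
Each node object has:
// Desired power state, set by the grid.
enum PowerTarget {
    Up,
    Down,
}

// Actual power state, reported by the node.
enum PowerState {
    Up,
    // Holds the ID of the leader that requested the power off
    // (the u32 ID type is an assumption).
    Down(u32),
}

struct Power {
    target: PowerTarget,
    state: PowerState,
}

struct Node {
    power: Power,
    // ... other node fields
}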
Assume a farm F that lives in multiple LANs (collision domains), as follows:
S1 has nodes [N1, N2, N3, N4]
S2 has nodes [N5, N6]
S3 has nodes [N7]
Nodes can find out about all direct neighbor nodes by simply getting information about all nodes in the farm, then trying to reach them over the local zos IP. This requires an HTTP service that is only available on the local zos interface. The service needs to return a signed response; this way we can guarantee a node is exactly what it claims to be (to avoid situations where nodes on different segments have the same private IP). In the example above, N7 and N4 can have the same private IP.
This way each node can learn about its immediate neighbors that live on the same segment. For each segment at least a single power manager is elected. The election is very simple:
Hence in the example above:
S1: -> N4 is selected because it has public-config
S2: -> N5 is selected because it has the lowest ID in that segment.
S3: -> N7 is selected because it's the only node in the segment.
Then:
power.target to Down
Down(leader) where the leader is the ID of the node that requested the power off.
power up event for a node. If the node state is set to Down(id), where the ID is my own ID, it means this is the node that requested the power off for that node. Hence it can then send the WOL packet.
Down node can then send an uptime and power itself off again. Nothing changes. It's done like this to handle random power nudges for capacity validation.
Up the node updates its power.state to Up and continues normal operation.
Now back to the example above. Let's assume this farm is completely free of workloads. The grid will decide that it can power off all nodes except the public node (N4). So let's say it sets all nodes' target states to Down except N4 (the public node).
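The election and power-off behaviour for one segment can be sketched as follows. The election rule used here (prefer a node with public config, otherwise take the lowest ID) is inferred from the example selections for S1, S2 and S3 and is an assumption; `Member`, `elect_leader` and `resulting_states` are hypothetical names.

```rust
/// Actual power state; `Down` carries the ID of the leader that requested the power off.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PowerState {
    Up,
    Down(u32),
}

/// A node as seen by its neighbours on the same segment.
#[derive(Debug, Clone, Copy)]
pub struct Member {
    pub id: u32,
    pub has_public_config: bool,
}

/// Elect the power manager of a segment: a node with public config wins,
/// otherwise the lowest ID. Returns None for an empty segment.
pub fn elect_leader(segment: &[Member]) -> Option<u32> {
    segment
        .iter()
        // `false < true`, so nodes with public config sort first, then by ID.
        .min_by_key(|m| (!m.has_public_config, m.id))
        .map(|m| m.id)
}

/// Power state of every node in a segment after the grid set target = Down
/// for the node IDs in `target_down`. The leader never shuts down.
pub fn resulting_states(segment: &[Member], target_down: &[u32]) -> Vec<(u32, PowerState)> {
    let leader = match elect_leader(segment) {
        Some(id) => id,
        None => return Vec::new(),
    };
    segment
        .iter()
        .map(|m| {
            if target_down.contains(&m.id) && m.id != leader {
                (m.id, PowerState::Down(leader))
            } else {
                (m.id, PowerState::Up)
            }
        })
        .collect()
}
```

Because every node can run the same deterministic election over the same neighbour list, no extra coordination is needed to agree on the leader of a segment.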
If you follow the logic above we will end up with the following state:
N1{power.target = down, power.state: down(N4)}
N2{power.target = down, power.state: down(N4)}
N3{power.target = down, power.state: down(N4)}
N4{power.target = up, power.state: up}
N5{power.target = down, power.state: up} <- while target is down the node will never shut down because it's the only leader in its segment
N6{power.target = down, power.state: down(N5)}
N7{power.target = down, power.state: up} <- while target is down the node will never shut down because it's the only node in its segment.
I have a couple of questions:
@brandonpille
This does not change the contract reservation and billing cycle. This is solely related to the node power cycle. Nothing much changes in the grid except for the "target" and "current" state, and the function for the node to set the current state.
@muhamadazmy I think Brandon asks if the billing should trigger even if the node is still down (if it for some reason could not be brought up)
Billing is related to capacity reservation, which should not exist unless a node's target power is up. If a node's "current" power never gets to the "up" state, it means something is wrong, and billing probably needs to stop, maybe.
@brandonpille I think yes, the grid should accept creation of a capacity reservation as long as the node target state is up. Normally the current state should follow in a few minutes (once the node has actually booted). Maybe during this time billing should not be done?
@muhamadazmy how are these segments defined?
Billing is related to capacity reservation, which should not exist unless a node's target power is up. If a node's "current" power never gets to the "up" state, it means something is wrong, and billing probably needs to stop, maybe.
So we only start billing if the power is set to UP?
Billing will only trigger 1 hour after creation so it doesn't matter, if the node is still down by then, something is wrong.
booted
It would only be fair in my opinion to the user to only start billing when the node is actually UP. One more question. What do we do if the trigger UP event never comes? Do we add a timeout on it?
@brandonpille I think yes, the grid should accept creation of a capacity reservation as long as the node target state is up. Normally the current state should follow in a few minutes (once the node has actually booted). Maybe during this time billing should not be done?
I was talking about the deployment contract. Do we accept it whenever the capacity reservation is created, no matter the state of the node or when the node got UP?
I think yes. until there is a good reason not to.
Only nodes that support TPM will be able to be powered off.
What's the thinking behind this requirement @delandtj?
This was a mistake, TPM has nothing to do with WOL I believe
Those of us with a home datacenter will not be able to tolerate servers randomly starting up throughout the night. Nothing is louder than a server during startup. Please do checkups ONLY during daylight hours. Obviously startups can occur for deployments at any time, that is ok.
Code on chain should be finished by 23-11; need a couple more days on zos to integrate it, and will start the client updates as soon as possible.
falling behind: requires more reworking https://github.com/threefoldtech/tfchain/issues/536
deadline will be updated after the engineering call today
Do we have an updated timeline, @xmonader?
Do we have an updated timeline, @xmonader?
For chain deployment on devnet we are aiming to happen next tuesday, most of the clients are almost code complete, but they need to be tested against real environments
close all linked issues we need new power mgmt story
With the current energy prices, we need to find a way to turn off nodes and still avoid abuse.
The currently favored solution is using wake-on-LAN. However, this requires some constraints, e.g. the farms need to be location based, physically in the same LAN, and the farms need to provide some hot capacity (always available) for provisioning, while the remainder can be cold capacity that is subject to random turn on/off procedures.