threefoldtech / tfgrid-sdk-go


🐞 [Bug]: Nodes with network contract cruft don't go standby #966

Open scottyeager opened 8 months ago

scottyeager commented 8 months ago

What happened?

One farmer asked why their nodes were not going to standby state when using farmerbot. The two nodes in question are 176 and 597 on mainnet. The farm has one other node which is successfully going standby.

Checking these nodes, they each have 1 workload and 1 deployment but 0% CPU reserved. For example:

image

image

Looking in GraphQL, we can see these are only network contracts:

image

For whatever reason, it's not uncommon for network contracts to get left behind when deployments are deleted. These contracts serve no function if there's no associated VM on the node, but they still keep the node from going standby. The only exception I can think of is gateway nodes, which might have some network contracts for providing gateway access, but those should not be going standby anyway.

I see two possible resolutions:

  1. Base the determination of whether a node is in use on whether there is any CPU reserved
  2. Filter out the network workloads when looking at active contracts
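As a rough illustration of the two options, here is a minimal Go sketch. The `Node` and `Contract` types, the `WorkloadType` labels, and both helper functions are hypothetical and invented for this example; they are not the farmerbot's or the SDK's actual types, and the contract ID is made up.

```go
package main

import "fmt"

// Contract is a hypothetical, simplified view of a node contract.
// WorkloadType uses made-up labels such as "network" and "vm".
type Contract struct {
	ID           uint64
	NodeID       uint32
	WorkloadType string
}

// Node is a hypothetical, simplified view of a node as seen by the farmerbot.
type Node struct {
	ID          uint32
	ReservedCPU uint64 // reserved CPU units; 0 means nothing is reserved
	Contracts   []Contract
}

// inUseByCPU sketches option 1: a node counts as in use
// only if some CPU is reserved on it.
func inUseByCPU(n Node) bool {
	return n.ReservedCPU > 0
}

// inUseByWorkloads sketches option 2: network contracts are ignored,
// so a node counts as in use only if another kind of contract remains.
func inUseByWorkloads(n Node) bool {
	for _, c := range n.Contracts {
		if c.WorkloadType != "network" {
			return true
		}
	}
	return false
}

func main() {
	// A node in the state described above: one leftover network contract,
	// no CPU reserved.
	n := Node{
		ID:          176,
		ReservedCPU: 0,
		Contracts:   []Contract{{ID: 1, NodeID: 176, WorkloadType: "network"}},
	}
	fmt.Println("in use by CPU check:", inUseByCPU(n))
	fmt.Println("in use by workload check:", inUseByWorkloads(n))
}
```

Under either check, a node holding only a leftover network contract would be treated as idle and eligible for standby.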

Which network/s did you face the problem on?

Main

rawdaGastan commented 8 months ago

It is not allowed to power off nodes with active contracts.

scottyeager commented 8 months ago

I see. Will need to address upstream at TF Chain.

I'll leave this one open too for now.