Need stricter checks for minting

sabrinasadik commented 11 months ago

@LeeSmet pls add comments here

LeeSmet commented 11 months ago

Reduce allowed uptime out of bounds from 5 minutes to 1 minute (see diff in https://github.com/threefoldtech/minting_v3/commit/5c99834adb1ca6f477ee06b8e6766459078c3fca for reasoning)
Reduce allowed downtime from power managed nodes from 25 hours to 24 hours (25 was an accidental leftover commit after testing purposes)
Add violation for nodes if uptime increased less than time increased (accounting for 1 minute of skew as per point 1). This was part of the original implementation but was later allowed in the early days of V3 since we were still figuring out how infrastructure was handled, and manual validation at that time showed no issues. As it often goes with these things, it has been forgotten and stayed part of the code base ever since.
Enforce max delay between uptime reports of 41 minutes (40 minutes zos interval + 1 minute of skew as per point 1). See https://github.com/threefoldtech/minting_v3/issues/22 for reasoning.
Add violation for nodes if the twin does not have a relay set (happened before, node is essentially not usable for the grid)
Add violation for nodes if the twin has a public key in invalid format (as that won't work for sure) (should never happen)
Fix twin decoding since it's currently broken. Several twins will have an invalid twin id and other garbage data when loaded from chain with the current code
Add violation for the case where a node twin does not exist (should never happen, not sure if the chain prevents this in case a node is attached to the twin).
Add violation for long term clock skew (cfr. https://github.com/threefoldtech/zos/issues/1914)
Miscellaneous code improvements which aren't directly related to payout (e.g. remove dead code, small performance improvement, better wording of violations for easy debugging, filling in sheets to check based on the data from receipt instead of calculating twice leading to possible discrepancies between data in overview and in actual payout file, ...)

i.e. revert the revert of all these commits.

The following point is in relation to v0.2.0 of the farmerbot and might be better of in its own feature, not sure.

Add max delay of half an hour between node power target rising edge and node uptime event of reboot. Farmerbot v0.2.0 introduced random wakeups to combat fraud. Since periodic wakeups continue as normal, a node could just ignore this (as currently nodes are only required to post uptime every 24/25 hours, see point 2, while farmerbot is running). The idea here is that since a random wakeup is unpredictable, this check essentially makes sure the hardware is still there. This check will also apply for regular wakeups. If the treshold is passed, the node gets stuck with a violation (and it doesn't mint for the month). 30 minutes is debatable but should be sufficient for even heavy duty servers to fully boot. Notice that this (time between rising edge power target and node boot) is also the delay a user has to wait for a node to come online if he deploys on a power managed node.

LeeSmet commented 11 months ago

In addition to the last point, there also needs to be consideration how we can enforce the presence of random wakeups over time in case the farmerbot is running.

xmonader commented 11 months ago

Add violation for nodes if the twin does not have a relay set (happened before, node is essentially not usable for the grid)

Add violation for nodes if the twin has a public key in invalid format (as that won't work for sure) (should never happen)

Fix twin decoding since it's currently broken. Several twins will have an invalid twin id and other garbage data when loaded from chain with the current code

Add violation for the case where a node twin does not exist (should never happen, not sure if the chain prevents this in case a node is attached to the twin).

I believe violations 5,6,7, 8 aren't on the farmer, but ZOS should refuse to boot new nodes if these ever happens

Is it possible for the farmerbot to expand with after wakeup calls it tries to reserve and deploy on the reported capacity?

LeeSmet commented 11 months ago

while 5 6 and 8 are indeed not really user errors, the goal of the minting is to mint tokens for useable capacity. If any of these conditions are true, then the node is not useable, and therefore it should not get tokens rewarded. While that is probably not the users fault, the minting doesn't care for that. If such an event should happen, the user can still be reimbursed, just not through the minting.

7 is referring to the fact that the current way in which twins are decoded from the chain data is inherently broken. This is just a problem local to the minting.

Is it possible for the farmerbot to expand with after wakeup calls it tries to reserve and deploy on the reported capacity?

I suppose it should be, though if that is implemented there needs to be some mechanism to inform the minting that this failed (which means pushing some data on the chain).

xmonader commented 11 months ago

I see, were there any actual reports from farmers regarding the zos concerning points? Because if so, some updates need to happen in zos, but if not, we can go a head indeed with these updates

@robvanmieghem we are waiting for your input to proceed with this

LeeSmet commented 11 months ago

Not to my knowledge, and there doesn't seem to be any open bugs about this on the zos repo

xmonader commented 11 months ago

Alright, should we have a forum post explaining the upcoming changes ?

LeeSmet commented 11 months ago

I suppose there needs to be a GEP about it, something for @sabrinasadik to handle when she's back next week

scottyeager commented 7 months ago

Reduce allowed downtime from power managed nodes from 25 hours to 24 hours (25 was an accidental leftover commit after testing purposes)

The extra hour here is currently saving us from fallout due to this bug in farmerbot that was introduced with the random wakeups. Since development of farmerbot has been stalled for some time, it's not clear when we'll be able to deploy a fix.

Lets please wait to make this one change in minting until the plan for next version of farmerbot can be carried out.

threefoldtech / minting_v3

Need stricter checks for minting #27