Closed peterbourgon closed 1 year ago
Thanks for the great question!
That quote is a simplification; clock skew is absorbed into the network latency assumption. Suppose honest miners send proposals for layer j at 4:20:00pm, and the Hare preround expects to receive honest proposals within 30 seconds, i.e. by 4:20:30pm. Two honest miners whose local clocks differ by, say, 1 second will still be fine as long as the actual maximum network latency is at most 29 seconds.
The issue is what happens when honest participants have divergent system clocks, or sit behind high-latency network connections, not through deliberate malicious action but by accident. Honest miners can have arbitrarily large local clock differences. Does the protocol work in that circumstance?
Miners are expected to run an NTP daemon connected to one of the public pools. There are also other, more expensive options, but I don't have a link for them at the moment. In that case the differences will be bounded. A clock drifts by ~1s per week, and latency is also bounded.
The protocol operates in relatively large time windows (e.g. 10s-30s), so a short deviation won't make any difference to protocol correctness or liveness. If we consider the edge case where every miner has a different clock, the network won't make any progress.
> Miners are expected to run an NTP daemon connected to one of the public pools. There are also other, more expensive options, but I don't have a link for them at the moment. In that case the differences will be bounded.
The protocol is free to treat this as an assumption, but it can't treat it as an invariant: miner clocks can be arbitrarily incorrect. But if miners with broken clocks cause the network to halt, and that's an acceptable outcome, then all good 👍
If a minority of miners have (very) broken clocks, this is not an issue for the protocol. It's subsumed into the honest majority assumption that the network operates under - and indeed that all Byzantine fault tolerant networks operate under. In other words, a miner with a broken clock would be treated as dishonest by the protocol.
It's also worth noting that the Spacemesh subprotocols have different synchronicity models. Hare operates under partial synchrony as @dshulyak noted above. Messages have to be received within ~30 secs, and if this assumption fails for a majority of miners, then Hare fails, and for some period of time we'd confirm empty layers until Hare starts working again.
Tortoise, by contrast, works in fully async mode.
@peterbourgon really appreciate the questions! Closing this as I assume nothing else is pending here but feel free to reopen if I've missed anything.
From the protocol paper
Clock synchronization among nodes in a distributed system is, as far as I understand, literally impossible — edit: if the system should be available and/or consistent. Am I mistaken? How can this invariant be confirmed?