No distributed system can assume clocks are synchronized

spacemeshos / protocol

This repo contains the Spacemesh protocol specifications and related documentation

http://protocol.spacemesh.io/

Apache License 2.0


No distributed system can assume clocks are synchronized #49

Closed: peterbourgon closed this issue 1 year ago

peterbourgon commented 2 years ago

From the protocol paper:

"Our model assumes honest parties have synchronized clocks."

Clock synchronization among nodes in a distributed system is, as far as I understand, literally impossible — edit: if the system should be available and/or consistent. Am I mistaken? How can this invariant be confirmed?

lrettig commented 2 years ago

Thanks for the great question!

That quote is a simplification; clock skew is absorbed into the network latency assumption. If honest miners send proposals for layer j at 4:20:00pm, then the Hare preround expects to receive those proposals within 30 seconds, i.e. by 4:20:30pm. Two honest miners whose local clocks differ by, say, 1 second will still be OK as long as the actual maximum network latency is at most 29 seconds.
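To make the arithmetic in that example concrete, here is a minimal sketch of how clock skew eats into the latency budget. It is not taken from the Spacemesh code base: the function names and the convention that an offset means "local clock minus true time" are assumptions made purely for illustration, and the sketch ignores messages that arrive too early.

```python
from datetime import timedelta


def max_tolerable_latency(window, max_skew):
    """Latency budget left once the worst-case clock skew is subtracted."""
    return window - max_skew


def arrives_on_time(latency, sender_offset, receiver_offset, window):
    """Return True if a proposal reaches the receiver before its local deadline.

    Offsets are (local clock - true time); this convention is an assumption.
    The sender emits when its own clock reads the layer start, and the
    receiver stops listening when its own clock reads layer start + window.
    """
    # A receiver whose clock runs ahead of the sender's closes its window
    # early, so the skew adds to the delay the receiver effectively sees.
    effective_delay = latency + (receiver_offset - sender_offset)
    return effective_delay <= window


window = timedelta(seconds=30)  # preround window from the example
skew = timedelta(seconds=1)     # clock difference from the example

print(max_tolerable_latency(window, skew))  # 0:00:29
print(arrives_on_time(timedelta(seconds=29), timedelta(0), skew, window))  # True
```

Under these assumptions, a 1 second skew and a 29 second latency land exactly on the 30 second deadline; any additional latency or skew would make the proposal late.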

peterbourgon commented 2 years ago

The issue arises when honest participants have divergent system clocks, or sit behind high-latency network connections, by accident rather than through deliberate malicious action. Honest miners can have arbitrarily large local clock differences. Does the protocol work in that circumstance?

dshulyak commented 2 years ago

Miners are expected to run an NTP daemon connected to one of the public pools. There are also other, more expensive options, but I don't have a link for them at the moment. In that case clock differences will be bounded: a clock drifts by roughly 1s per week, and latency is also bounded.

The protocol operates in relatively large time windows (e.g. 10s-30s), so a short deviation won't make any difference to protocol correctness or liveness. If we consider the edge case where every miner has a different clock, the network won't make any progress.
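As a rough illustration of why drift on that scale is small relative to the protocol windows, the sketch below works through the numbers. The ~1s per week drift rate and the 10s-30s windows come from the comment above; the function names, and the simplification of a single miner drifting linearly against true time with no NTP correction, are assumptions for illustration only.

```python
DRIFT_SECONDS_PER_WEEK = 1.0  # approximate figure quoted in the comment above


def drift_after(weeks_unsynced, rate=DRIFT_SECONDS_PER_WEEK):
    """Accumulated drift in seconds, assuming linear drift and no resync."""
    return weeks_unsynced * rate


def weeks_until_drift_reaches(budget_seconds, rate=DRIFT_SECONDS_PER_WEEK):
    """Weeks without synchronization before drift consumes the given budget."""
    return budget_seconds / rate


def remaining_latency_budget(window_seconds, weeks_unsynced):
    """Seconds of the window still available for network latency."""
    return max(0.0, window_seconds - drift_after(weeks_unsynced))


# A miner that has not synced for 2 weeks still has most of a 30s window.
print(remaining_latency_budget(30.0, 2))  # 28.0
# Without any resync, drift alone would take about 10 weeks to fill a 10s window.
print(weeks_until_drift_reaches(10.0))    # 10.0
```

Two miners drifting in opposite directions would diverge from each other at up to twice this rate, so the figures above are best read as an order-of-magnitude estimate.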

peterbourgon commented 2 years ago

"Miners are expected to run an NTP daemon connected to one of the public pools. There are also other, more expensive options, but I don't have a link for them at the moment. In that case clock differences will be bounded."

The protocol is free to treat this as an assumption, but it can't treat it as an invariant: miner clocks can be arbitrarily incorrect. But if miners with broken clocks cause the network to halt, and that's an acceptable outcome, then all good 👍

lrettig commented 1 year ago

If a minority of miners have (very) broken clocks, this is not an issue for the protocol. It's subsumed into the honest majority assumption that the network operates under, and indeed that all Byzantine fault tolerant networks operate under. In other words, a miner with a broken clock would be treated as dishonest by the protocol.

It's also worth noting that the Spacemesh subprotocols have different synchrony models. Hare operates under partial synchrony, as @dshulyak noted above. Messages have to be received within ~30 seconds, and if this assumption fails for a majority of miners, then Hare fails, and for some period of time we'd confirm empty layers until Hare starts working again.
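The failure mode described here can be sketched as a simple check over per-miner delays. This is only an illustration of the reasoning, not the Hare protocol itself: the function names are hypothetical, each delay is assumed to already combine network latency and clock skew, and the simple majority cutoff mirrors the wording above rather than Hare's actual thresholds.

```python
HARE_WINDOW_SECONDS = 30.0  # the ~30 second bound mentioned above


def late_fraction(effective_delays, window=HARE_WINDOW_SECONDS):
    """Fraction of miners whose messages miss the window."""
    late = sum(1 for delay in effective_delays if delay > window)
    return late / len(effective_delays)


def layer_outcome(effective_delays, window=HARE_WINDOW_SECONDS):
    """Classify a layer, assuming Hare fails only if a majority is late."""
    if not effective_delays:
        return "empty_layer"  # nobody participated at all
    if late_fraction(effective_delays, window) > 0.5:
        return "empty_layer"  # timing assumption failed for a majority
    return "hare_ok"          # late miners are treated like faulty ones


# One miner with a badly broken clock is absorbed by the majority.
print(layer_outcome([1.0, 2.0, 45.0]))   # hare_ok
# If most miners miss the window, the layer is confirmed empty.
print(layer_outcome([40.0, 50.0, 2.0]))  # empty_layer
```

In this model a miner with a broken clock is indistinguishable from a dishonest one: it only matters how many miners miss the window, not why.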

Tortoise, by contrast, works in fully async mode.

@peterbourgon really appreciate the questions! Closing this as I assume nothing else is pending here but feel free to reopen if I've missed anything.