spacemeshos / go-spacemesh

Go Implementation of the Spacemesh protocol full node. 💾⏰💪
https://spacemesh.io
MIT License
745 stars 210 forks source link

NTP support #4

Closed avive closed 6 years ago

avive commented 6 years ago

Implement NTP client with time-drift detection (SNTP protocol). Kill node that is on a system with a wall clock too far apart from the ref time on startup.

lironz commented 6 years ago

i want this one, assign pls?

tal-m commented 6 years ago

Why do you care about the system time? Can't you just use the NTP time as the canonical time reference?

lironz commented 6 years ago

NTP result can vary, but also NTP diff from system time. once we set a minimal time diff threshold, we'll know which method is best.

lironz commented 6 years ago

i'm leaning towards tal's suggestion, with good averaged NTP pooling at node start

avive commented 6 years ago

Good question. I borrowed this idea from other blockchain and distributed store projects. The idea is to expect the sys time to be NTP-synced up to some delta and assume that the OS is using NTPD for that. Using only NTP time as the node ref time is also an option worth considering but the downsize is having a dependency on an NTP server returning time on startup. If for some reason the node can't get the ref time on startup (network hiccup, NTP server glitch, etc...) it can't start at all. I think it is best to require the node operator to ensure that his system clock is NTP synced using NTPD. We should also check how popular desktop OSs are configured - there's a good chance that they are NTP-synced by default.

avive commented 6 years ago

Please have a look at how go-ethereum deals with this issue. https://github.com/ethersphere/go-ethereum/blob/master/p2p/discv5/ntp.go

lironz commented 6 years ago

so NTP pool at start should be distributed over several servers, in case of NTP failiure, assuming connection issues, we can(?) run offline, maybe fixing the diff upon NTP connection restored.

lironz commented 6 years ago

go-ethereum pooling 3 times, and verifying that time drift between system and NTP is within a threshold range (+/-) using blocking behavior and notifying the user to change system settings.

avive commented 6 years ago

yes - as every outgoing p2p message is time-stamped with the sending time and nodes are going to reject incoming p2p messages that were sent too long ago in the past or in the future from their own ref time - it is very useful to be able to pick this timestamp from the system clock using time.Now() after verifying it is synced with ntp on startup. Not sure if there's any other viable alternative for such time-stamping without a synced sys time. We can't query NTP for the current time for each p2p message a node wants to send out to the network.

tal-m commented 6 years ago

We can query NTP to get the offset from the system time, and just use that. If the offset changes too erratically, that would be a reason to notify the user that their system clock is malfunctioning and abort.

avive commented 6 years ago

yes, but also if the sys clock is too far apart from ref ntp time then we should abort the node and ask users to sync their system with NTP. Not sure if the offset delta is important as long as it is within an acceptable range.

tal-m commented 6 years ago

This would make the system less robust. For example, some nodes have a system clock in GMT, while others have it in local time. If we just measure the offset from the NTP consensus and use that, we won't care -- but if we require the system time to match we'll end up aborting one class of nodes.

avive commented 6 years ago

@tal-m - I just don't understand how we can only use NTP and not the sys time when a node needs to timestamp a message without it having to issue a new NTP query (slow). If we check on node startup that the sys clock is in sync with ntp then using https://golang.org/pkg/time/#Now.Unix() for the unix epoch time regardless of the locally set time zone should be good. What am I missing?

avive commented 6 years ago

To clarify, I think we have 2 design options here:

Option 1

  1. Query NTP on node startup and bail out if sys clock is too far apart from the ref NTP time.
  2. Use system clock for time-stamping outgoing messages.
  3. Periodically check sys clock time drift from NTP while the node is running and kill node if the clock drifted to much away from system.

Option 2

  1. Query NTP each time node needs a timestamp. Tal - is this is a good summary or are you referring to a different option?
tal-m commented 6 years ago

No, I'm talking about the following option:

1 Query the NTP time at startup (or every once in a while). Compute and store the offset x = NTP-SystemTime.

  1. Whenever you need a timestamp, query the SystemTime and use SystemTime+x

Note that the NTP time isn't really what we care about -- we need nodes to have a consensus about the time, and don't really care if it's the "real" time or not. NTP is a good way to get this, but we should be careful that doesn't open us to a DoS attack via NTP (which isn't super secure, IIRC). So one thing we should do is compare our local NTP timestamp to the consensus time (i.e., ask your neighbors what their local time is) and warn the user if there's a difference.

avive commented 6 years ago

ok, this makes sense and can be part of the protocol to connect to 2-5 random online neighbors on node startup.

lironz commented 6 years ago

did a test, my OS NTP setting was off, got 11s diff from NTP server when NTP OS setting turned on, diff went down to 1-75ms, mostly around 11ms

while looking at go-ethereum code,, threshold is 10s

avive commented 6 years ago

~10s should be good enough for now until we have more specific consensus protocol p2p messages requirements.

avive commented 6 years ago

Just came across this NTP client in go blog post today - might be useful: https://medium.com/learning-the-go-programming-language/lets-make-an-ntp-client-in-go-287c4b9a969f - @lironz - are you starting with this task?

elewis787 commented 6 years ago

Are the timestamps being used for message ordering (reading up it seems like they are ) ? Would something like lamport clocks be more useful ? If I understand correctly even NTP can be unreliable and begin to drift.

More curious then anything ;)

avive commented 6 years ago

We are most likely not going to have higher-level p2p protocols that require message ordering. We are mostly interested in rejecting messages that are too far back in the past or future to prevent replay attacks. So, by having a roughly sync clock between all nodes up to a drift. The p2p protocol can safely reject messages sent too long ago or in the future.

y0sher commented 6 years ago

A basic ntp sync mechanism for now. https://github.com/spacemeshos/go-spacemesh/pull/89