pendulum-project / ntpd-rs

A full-featured implementation of the Network Time Protocol, including NTS support.
https://trifectatech.org
Other
761 stars 43 forks source link

Inaccurate log message for missbehaving server #1227

Closed tdittr closed 3 months ago

tdittr commented 11 months ago

When a server responds with a NTPv3 response ntpd-rs logs the following:

2023-11-24T08:52:05.258467Z  INFO ntp_proto::peer: Peer rejected due to invalid stratum. The stratum of a peer must be lower than the own stratum peer_stratum=16 own_stratum=16

Which is not helpful in detecting the problem. It is not mentioning that the server is misbehaving, nor which server it is about.

To reproduce this issue use time.windows.com as a source and you can observe the following in wireshark:

Request:

Network Time Protocol (NTP Version 4, client)
    Flags: 0x23, Leap Indicator: no warning, Version number: NTP Version 4, Mode: client
        00.. .... = Leap Indicator: no warning (0)
        ..10 0... = Version number: NTP Version 4 (4)
        .... .011 = Mode: client (3)
    [Response In: 3189]
    Peer Clock Stratum: unspecified or invalid (0)
    Peer Polling Interval: 4 (16 seconds)
    Peer Clock Precision: 1,000000 seconds
    Root Delay: 0,000000 seconds
    Root Dispersion: 0,000000 seconds
    Reference ID: NULL
    Reference Timestamp: NULL
    Origin Timestamp: NULL
    Receive Timestamp: NULL
    Transmit Timestamp: May  3, 2050 06:18:23.950905785 UTC

Response:

Network Time Protocol (NTP Version 3, server)
    Flags: 0x1c, Leap Indicator: no warning, Version number: NTP Version 3, Mode: server
        00.. .... = Leap Indicator: no warning (0)
        ..01 1... = Version number: NTP Version 3 (3)
        .... .100 = Mode: server (4)
    [Request In: 3186]
    [Delta Time: 0.007771932 seconds]
    Peer Clock Stratum: secondary reference (3)
    Peer Polling Interval: 4 (16 seconds)
    Peer Clock Precision: 0,000000 seconds
    Root Delay: 0,016159 seconds
    Root Dispersion: 0,029572 seconds
    Reference ID: 25.66.230.1
    Reference Timestamp: Nov 24, 2023 08:41:50.727581599 UTC
    Origin Timestamp: May  3, 2050 06:18:23.950905785 UTC
    Receive Timestamp: Nov 24, 2023 08:51:32.899579199 UTC
    Transmit Timestamp: Nov 24, 2023 08:51:32.899582599 UTC

Log:

2023-11-24T08:51:32.886304Z  WARN ntpd::daemon::config: Fewer sources configured than are required to agree on the current time. Daemon will not change system time.
2023-11-24T08:51:32.891833Z  INFO ntpd::daemon::system: new peer source_id=PeerId(1) addr=40.119.148.38:123 spawner=SpawnerId(1)
2023-11-24T08:52:05.258467Z  INFO ntp_proto::peer: Peer rejected due to invalid stratum. The stratum of a peer must be lower than the own stratum peer_stratum=16 own_stratum=16
2023-11-24T08:53:10.466259Z  INFO ntp_proto::peer: Peer rejected due to invalid stratum. The stratum of a peer must be lower than the own stratum peer_stratum=16 own_stratum=16
2023-11-24T08:55:20.462268Z  WARN spawn{index=PeerId(1) addr=40.119.148.38:123 interface=None enable_timestamps=EnableTimestamps { rx_software: true, tx_software: false, rx_hardware: false, tx_hardware: false } network_wait_period=1s protocol_version=V4 nts=None}: ntpd::daemon::peer: Peer is no longer reachable over network, restarting
2023-11-24T08:55:20.469684Z  INFO ntpd::daemon::system: new peer source_id=PeerId(2) addr=40.119.148.38:123 spawner=SpawnerId(1)
2023-11-24T08:55:52.893977Z  INFO ntp_proto::peer: Peer rejected due to invalid stratum. The stratum of a peer must be lower than the own stratum peer_stratum=16 own_stratum=16

Config:

[observability]
# Other values include trace, debug, warn and error
log-level = "info"
observation-path = "/var/run/ntpd-rs/observe"

[[source]]
mode = "server"
address = "time.windows.com"

# Below are configured various thresholds beyond which ntpd-rs will not
# change the system clock. CHANGE THESE TO MATCH YOUR SECURITY NEEDS!
# For guidance, see OPERATIONAL_CONSIDERATIONS.md
[synchronization]
single-step-panic-threshold = 1800
startup-step-panic-threshold = { forward="inf", backward = 1800 }
#accumulated-step-panic-threshold = 1800
#minimum-agreeing-sources = 3
ZedCode commented 4 months ago

I also ran into this exact same error message under a very different configuration. In my configuration I have both a client and a server (ntp_client:10.0.0.1, I have ntp_server:10.0.0.2,10.0.0.12 -- the ntp_server has two assigned IP addresses). When I set my server up so that it's listening on 0.0.0.0:123 then clients will return the error:

ntp_proto::peer: Peer rejected due to invalid stratum. The stratum of a peer must be lower than the own stratum peer_stratum=16 own_stratum=16

The stratum of the server shows as 3 in the output of ntp-ctl and the client shows 4. Finding this issue I became suspect that this behavior has nothing to do with the stratum and a tcpdump confirmed it. It basically does this:

10.0.0.1 -> 10.0.0.2   # ntp request to server
10.0.0.1 <- 10.0.0.12  # ntp response from server from wrong IP
10.0.0.1 -> 10.0.0.12  # the ntp client sends an ICMP packet to the other IP which gets no response
                       # this causes the error above to be written to logs               

Binding the service to 10.0.0.2:123 prevented this behavior and solved the issue. I'm just posting this information here because I think there must be a few weird things in the code path that ultimately leads to this log line being written.

rnijveld commented 4 months ago

@ZedCode Interesting how the return message is going through the other IP address, I don't think we've really tested ntpd-rs with such a network topology. I take it your server is also an ntpd-rs instance in this example? I could try and replicate such a setup on my own device. I have one question: on the server, was there any specific routing set up for the two different IPs?

ZedCode commented 4 months ago

Hi @rnijveld , I took another look through the configuration this morning when I was trying to come up with an easy set of steps to reproduce. When doing that I found that I had mixed up the primary IP address and the secondary IP address. To be clear, I don't think there's an issue here except a logging bug, but if you want to see the incorrect logging message you should be able to do the following:

The client should do what I observed above, including writing the log line I saw.

With the added caveat that that this was done by using the secondary, aliased IP address, it seems incredibly unlikely someone would accidentally find themself in this same situation. Thank you for taking a look at the issue, let me know if you try to replicate and aren't able to and I'll try to come up with a test case using something a bit more portable like vagrant if you think it's worth tracking down!

rnijveld commented 4 months ago

What happened here was that time.windows.com responds with v3 messages (in response to our v4 messages), which we ignored, without showing any message in the log. I've updated our behavior in #1507: we now should accept v3 responses to our v4 messages (by hoping that the server did the right thing and did not misinterpret our v4 messages). Additionally we should now also add a warning to the log whenever an invalid packet version was returned to us.

The stratum messages are only because every source gets some default information as long as no measurements took place. Maybe we should try and reduce our spam here: as long as a source doesn't have any measurement, it maybe should have a special status that allows the algorithm to ignore it.