schweikert / fping

High performance ping tool
https://fping.org
Other
1.01k stars 250 forks source link

Use CLOCK_REALTIME when logging and CLOCK_MONOTONIC when timing #290

Closed cagney closed 2 months ago

cagney commented 8 months ago

Change update_current_time() so that current_time is set from CLOCK_REALTIME and current_time_ms is set from CLOCK_MONOTONIC. This way log messages display wall clock time while any timing calculations use monotonic time.

See #203

In seqmap_add(), before checking to see if .time_ts wrapped, check that .time_ts was set.

See #288

schweikert commented 8 months ago

This looks good, thank you! Could you have a look at the failing test and check whether the failure is caused by this change?

cagney commented 8 months ago

looking

gsnw commented 8 months ago

The test is broken by fping.c line 1500 current_time_ns = timespec_ns(&monotonic_time); When I turn it back to CLOCK_REALTIME current_time_ns = timespec_ns(&current_time);, the test work

I hope this helps with the analysis

cagney commented 8 months ago

Off hand, would you know which clock:

    /* ancilliary data */
    {
        struct timespec reply_timestamp_ts;
        for (cmsg = CMSG_FIRSTHDR(&recv_msghdr);
             cmsg != NULL;
             cmsg = CMSG_NXTHDR(&recv_msghdr, cmsg)) {
            if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_TIMESTAMPNS) {
                memcpy(&reply_timestamp_ts, CMSG_DATA(cmsg), sizeof(reply_timestamp_ts));
                *reply_timestamp = timespec_ns(&reply_timestamp_ts);
            }
        }
    }

the Linux kernel uses for this?

cagney commented 8 months ago

I'm pretty sure it is CLOCK_REALTIME, which, on linux, leads to:

$ ./src/fping -z 0xffff 127.0.0.1
./src/fping:
  count: 1, retry: 3, interval: 10 ms
  perhost_interval: 1000 ms, timeout: 500
  ping_data_size = 56, trials = 4
  verbose_flag set
  backoff_flag set
  report_all_rtts_flag set
  randomly_lose_flag set
  print_per_system_flag set
[83452.72502] 127.0.0.1 [0]: add ping event in 0 ms
[83452.72502] # main_loop
[83452.72502] 127.0.0.1 [0]: ping event
[83452.72505] seqmap_add(host: 0, index: 0) -> 0
[83452.72505] 127.0.0.1 [0]: send ping
[83452.72505] 127.0.0.1 [0]: add timeout event in 500 ms
[83452.72505] next timeout event in 500 ms (127.0.0.1)
[83452.72505] waiting up to 500 ms
[83452.72822] seqmap_fetch(0) -> host: 0, index: 0
[83452.72822] received [0] from 127.0.0.1 recv_time=1703084424276482893 (curr_time=83452728221621) this_reply=1703000971551429542 timeout=500000000
[83452.72822] discard [0] from 127.0.0.1 as this_reply=1703000971551429542 exceeds timeout=500000000

it's looking like a better option is to just and only use CLOCK_REALTIME and live with delta changes.

gsnw commented 8 months ago

Unconfirmed and unknowingly, I would say yes. socket.c seems to use ktime_get_real() and that again refers to CLOCK_REALTIME

https://www.kernel.org/doc/html/latest/core-api/timekeeping.html https://github.com/torvalds/linux/blob/master/net/core/sock.c

gsnw commented 8 months ago

Your patch seems to work on FreeBSD. If we limit this to FreeBSD, OpenBSD and macOS as before, we have a solution. This is also shown by the macOS test, which is the only one to be successful.

My example with your change b00622cf063b576553582e388be1c60db4b9d758

cagney commented 8 months ago

My thought, having worked through this, is to reject this patch and instead simplify the conditional code so it always uses CLOCK_REALTIME. The real cause of the FreeBSD problems is their out-of-date package. I'll poke them.

The ideal clock source is probably CLOCK_BOOTTIME but that's linux only and won't work with timestamp.

gsnw commented 8 months ago

I have found a solution that fixes your problem with the test on Linux cceca003f608ebf0007713e0db88b714055df837 Please update your pull-request

curl -sL https://github.com/gsnw/fping/commit/cceca003f608ebf0007713e0db88b714055df837.patch | git am

Unfortunately creates the bug again #272

rowingdude commented 4 months ago

Is there still ambition of improving this branch? I am looking for a project.

gsnw commented 3 months ago

@rowingdude you are free to look for a final solution. Currently, only a part without CLOCK_MONOTONIC is used as the primary time specification. You can see the reason for this in the previous posts.

Otherwise I would close the pull request at the end of the month.

gsnw commented 2 months ago

Closed because the pull request is not functional. There is also no corrected version. Parts of the pull-request have been transferred via a new pull-request.