Open bryce-b opened 3 years ago
(Clocks are hard. Except for Linux's clock_gettime(CLOCK_BOOTTIME), which may not be available in the target runtime/language, I do not know of any other clock implementation that advances in lockstep with epoch time. Especially on client systems, the typical monotonic clocks stop when the CPU is suspended (e.g. with a closed notebook lid, but I imagine it occurs even more often on battery-driven mobile devices). The realtime clock, on the other hand, can be changed by the user on a whim.)
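To make the distinction between these clocks concrete, here is a minimal Python sketch that reads each of them side by side. The helper name `read_clocks` is made up for illustration, and the boot-time clock is only read when the runtime exposes `time.CLOCK_BOOTTIME` (Linux only), which mirrors the availability caveat above.

```python
import time


def read_clocks():
    """Return a snapshot of the clocks discussed above, in seconds.

    Hypothetical helper for illustration only.
    """
    clocks = {
        # Wall-clock (epoch) time: can jump whenever the user or a sync
        # service changes the system time.
        "realtime": time.time(),
        # Monotonic time: never goes backwards, but typically does not
        # advance while the machine is suspended.
        "monotonic": time.monotonic(),
    }
    # CLOCK_BOOTTIME is Linux-specific; it keeps counting during suspend.
    boottime_id = getattr(time, "CLOCK_BOOTTIME", None)
    if boottime_id is not None:
        clocks["boottime"] = time.clock_gettime(boottime_id)
    return clocks


if __name__ == "__main__":
    for name, value in read_clocks().items():
        print(f"{name}: {value:.3f}")
```

On a Linux laptop that has been suspended since boot, the `boottime` reading would be expected to run ahead of the `monotonic` one by roughly the time spent suspended.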
Without having delved deeper into the topic, I don't think it is feasible to get sub-second synchronization across distributed systems with anything short of full-fledged NTP (which takes a few minutes to sync precisely). For a precision on the order of a few seconds, it may be enough to send the "current" time with each request, so the receiver can calculate the offset between the current time of the sender and its own current time.
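The offset idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Request` type, its `sender_time` field, and both function names are hypothetical and not part of any OpenTelemetry API. One-way network latency is not compensated, so the estimate is only good to about the request's transit time.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request carrying the sender's clock reading."""
    payload: str
    sender_time: float  # sender's wall-clock time at send, epoch seconds


def estimate_offset(sender_time: float, receiver_time: float) -> float:
    """Offset to add to sender timestamps to express them on the receiver's clock.

    The one-way network latency is folded into the result as error.
    """
    return receiver_time - sender_time


def to_receiver_time(sender_timestamp: float, offset: float) -> float:
    """Shift a timestamp recorded on the sender onto the receiver's clock."""
    return sender_timestamp + offset


if __name__ == "__main__":
    # The sender's clock reads 1000.0 when the request is sent; the receiver's
    # clock reads 1042.5 when it arrives.
    req = Request(payload="span batch", sender_time=1000.0)
    offset = estimate_offset(req.sender_time, receiver_time=1042.5)
    # A span that started at 990.0 on the sender's clock, shifted to the
    # receiver's clock.
    print(offset, to_receiver_time(990.0, offset))
```

Because the offset is recomputed on every request, a user changing the device clock between requests is absorbed by the next estimate; timestamps recorded before the change but sent after it would still be shifted incorrectly.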
> it may be enough to send the "current" time with each request, so the receiver can calculate the offset between the current time of the sender and its own current time.
This is more or less what we did at Plumbr.
There is a blog post explaining conceptually how clock skew was handled back in the day: https://plumbr.io/blog/monitoring/time-in-distributed-systems
In case this issue becomes active again, here is the archive.org link for the above blog post: https://web.archive.org/web/20210123103641/https://plumbr.io/blog/monitoring/time-in-distributed-systems
I'm creating this ticket per the discussion in the OpenTelemetry maintainers' meeting on 05/10/2021.
Clock skew will always be a problem with distributed tracing, but the degree of skew that occurs on unmanaged devices (by 'unmanaged' I mean devices outside of the software provider's control) is untenable.
This screenshot shows the degree of clock skew between a mobile device and a backend server while tracing a synchronous request. The mobile device is using an automatically sync'd system clock, but the degree of skew could be much, much worse, as the clock can be set at the whim of the mobile phone's owner (think days, months, years of skew).
I'd like to brainstorm some solutions to this problem. Some possible solutions could be: