Testing kernel time management from userspace

teto commented 8 years ago

hi,

I am interested in testing the NTP protocol and real implementations such as chrony (http://chrony.tuxfamily.org) or ntpd (www.ntp.org) in a simulated environment, more precisely in the ns3 simulator (www.nsnam.org). ns3 makes it possible to run real code thanks to its Direct Code Execution (DCE) extension (https://github.com/direct-code-execution/ns-3-dce).

This issue is a way to check my understanding and to look for possible solutions to what I want to achieve. It's likely I've misunderstood some parts, so I hope you will correct me.

So far I found a few testing projects:

https://github.com/johnstultz-work/timetests by @johnstultz-work executes syscalls on a running kernel with various parameters to check that they respond correctly.
https://github.com/mlichvar/linux-tktest by @mlichvar implements some kernel functions such as printk and links kernel/time/timekeeping.c; it seems similar to the previous project?
https://github.com/mlichvar/clknetsim is a simulator that starts a few NTP servers using an LD_PRELOAD trick. It generates ticks and propagates them to the fake nodes, and it also delays messages to simulate network propagation delays. The downside is that it reimplements adjtimex etc. instead of testing the kernel's.
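For readers unfamiliar with the first kind of test, here is a minimal sketch of a syscall-level check in the spirit of timetests: read CLOCK_MONOTONIC repeatedly on the running kernel and count how often it steps backwards. The function names are hypothetical; this is not code taken from the suite.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds as a signed 64-bit value. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Hypothetical check: read CLOCK_MONOTONIC `iterations` times in a tight
 * loop and count readings smaller than the previous one. A correct kernel
 * should report 0. Returns -1 if clock_gettime() itself fails.
 */
static long count_monotonic_regressions(long iterations)
{
    struct timespec ts;
    long regressions = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    int64_t prev = ts_to_ns(&ts);

    for (long i = 0; i < iterations; i++) {
        if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
            return -1;
        int64_t now = ts_to_ns(&ts);
        if (now < prev)
            regressions++;
        prev = now;
    }
    return regressions;
}
```

A correct kernel should make count_monotonic_regressions() return 0 for any iteration count.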

Now there seem to be two modes for clocks:

tick-based, with struct clocksource defined in include/linux/clocksource.h
tickless, with struct clock_event_device http://lxr.free-electrons.com/source/include/linux/clockchips.h#L99
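As a rough illustration of how the two structures divide the work (a simplified userspace model, not kernel code): a struct clocksource is essentially a free-running counter that timekeeping reads and converts to nanoseconds with a mult/shift pair. A struct clock_event_device is what gets programmed to fire the next interrupt, in either periodic or one-shot (tickless) mode. cyc2ns() below mirrors the arithmetic of the kernel's clocksource_cyc2ns(). calc_mult() and elapsed_ns() are hypothetical helpers; the kernel's own clocks_calc_mult_shift() chooses these values more carefully.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Same arithmetic as the kernel's clocksource_cyc2ns(). */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}

/* Hypothetical helper: pick mult so that ns = (cycles * mult) >> shift. */
static uint32_t calc_mult(uint64_t freq_hz, uint32_t shift)
{
    uint64_t mult = (NSEC_PER_SEC << shift) / freq_hz;
    assert(mult <= UINT32_MAX); /* the kernel keeps mult in 32 bits */
    return (uint32_t)mult;
}

/*
 * Nanoseconds elapsed between two counter reads. Masking the difference
 * handles wraparound of counters narrower than 64 bits. Large deltas can
 * overflow the multiplication, which the kernel avoids by bounding them.
 */
static uint64_t elapsed_ns(uint64_t last, uint64_t now, uint64_t mask,
                           uint32_t mult, uint32_t shift)
{
    uint64_t delta = (now - last) & mask;
    return cyc2ns(delta, mult, shift);
}
```

For a 1 MHz counter with shift 20, mult comes out as 1048576000, and 1,000,000 cycles convert to exactly one second.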

The scenario I would like to achieve is:

1. set up a topology/links in ns3
2. load some NTP servers (chrony/ntpd) in the ns3 simulation with DCE
3. these servers should call the kernel's adjtimex (thanks to lkl or libOS?)
4. changes made by the kernel need to be reflected in ns3

ns3 is a discrete-event simulator, i.e. it is tickless: if you schedule an event at t0 = 5 s and a second one at t1 = 100 s, ns3 just executes the first event, sets the integer representing time to t1 = 100 s and executes the second event. So far in ns3, the nodes share the same time value and are all perfectly synchronized; I want to introduce per-node clocks.
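A toy version of the discrete-event loop described above, assuming a sorted array as the event queue (ns3's real scheduler is different). Time moves straight from one event's timestamp to the next, and nothing runs in between. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*event_fn)(void *arg);

struct event {
    int64_t time_ns; /* absolute simulation time */
    event_fn fn;
    void *arg;
};

#define MAX_EVENTS 64

struct simulator {
    int64_t now_ns;                 /* single clock shared by all nodes */
    struct event queue[MAX_EVENTS]; /* kept sorted by time_ns */
    size_t count;
};

/* Insert an event, keeping the queue ordered by time (FIFO for ties). */
static int schedule(struct simulator *sim, int64_t time_ns, event_fn fn, void *arg)
{
    if (sim->count == MAX_EVENTS || time_ns < sim->now_ns)
        return -1;
    size_t i = sim->count;
    while (i > 0 && sim->queue[i - 1].time_ns > time_ns) {
        sim->queue[i] = sim->queue[i - 1];
        i--;
    }
    sim->queue[i] = (struct event){ time_ns, fn, arg };
    sim->count++;
    return 0;
}

/* Pop events in order; time jumps directly to each event, no ticks in between. */
static void run(struct simulator *sim)
{
    while (sim->count > 0) {
        struct event ev = sim->queue[0];
        for (size_t i = 1; i < sim->count; i++)
            sim->queue[i - 1] = sim->queue[i];
        sim->count--;
        sim->now_ns = ev.time_ns;
        ev.fn(ev.arg);
    }
}

/* Example handler: counts how many events fired. */
static void count_event(void *arg)
{
    (*(int *)arg)++;
}
```

Scheduling count_event at 5 s and 100 s and calling run() executes exactly two handlers and leaves now_ns at 100 s.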

My problem is step 4:

when ns3 advances time from t0 to t1, how can it update the kernel's value? There is no notion of tick; it just jumps. Should I write a struct clock_event_device driver?
and, conversely, if the kernel needs to inject an offset, should that affect the ns3 node clock?
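One possible way to frame both directions of the question, sketched under the assumption that each node's clock is a straight line relative to simulator time (an offset plus a rate). Reading the node clock maps simulator time to node time. Turning a kernel timer deadline into an ns3 event uses the inverse mapping. A kernel-injected step just re-anchors the line. The struct and functions are hypothetical, not an ns3 or kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-node clock:
 *   node_ns = base_node_ns + rate * (sim_ns - base_sim_ns)
 * rate is the node's frequency relative to simulator time (1.0 = perfect).
 */
struct node_clock {
    int64_t base_sim_ns;
    int64_t base_node_ns;
    double rate;
};

/* Simulator time -> what the node's clock reads. */
static int64_t node_time(const struct node_clock *c, int64_t sim_ns)
{
    return c->base_node_ns + (int64_t)(c->rate * (double)(sim_ns - c->base_sim_ns));
}

/* Node time -> simulator time, e.g. to schedule a kernel timer deadline in ns3. */
static int64_t sim_time_for(const struct node_clock *c, int64_t node_ns)
{
    assert(c->rate > 0.0);
    return c->base_sim_ns + (int64_t)((double)(node_ns - c->base_node_ns) / c->rate);
}

/*
 * The kernel stepped the node clock by offset_ns at simulator time sim_ns:
 * re-anchor the line so later readings include the step. Any pending
 * events computed with sim_time_for() would need to be rescheduled.
 */
static void step_clock(struct node_clock *c, int64_t sim_ns, int64_t offset_ns)
{
    c->base_node_ns = node_time(c, sim_ns) + offset_ns;
    c->base_sim_ns = sim_ns;
}
```

With this framing, only a change of offset or rate needs to touch the simulator, which fits the "update only when needed" approach better than per-tick events.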

Sorry if this sounds confused, or for mentioning you if you are not interested. I think LKL or libOS can help with what I want to achieve. I am willing to contribute to LKL if that helps, but I would need some direction.

Matt

johnstultz-work commented 8 years ago

On Wed, Jan 20, 2016 at 10:09 AM, Matthieu Coudron notifications@github.com wrote:


> https://github.com/johnstultz-work/timetests by @johnstultz-work executes syscalls on a running kernel with various parameters to check that they respond correctly.

Note that the tests above have been merged into the Linux kernel source, in the tools/testing/selftests/timers/ directory.

> https://github.com/mlichvar/linux-tktest by @mlichvar implements some kernel functions such as printk and links kernel/time/timekeeping.c; it seems similar to the previous project?

So this basically runs the kernel's timekeeping logic in userspace, along with a harness that drives the logic and the virtual clocksources it uses.

(Which is something I'd still like to see included in the kernel's kselftest suite :)

> My problem is step 4:
>
> when ns3 advances time from t0 to t1, how can it update the kernel's value? There is no notion of tick; it just jumps. Should I write a struct clock_event_device driver?

See Miroslav's tktest. It includes logic to provide a virtual clocksource controlled by the simulator, which allows the kernel interfaces to be tested under different conditions.
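A sketch of what such a simulator-controlled counter could look like, assuming the simulator advances it explicitly whenever it moves time forward, so a jump from t0 to t1 is a single call rather than many ticks. This is not tktest's actual code; the names are hypothetical, and a real clocksource would expose vc_read() through its read callback.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical virtual counter: it only moves when the simulator says so. */
struct virtual_counter {
    uint64_t freq_hz; /* nominal counter frequency */
    uint64_t cycles;  /* value a clocksource read() would return */
    uint64_t rem;     /* sub-cycle remainder, in units of 1/NSEC_PER_SEC cycles */
};

/* Advance the counter by sim_delta_ns of simulated time, in one step. */
static void vc_advance(struct virtual_counter *vc, uint64_t sim_delta_ns)
{
    /* Keeps ns * freq_hz below 2^64 (ns < 1e9). */
    assert(vc->freq_hz <= 18000000000ULL);
    uint64_t secs = sim_delta_ns / NSEC_PER_SEC;
    uint64_t ns = sim_delta_ns % NSEC_PER_SEC;

    vc->cycles += secs * vc->freq_hz;
    uint64_t frac = ns * vc->freq_hz + vc->rem;
    vc->cycles += frac / NSEC_PER_SEC;
    vc->rem = frac % NSEC_PER_SEC;
}

/* What the kernel's clocksource read callback would return. */
static uint64_t vc_read(const struct virtual_counter *vc)
{
    return vc->cycles;
}
```

Carrying the remainder means many small advances add up to the same count as one large advance, so the kernel sees no drift caused by how the simulator happens to split its time steps.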

> and, conversely, if the kernel needs to inject an offset, should that affect the ns3 node clock?
>
> Sorry if this sounds confused, or for mentioning you if you are not interested. I think LKL or libOS can help with what I want to achieve. I am willing to contribute to LKL if that helps, but I would need some direction.

I'm not familiar with LKL or libOS, but being able to wire the ntpd/chrony userspace code into Miroslav's simulator seems like it would be quite interesting.

thanks -john

thehajime commented 8 years ago

@teto, although I plan to support ns-3 DCE with lkl, libos and lkl are not equivalent so far: there is currently no way to use ns-3 with lkl. LibOS has supported this since its initial version (libos is actually the port of ns-3 DCE's Linux support).

A rough plan to accomplish this would be:

replace/bridge the libos API with lkl host calls
synchronize the two clocks of ns-3 and Linux (lkl)
  - libos synchronizes the ns-3 clock with jiffies, which may not be applicable to lkl (I haven't investigated this thoroughly yet). See lib_update_jiffies() around https://github.com/libos-nuse/net-next-nuse/blob/master/arch/lib/time.c#L31
wrap system calls (e.g., adjtimex(2)) on the ns-3 (DCE) side
replace libos's stub timer API (e.g., timer_add() etc.) with the Linux implementation (maybe tickless)
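To illustrate the jiffies-based synchronization in the second item, here is a hypothetical sketch that derives jiffies directly from the simulator clock instead of counting tick interrupts. HZ=250 is an assumption, and this is not the actual lib_update_jiffies() implementation.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 250 /* assumed tick rate */
#define NSEC_PER_SEC 1000000000ULL
#define TICK_NSEC (NSEC_PER_SEC / HZ)

/* Recompute jiffies from the simulator clock instead of counting interrupts. */
static uint64_t jiffies_from_sim(uint64_t sim_ns)
{
    return sim_ns / TICK_NSEC;
}

/*
 * Number of ticks the kernel side has to catch up on after the simulator
 * jumped from old_ns to new_ns. Timers expiring in that window would be
 * run at once rather than one tick at a time.
 */
static uint64_t ticks_elapsed(uint64_t old_ns, uint64_t new_ns)
{
    assert(new_ns >= old_ns);
    return jiffies_from_sim(new_ns) - jiffies_from_sim(old_ns);
}
```

With HZ=250, a jump from 5 s to 100 s corresponds to 23750 ticks that never have to be simulated individually.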

mlichvar commented 8 years ago

> ns3 is a discrete-event simulator, i.e. it is tickless: if you schedule an event at t0 = 5 s and a second one at t1 = 100 s, ns3 just executes the first event, sets the integer representing time to t1 = 100 s and executes the second event.

The trouble with this is that when you simulate real clocks, you don't know when an event will actually happen. The clock randomly speeds up and slows down; in a simple model, its frequency follows a Gaussian random walk. I'm not familiar with ns-3 or DCE, so I'm not sure how that fits into their design. I suspect there will have to be a separate event for each kernel tick.
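A minimal sketch of the random-walk model mentioned above, assuming a fixed time step. The time error grows with the current frequency error, and the frequency itself takes a random step each interval. The Gaussian samples are approximated by summing 12 uniform values (Irwin-Hall) to avoid a libm dependency, and a seeded xorshift64 PRNG keeps runs reproducible. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Small deterministic PRNG (xorshift64); state must be non-zero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Approximately standard-normal sample: sum of 12 uniforms in [0,1) minus 6. */
static double approx_gauss(uint64_t *state)
{
    double sum = 0.0;
    for (int i = 0; i < 12; i++)
        sum += (double)(xorshift64(state) >> 11) * (1.0 / 9007199254740992.0);
    return sum - 6.0;
}

struct walk_clock {
    double freq_ppm; /* current frequency error */
    double offset_s; /* accumulated time error vs. the reference */
    uint64_t rng;    /* PRNG state, must be non-zero */
};

/*
 * Advance by dt_s: accumulate time error at the current frequency error,
 * then let the frequency take one Gaussian random-walk step of size sigma_ppm.
 */
static void walk_step(struct walk_clock *c, double dt_s, double sigma_ppm)
{
    c->offset_s += c->freq_ppm * 1e-6 * dt_s;
    c->freq_ppm += sigma_ppm * approx_gauss(&c->rng);
}
```

Because the frequency keeps changing, the time at which such a clock reaches a given reading cannot be computed once in advance, which is the difficulty described above.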

teto commented 8 years ago

I could add an event per tick, but that would not really fit the ns3 spirit, and I suppose it would be very slow. The idea is rather to update parameters only when needed. For instance, when a single-shot offset is set, ns3 can compute when the node clock's single-shot offset should end, in simulator time (the global reference). If the single-shot offset is changed in between, it rolls back the changes (i.e. cancels the future event), updates the clock and adds a new event.
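A sketch of this event-driven approach under two assumptions. First, the single-shot offset is slewed at a constant 500 ppm (roughly what Linux does for ADJ_OFFSET_SINGLESHOT, treated here as a parameter). Second, the node clock runs at a constant rate relative to simulator time. The end of the slew then becomes one simulator event that is dropped and recomputed whenever the offset changes. Names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SLEW_PPM 500 /* assumed slew rate for a single-shot offset */

/*
 * Simulator time at which a single-shot offset set at now_sim_ns finishes
 * slewing, on a node whose clock runs at `rate` relative to simulator time.
 */
static int64_t slew_end_sim_ns(int64_t now_sim_ns, int64_t offset_ns, double rate)
{
    assert(rate > 0.0);
    int64_t mag = offset_ns < 0 ? -offset_ns : offset_ns;
    /* Node-time duration |offset| / (SLEW_PPM * 1e-6); exact since 500 divides 1e6. */
    int64_t node_duration_ns = mag * (1000000 / SLEW_PPM);
    return now_sim_ns + (int64_t)((double)node_duration_ns / rate);
}

/* The pending "slew finished" event, so it can be cancelled and rescheduled. */
struct slew_event {
    int active;
    int64_t end_sim_ns;
};

/* Called whenever the offset is (re)set: drop the old event, schedule a new one. */
static void reschedule_slew(struct slew_event *ev, int64_t now_sim_ns,
                            int64_t offset_ns, double rate)
{
    ev->active = 0; /* cancel the previously scheduled event */
    if (offset_ns == 0)
        return;     /* nothing left to slew */
    ev->end_sim_ns = slew_end_sim_ns(now_sim_ns, offset_ns, rate);
    ev->active = 1;
}
```

At 500 ppm, a 1 ms offset takes 2 s of node time to slew, so on a node running twice as fast as simulator time the event lands 1 s later in simulator time. When the offset is changed mid-slew, the caller would pass in whatever offset remains to be applied.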

For now I am just looking at simple clocks (e.g. perfect frequencies: this client clock runs 1.1 times faster than its server, which has an atomic clock). I am more interested in how propagation delay influences NTP updates than in how well the kernel models clock glitches. It looks like all the crucial data is held in the kernel's "struct timekeeper *tk"; I am looking at this and at the chrony loop.

mlichvar commented 8 years ago

> For now I am just looking at simple clocks (e.g. perfect frequencies: this client clock runs 1.1 times faster than its server, which has an atomic clock). I am more interested in how propagation delay influences NTP updates than in how well the kernel models clock glitches.

The instability of the clock does have an influence on NTP. With a perfect clock the job of an NTP client degrades to finding the right frequency offset and it would behave differently than with a typical computer clock, e.g. the polling interval would stay at the maximum. I know you want to get something working quickly, but please don't rule out the possibility of using a more realistic model of the clock.

teto commented 8 years ago

I fully agree. Once I get the basic case running, modeling more realistic clocks should be a lot easier, e.g. by adding some noise and/or random frequency changes at random intervals. For now I am struggling with the basic case :)

lkl/linux issue #40