maltefiala opened this issue 7 years ago
Could you elaborate on the negative impact of the clock_settime syscall on Linux?
COSBench maintains heartbeats between the controller and the drivers to keep their timestamps in sync with minimal deviation. This is why clock_settime is called (through the "date -s" command).
Historically, we have seen a few new users run COSBench with the controller and drivers badly out of sync, which led to some strange reported results. This is why we included the time-sync logic.
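For illustration, the controller-to-driver offset that such heartbeats try to minimize can be estimated without touching either clock, using the four-timestamp calculation that NTP uses. This is a minimal sketch, assuming one heartbeat carries the controller's send time and the driver's receive and reply times; the class and method names are hypothetical and not taken from the COSBench code base.

```java
/** Hypothetical helper: estimates clock offset from one heartbeat round trip (NTP-style). */
final class HeartbeatOffset {

    /**
     * t0: controller send time (controller clock, ms)
     * t1: driver receive time (driver clock, ms)
     * t2: driver reply time (driver clock, ms)
     * t3: controller receive time (controller clock, ms)
     * Returns the estimated driver clock minus controller clock, in ms.
     */
    static double offsetMillis(long t0, long t1, long t2, long t3) {
        return ((t1 - t0) + (t2 - t3)) / 2.0;
    }

    /** Round-trip network delay in ms, excluding the driver's processing time. */
    static long delayMillis(long t0, long t1, long t2, long t3) {
        return (t3 - t0) - (t2 - t1);
    }

    public static void main(String[] args) {
        // Assumed scenario: driver clock 500 ms ahead, 10 ms one-way latency, 5 ms processing.
        long t0 = 1_000L; // controller sends
        long t1 = 1_510L; // driver receives: 1000 + 10 latency + 500 offset
        long t2 = 1_515L; // driver replies 5 ms later
        long t3 = 1_025L; // controller receives: 1000 + 10 + 5 + 10
        System.out.println(offsetMillis(t0, t1, t2, t3)); // prints 500.0
        System.out.println(delayMillis(t0, t1, t2, t3));  // prints 20
    }
}
```

The estimate is exact only when network latency is symmetric; it is meant for reporting skew, not for deciding to reset a clock.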
Well, I can elaborate on that.
We set up a small Ceph cluster to evaluate Ceph.
The cluster was synchronized to a single stratum 3 NTP server, which kept the offset between the system clocks at ΔT < 1 ms.
During benchmarks, ΔT would grow to several seconds. This caused problems for Ceph and resulted in horrible benchmark results.
We switched off NTP to check whether NTP itself was the problem. After running the benchmarks again, we observed ΔT > 15 s. So in this case, the time-sync logic did the opposite of what you intended.
Solution: People setting up storage clusters and their clients need to keep time synchronized across all hosts anyway. A benchmarking tool that tries to do this for them (and succeeds) would make their cluster seem to work during benchmarks, only for it to fail once synchronization is lost later on. So I would recommend printing a warning such as "Times are not synchronized. You need to fix this. Look into NTP..." instead of trying to magically mess with the system time.
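The warning-only approach proposed above can be sketched as a check that compares an estimated offset against a tolerance and reports it, without ever modifying the system clock. This is a hypothetical sketch: the class name, the method signature, and the tolerance parameter are assumptions, not COSBench settings.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical check: report clock skew to the user instead of changing the system time. */
final class ClockSkewCheck {

    /**
     * Returns a warning if |offsetMillis| exceeds toleranceMillis, otherwise empty.
     * The system clock is never modified; the tolerance is an assumed, user-chosen value.
     */
    static Optional<String> check(String driver, double offsetMillis, double toleranceMillis) {
        if (Math.abs(offsetMillis) <= toleranceMillis) {
            return Optional.empty();
        }
        // Locale.ROOT keeps the decimal separator stable regardless of the host locale.
        return Optional.of(String.format(Locale.ROOT,
                "Times are not synchronized: %s is off by %.1f ms (tolerance %.1f ms). "
                        + "You need to fix this. Look into NTP...",
                driver, offsetMillis, toleranceMillis));
    }

    public static void main(String[] args) {
        // Within tolerance: nothing is printed.
        check("driver1", 3.2, 50.0).ifPresent(System.err::println);
        // Far outside tolerance (like the > 15 s skew above): the warning is printed.
        check("driver2", 15_000.0, 50.0).ifPresent(System.err::println);
    }
}
```

The benchmark then still reports how the system behaves as configured, and fixing the skew stays with the operator and NTP.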
Just so you know how unexpected this behavior is: when we ran into the desynchronizing clocks, we first suspected time problems related to the TSC, CPU frequency, power management, tickless kernels (and some exotic CPU bugs) before we suspected COSBench. And none of us is entirely new to running servers.
This commit, 893c955e880d3d96a4b561ec2f1b58fab3953e20, can help.
@tianshan How does this help? A benchmark tool should report how the system performs as it is currently configured. If the servers' clocks are out of sync and my benchmark tool fixes that, this rule is broken. Consequently, I would assume that my system was configured correctly, which is a terrible conclusion to draw.
On the other hand, if my system is configured correctly (which it was in our case), under no circumstances do I want my benchmark tool to screw this up by assuming it can do better than NTP. Apart from causing much confusion (as in our case), this can break things in a working setup.
Automatically changing the system time is never the right thing to do. The only solution is to remove the code that changes the system time, or to make it optional and (importantly) turned off by default.
Oh, my fault: d4e708879b8ccdc18f7da809cb6f421cd51639b7 is the correct one. It adds an option to switch off setting the time.
It is still turned on by default.
This forcible setting of the time is causing errors in my environment. I see the added check for getsystemproperty, but I do not understand how to switch it to the non-default behaviour. I just left this comment on the commit... Would you please explain how a user can change the setting? In which cfg file, and with what syntax?
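For reference, a switch read through Java's System.getProperty is normally set on the JVM command line with -Dname=value rather than in a cfg file. The sketch below shows such a switch with the default flipped to "off", as requested in this thread. The property name cosbench.time.sync is hypothetical and is not the name used in commit d4e708879b8ccdc18f7da809cb6f421cd51639b7.

```java
/** Hypothetical opt-in switch for clock setting: off unless explicitly enabled. */
final class TimeSyncSwitch {

    // Hypothetical property name; not taken from the COSBench source.
    static final String PROPERTY = "cosbench.time.sync";

    /** True only if the property is set to "true" (case-insensitive); absent or any other value means off. */
    static boolean timeSyncEnabled() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "false"));
    }

    public static void main(String[] args) {
        if (timeSyncEnabled()) {
            System.out.println("Time sync explicitly enabled by the user.");
        } else {
            System.out.println("Time sync disabled (default): the system clock is left to NTP.");
        }
    }
}
```

With this design, a user would opt in by starting the JVM with -Dcosbench.time.sync=true, and everyone else keeps their NTP-managed clock untouched.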
Why is NTP such a bad idea? If you use an NTP server on the same site, you should get very low offset and jitter (i.e. under 10 ms), which you can measure with "ntpq -p". If you don't have an NTP server on the same site, set one up; it's not hard to do. Ceph requires tight time synchronization on the monitors anyway, so you would have to do this regardless if you use Ceph. So why should COSBench be setting the time? It seems to me that the default should be for COSBench not to set the time.
COSBench calls "date -s", effectively changing the system clock and making Linux VERY UNHAPPY.
Environment
Steps to reproduce
Submit any test.
Kernel patch by @friedrich to debug the clock_settime syscall:
Kernel log:
man date
Possible root of this issue
"date -s" is called in dev/cosbench-driver-web/src/com/intel/cosbench/driver/handler/PingHandler.java