danielmai opened 5 years ago
ntpdate runs OK in each Jepsen node outside of the test:
$ docker exec jepsen-n1 sudo -S -u root bash -c 'cd /; ntpdate -b pool.ntp.org'
16 Mar 00:24:08 ntpdate[6386]: step time server 45.79.187.10 offset -0.014471 sec
$ docker exec jepsen-n2 sudo -S -u root bash -c 'cd /; ntpdate -b pool.ntp.org'
16 Mar 00:24:15 ntpdate[6365]: step time server 45.79.187.10 offset -0.000358 sec
$ docker exec jepsen-n3 sudo -S -u root bash -c 'cd /; ntpdate -b pool.ntp.org'
16 Mar 00:24:22 ntpdate[6365]: step time server 45.79.187.10 offset -0.000179 sec
$ docker exec jepsen-n4 sudo -S -u root bash -c 'cd /; ntpdate -b pool.ntp.org'
16 Mar 00:24:29 ntpdate[6360]: step time server 45.79.187.10 offset -0.000703 sec
$ docker exec jepsen-n5 sudo -S -u root bash -c 'cd /; ntpdate -b pool.ntp.org'
16 Mar 00:24:36 ntpdate[6364]: step time server 45.79.187.10 offset 0.000188 sec
Actually, if these commands are run beforehand then the tests start running as expected.
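The per-node commands above can be wrapped in a loop so the pre-sync happens in one step. This is only a sketch: it assumes the five containers are named jepsen-n1 through jepsen-n5 as in the output above, and `presync_clocks` and the `DOCKER` override are hypothetical names introduced here, not part of Jepsen.

```shell
#!/usr/bin/env bash
# Sketch: step the clock on every Jepsen node before starting the test.
# presync_clocks and DOCKER are hypothetical names, not part of Jepsen.
presync_clocks() {
  local docker="${DOCKER:-docker}"   # override for a dry run, e.g. DOCKER=echo
  local i
  for i in 1 2 3 4 5; do
    # Same command as the manual runs above, once per node; stop on the first failure.
    "$docker" exec "jepsen-n${i}" sudo -S -u root bash -c 'cd /; ntpdate -b pool.ntp.org' || return 1
  done
}
```

Calling presync_clocks right before lein run test would reproduce the manual sequence; running it with DOCKER=echo prints the commands instead of executing them.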
That's a weird one. I don't know if this is a Docker thing, but it looks like whatever went wrong was some sort of issue with ntpdate. The message "ntpdate[6305]: no server suitable for synchronization found" suggests that maybe the server it was trying to hit was unreachable or not provided or something?
I mean, ntpdate isn't going to do anything anyway, because you can't change clocks in docker containers, right? Either way, this is an odd one. You say it's working now?
It works only when ntpdate is run on all the nodes before running lein run test. That’s the workaround we’re using now.
Otherwise, ntpdate reliably complains about no suitable synchronization server.
because you can't change clocks in docker containers, right?
Err... given that, does that mean skew clock tests don’t work with the Docker setup?
It works only when ntpdate is run on all the nodes before running lein run test. That’s the workaround we’re using now.
That is really strange! I'm not sure why ntpdate would say "no server suitable" when run by Jepsen, but be fine running it by SSH, and then be fine when Jepsen runs it afterwards.
Err... given that, does that mean skew clock tests don’t work with the Docker setup?
Yeah, I assume so. It might be different on your platform, but at least on Debian's LXC, clocks aren't namespaced and can't be updated in containers.
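One way to check part of this from inside a node is to look at the process's effective capabilities: on Linux, setting the system clock requires CAP_SYS_TIME, which is capability number 25. The sketch below reads the CapEff mask from /proc/self/status and tests that bit. `has_cap_sys_time` is a hypothetical helper name, and having the capability only means the kernel would permit a clock change; the clock is still shared with the host rather than namespaced.

```shell
#!/usr/bin/env bash
# Sketch: report whether the current process holds CAP_SYS_TIME (bit 25).
# has_cap_sys_time is a hypothetical helper, not part of Jepsen or Docker.
has_cap_sys_time() {
  local status_file="${1:-/proc/self/status}"   # optional path, useful for testing
  local hex
  # CapEff is a hexadecimal bitmask of effective capabilities.
  hex="$(awk '/^CapEff:/ {print $2}' "$status_file")"
  [ -n "$hex" ] || return 2                     # no CapEff line found
  # Succeeds (exit 0) only when bit 25 of the mask is set.
  (( (16#$hex >> 25) & 1 ))
}
```

Run inside a container as root, a non-zero exit status would suggest that ntpdate and clock-skew tooling cannot set the time there.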
I'm trying to set up Jepsen testing in Dgraph's TeamCity CI infrastructure. The tests run OK when I spin up a new CI agent machine (Ubuntu machines on Google Cloud), SSH into the box, and run tests in an interactive shell; i.e., after setting up the Jepsen cluster via ./up.sh I can run tests with this command.
But when the tests are run via a triggered CI build, they always fail with NTP clock synchronization issues.