jepsen-io / jepsen

A framework for distributed systems verification, with fault injection

How to run test case with 6 machines - one control node and 5 db nodes? #115

Closed nhahtdh closed 8 years ago

nhahtdh commented 8 years ago

I set up 6 Ubuntu 14.04 VMs: 1 control node and 5 to-be DB nodes. I have set up password-less SSH and made sure that the known_hosts file stores the host names/IP addresses in plain text instead of hashed. The 6 VMs are behind a firewall, so I have the http_proxy and https_proxy environment variables set.

From my understanding, the LXC setup is for running tests on a single host, by spawning containers on that host. I already have 6 separate VMs set up (equivalent to having 6 separate machines), so this step should not be necessary.

However, when I run lein test on aerospike (following the docs), I keep getting this error:

ERROR in (cas-register) (Util.java:349)
Uncaught exception, not in assertion.
expected: nil
  actual: com.jcraft.jsch.JSchException: java.net.UnknownHostException: n1
 at com.jcraft.jsch.Util.createSocket (Util.java:349)
    com.jcraft.jsch.Session.connect (Session.java:215)
    com.jcraft.jsch.Session.connect (Session.java:183)
    clj_ssh.ssh$eval5935$fn__5942.invoke (ssh.clj:118)
    clj_ssh.ssh.protocols$eval5861$fn__5884$G__5852__5893.invoke (protocols.clj:4)
    clj_ssh.ssh$connect.invoke (ssh.clj:401)
    jepsen.control$session.invoke (control.clj:197)
    clojure.lang.AFn.applyToHelper (AFn.java:154)
    clojure.lang.AFn.applyTo (AFn.java:144)
    clojure.core$apply.invoke (core.clj:624)
    clojure.core$with_bindings_STAR_.doInvoke (core.clj:1862)
    clojure.lang.RestFn.applyTo (RestFn.java:142)
    clojure.core$apply.invoke (core.clj:628)
    clojure.core$bound_fn_STAR_$fn__4140.doInvoke (core.clj:1884)
    clojure.lang.RestFn.applyTo (RestFn.java:137)
    clojure.core$apply.invoke (core.clj:624)
    jepsen.core$fcatch$wrapper__7449.doInvoke (core.clj:53)
    clojure.lang.RestFn.invoke (RestFn.java:408)
    clojure.core$pmap$fn__6328$fn__6329.invoke (core.clj:6466)
    clojure.core$binding_conveyor_fn$fn__4145.invoke (core.clj:1910)
    clojure.lang.AFn.call (AFn.java:18)
    java.util.concurrent.FutureTask.run (FutureTask.java:266)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
    java.lang.Thread.run (Thread.java:745)
Caused by: java.net.UnknownHostException: n1
 at java.net.AbstractPlainSocketImpl.connect (AbstractPlainSocketImpl.java:184)
    java.net.SocksSocketImpl.connect (SocksSocketImpl.java:392)
    java.net.Socket.connect (Socket.java:589)
    java.net.Socket.connect (Socket.java:538)
    java.net.Socket.<init> (Socket.java:434)
    java.net.Socket.<init> (Socket.java:211)
    com.jcraft.jsch.Util.createSocket (Util.java:343)
    com.jcraft.jsch.Session.connect (Session.java:215)
    com.jcraft.jsch.Session.connect (Session.java:183)
    clj_ssh.ssh$eval5935$fn__5942.invoke (ssh.clj:118)
    clj_ssh.ssh.protocols$eval5861$fn__5884$G__5852__5893.invoke (protocols.clj:4)
    clj_ssh.ssh$connect.invoke (ssh.clj:401)
    jepsen.control$session.invoke (control.clj:197)
    clojure.lang.AFn.applyToHelper (AFn.java:154)
    clojure.lang.AFn.applyTo (AFn.java:144)
    clojure.core$apply.invoke (core.clj:624)
    clojure.core$with_bindings_STAR_.doInvoke (core.clj:1862)
    clojure.lang.RestFn.applyTo (RestFn.java:142)
    clojure.core$apply.invoke (core.clj:628)
    clojure.core$bound_fn_STAR_$fn__4140.doInvoke (core.clj:1884)
    clojure.lang.RestFn.applyTo (RestFn.java:137)
    clojure.core$apply.invoke (core.clj:624)
    jepsen.core$fcatch$wrapper__7449.doInvoke (core.clj:53)
    clojure.lang.RestFn.invoke (RestFn.java:408)
    clojure.core$pmap$fn__6328$fn__6329.invoke (core.clj:6466)
    clojure.core$binding_conveyor_fn$fn__4145.invoke (core.clj:1910)
    clojure.lang.AFn.call (AFn.java:18)
    java.util.concurrent.FutureTask.run (FutureTask.java:266)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
    java.lang.Thread.run (Thread.java:745)

I have also tried changing hosts-map in jepsen/src/jepsen/control/net.clj, but to no avail.

What should I do to get Jepsen to run on this setup with separate machines?

aphyr commented 8 years ago

Either add hostfile entries mapping n1 to the first DB node, etc, or add a :nodes key to your test. See jepsen.core/run for docs.
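
A rough sketch of the first option, assuming the five DB nodes are the 192.168.99.221 through 192.168.99.225 addresses that show up in the logs further down this thread. Which address becomes n1, n2, and so on is a guess made for illustration, and `hosts_entries` is a hypothetical helper name, not part of Jepsen. The script only prints the lines; it does not modify /etc/hosts.

```shell
#!/bin/sh
# Print /etc/hosts lines mapping n1..n5 to the DB nodes.
# ASSUMPTION: the address-to-name ordering below is illustrative only.
hosts_entries() {
  i=1
  for ip in 192.168.99.221 192.168.99.222 192.168.99.223 \
            192.168.99.224 192.168.99.225; do
    printf '%s n%d\n' "$ip" "$i"
    i=$((i + 1))
  done
}

hosts_entries
```

After reviewing the output, it could be appended on the control node with something like `hosts_entries | sudo tee -a /etc/hosts`. Name resolution is only one part of the setup; the SSH credentials the test uses still have to be accepted by each node.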

nhahtdh commented 8 years ago

Thanks for your response.

I added n{1..5} to /etc/hosts, and I got an Auth fail error, even though I have set up password-less login for my account and even for the root account on the DB nodes.

In the end, I resorted to adding the list of nodes, plus the username and password, to chronos.clj, as per your second suggestion.

Then, a new error pops up:

lein test jepsen.chronos-test
INFO  jepsen.os.debian - 192.168.99.223 setting up debian
INFO  jepsen.os.debian - 192.168.99.224 setting up debian
INFO  jepsen.os.debian - 192.168.99.225 setting up debian
INFO  jepsen.os.debian - 192.168.99.222 setting up debian
INFO  jepsen.os.debian - 192.168.99.221 setting up debian
INFO  jepsen.os.debian - Installing #{sysvinit-core faketime unzip sysvinit}
INFO  jepsen.os.debian - Installing #{sysvinit-core faketime unzip sysvinit}
INFO  jepsen.os.debian - Installing #{sysvinit-core faketime unzip sysvinit}
INFO  jepsen.os.debian - Installing #{sysvinit-core faketime unzip sysvinit}
INFO  jepsen.os.debian - Installing #{sysvinit-core faketime unzip sysvinit}

lein test :only jepsen.chronos-test/install-test

ERROR in (install-test) (FutureTask.java:122)
expected: (:valid? (:results (run! (simple-test "0.28.1-2.0.20.ubuntu1204" "2.4.0-0.1.20151007110204.ubuntu1204"))))
  actual: java.util.concurrent.ExecutionException: java.lang.RuntimeException: [sudo] password for vsdev: E: Unable to locate package sysvinit-core
E: Package 'sysvinit' has no installation candidate

Reading package lists...
Building dependency tree...
Reading state information...
Package sysvinit is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
  upstart:i386 sysvinit-utils:i386 upstart sysvinit-utils

I tried modifying os/debian.clj by commenting out the offending packages:

(c/su
  ; Packages!
  ; Commented out:
  ; :sysvinit-core
  ; :sysvinit
  (install [:wget
            :sysvinit-utils
            :curl
            :vim
            :man-db
            :faketime
            :unzip
            :iptables
            :psmisc
            :iputils-ping
            :rsyslog
            :logrotate])

However, for some reason, the same error occurs. It seems that my change doesn't even get compiled. I tried lein clean, lein -U test, and removing the .m2 and .lein folders and reinstalling lein, but none of these worked. I even modified chronos.clj to include :reload-all, but no dice:

(ns jepsen.chronos
  "Sets up chronos"
  (:require [clojure.tools.logging :refer :all]
            [clojure.java.io :as io]
            [clojure.string :as str]
            [clojure.pprint :refer [pprint]]
            [clj-http.client :as http]
            [clj-time.core :as time]
            [clj-time.format :as time.format]
            [cheshire.core :as json]
            [jepsen [client :as client]
             [core :as jepsen]
             [db :as db]
             [tests :as tests]
             [control :as c :refer [|]]
             [checker :as checker]
             [nemesis :as nemesis]
             [generator :as gen]
             [util :refer [timeout meh]]
             [mesosphere :as mesosphere]]
            [jepsen.control.util :as cu]
            [jepsen.os.debian :as debian]
            [jepsen.chronos.checker :refer [checker epsilon-forgiveness]]
            :reload-all))

At this point, I'm not even sure whether I should switch to Debian and test on it instead...

aphyr commented 8 years ago

The tests are independent Clojure projects and use lein dependency resolution to pull in the Jepsen library. See lein's docs on checkouts if you want to pull changes from disk instead. Or, you could just use Debian with the Debian tests.
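
A minimal sketch of the checkouts approach, assuming the Jepsen library and the test project are both available as directories on disk. Leiningen's checkout dependencies work by symlinking the library's project root into a `checkouts` directory in the test project's root; the library should still be listed in the test project's `:dependencies`. `link_checkout` is a hypothetical helper name, and the directory names in the usage comment are assumptions about the local layout.

```shell
#!/bin/sh
# Symlink a local library project into a test project's checkouts/ directory
# so Leiningen picks up the library source from disk.
# Usage: link_checkout <library-project-dir> <test-project-dir>
link_checkout() {
  # Resolve the library directory to an absolute path so the link
  # stays valid regardless of where it is created from.
  lib_dir=$(CDPATH= cd -- "$1" && pwd) || return 1
  mkdir -p "$2/checkouts" || return 1
  ln -sfn "$lib_dir" "$2/checkouts/$(basename "$lib_dir")"
}

# Hypothetical usage from the root of a Jepsen clone, where the library
# lives in ./jepsen and the test project in ./chronos:
#   link_checkout jepsen chronos
```

With the link in place, running `lein test` from the test project should use the on-disk library source, so local edits to files such as os/debian.clj would be expected to take effect.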