tarantool / jepsen.tarantool

Jepsen tests for Tarantool
https://www.tarantool.io/en/
Other
7 stars 0 forks source link

Catch exceptions in operations #57

Open ligurio opened 4 years ago

ligurio commented 4 years ago

When tarantool is not available or something goes wrong exceptions like below spams log:

2020-10-21 09:56:36,372{GMT}    INFO    [jepsen worker 2] jepsen.util: 2        :invoke :add    1
2020-10-21 09:56:36,373{GMT}    INFO    [jepsen worker 3] jepsen.util: 3        :invoke :add    1
2020-10-21 09:56:36,401{GMT}    INFO    [jepsen nemesis] jepsen.util: :nemesis  :info   :start-partition        nil
2020-10-21 09:56:36,419{GMT}    WARN    [jepsen nemesis] jepsen.core: Process :nemesis crashed
java.lang.IllegalArgumentException: Expected op {:type :info, :f :start, :value nil, :process :nemesis, :time 27663624311} to have a grudge for a 
:value, but none given.
        at jepsen.nemesis$partitioner$reify__5651.invoke_BANG_(nemesis.clj:135) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis.combined$partition_nemesis$reify__2377.invoke_BANG_(combined.clj:195) ~[na:na]
        at jepsen.nemesis$invoke_compat_BANG_.invokeStatic(nemesis.clj:46) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis$invoke_compat_BANG_.invoke(nemesis.clj:42) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis$compose$reify__5683.invoke_BANG_(nemesis.clj:259) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis$invoke_compat_BANG_.invokeStatic(nemesis.clj:46) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis$invoke_compat_BANG_.invoke(nemesis.clj:42) ~[jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_invoke_op_BANG_$fn__5888.invoke(core.clj:259) ~[jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_invoke_op_BANG_.invokeStatic(core.clj:259) [jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_invoke_op_BANG_.invoke(core.clj:254) [jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_apply_op_BANG_.invokeStatic(core.clj:294) [jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_apply_op_BANG_.invoke(core.clj:286) [jepsen-0.1.18.jar:na]
        at jepsen.core.NemesisWorker.run_worker_BANG_(core.clj:410) [jepsen-0.1.18.jar:na]
        at jepsen.core$run_workers_BANG_$run__5872.invoke(core.clj:206) [jepsen-0.1.18.jar:na]
        at dom_top.core$real_pmap_helper$build_thread__214$fn__215.invoke(core.clj:146) [jepsen-0.1.18.jar:na]
        at clojure.lang.AFn.applyToHelper(AFn.java:152) [clojure-1.10.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:665) [clojure-1.10.1.jar:na]
        at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) [clojure-1.10.1.jar:na]
        at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) [clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:425) [clojure-1.10.1.jar:na]
        at clojure.lang.AFn.applyToHelper(AFn.java:156) [clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.applyTo(RestFn.java:132) [clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:669) [clojure-1.10.1.jar:na]
        at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) [clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:397) [clojure-1.10.1.jar:na]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.10.1.jar:na]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_262]
2020-10-21 09:56:36,419{GMT}    INFO    [jepsen nemesis] jepsen.util: :nemesis  :info   :start-partition        nil     indeterminate: Expected op {:type :info, :f :start, :value nil, :process :nemesis, :time 27663624311} to have a grudge for a :value, but none given.
2020-10-21 09:56:36,487{GMT}    INFO    [jepsen worker 0] jepsen.util: 0        :ok     :add    1

It would be nice to catch all possible exceptions and convert them to history :info records with a reason of exception.

       (invoke! [this test op]
         (try
           (let [id     (key (:value op))
                 value  (val (:value op))]
             (case (:f op)
               :read   (let [proc (if (:strong-reads? opts)
                                    "SRegisterStrongRead"
                                    "REGISTERS.select")
                             v (-> conn
                                   (voltdb/call! proc id)
                                   first
                                   :rows
                                   first
                                   :VALUE)]
                         (assoc op
                                :type :ok
                                :value (independent/tuple id v)))
               :write (do (voltdb/call! conn "REGISTERS.upsert" id value)
                          (assoc op :type :ok))
               :cas   (let [[v v'] value
                            res (-> conn
                                    (voltdb/call! "registers_cas" v' id v)
                                    first
                                    :rows
                                    first
                                    :modified_tuples)]
                        (assert (#{0 1} res))
                        (assoc op :type (if (zero? res) :fail :ok)))))
           (catch org.voltdb.client.NoConnectionsException e
             (assoc op :type :fail, :error :no-conns))
           (catch org.voltdb.client.ProcCallException e
             (let [type (if (= :read (:f op)) :fail :info)]
               (condp re-find (.getMessage e)
                 #"^No response received in the allotted time"
                 (assoc op :type type, :error :timeout)

                 #"^Connection to database host .+ was lost before a response"
                 (assoc op :type type, :error :conn-lost)

                 #"^Transaction dropped due to change in mastership"
                 (assoc op :type type, :error :mastership-change)

                 (throw e))))))

Like this - https://github.com/jepsen-io/voltdb/blob/2799ede72fd06ef0dd7879bb52d94e6fffe9b71d/src/jepsen/voltdb/single_register.clj#L61