sunng87 / diehard

Clojure resilience library for flexible retry, circuit breaker and rate limiter
Eclipse Public License 2.0
330 stars 27 forks source link

Incompatible change in behaviour of fallback policy, after upgrading to failsafe 2.0.x #21

Closed iarenaza closed 5 years ago

iarenaza commented 5 years ago

Commit https://github.com/sunng87/diehard/commit/fc490041b0df96db3660957449e748d9e32d44a1 introduced the changes needed to implement the fallback policy as a regular policy.

Previous to this change, the fallback policy was only executed after the retry policy (assuming no circuit breaker was configured) had resulted in a failed block execution for all configured retries. With the new change, the fallback policy is executed for every failed block execution.

This is due to a bad policy composition introduced in that commit. The typical way of composing the fallback policy (and the one that mimics the previous behaviour) is specified in https://github.com/jhalterman/failsafe#policy-composition.

So the right order to implement that typical policy and maintain the previous behaviour is by applying the following change:

diff --git a/src/diehard/core.clj b/src/diehard/core.clj
index f591542..410b279 100644
--- a/src/diehard/core.clj
+++ b/src/diehard/core.clj
@@ -343,7 +343,7 @@ It will work together with retry policy as quit criteria.
            fallback# (fallback the-opt#)
            cb# (:circuit-breaker the-opt#)

-           policies# (into-array FailurePolicy (filter some? [retry-policy# fallback# cb#]))
+           policies# (into-array FailurePolicy (filter some? [fallback# retry-policy# cb#]))

            failsafe# (Failsafe/with policies#)
            failsafe# (if-let [on-complete# (:on-complete the-opt#)]
sunng87 commented 5 years ago

Thanks to reporting. I will create a test case to reproduce and confirm.

sunng87 commented 5 years ago

hi @iarenaza , I'm not sure if I understand correctly. According to your description, this added case should fail but it passed:

https://github.com/sunng87/diehard/commit/d9524ce9dd40c2ab717a78f384498b1950f9ee60

iarenaza commented 5 years ago

I probably expressed myself wrong, but maybe an example might be more clear. Try this in both diehard 0.7.2 and 0.8.x:

(require '[diehard.core :refer :all])
(let [counter (atom 0)]
   (with-retry {:fallback (fn [& _] (swap! counter inc))
                :max-retries 5}
     (println "Trying...")
     (swap! counter inc)
     (throw (Exception.)))
   (prn @counter))

In 0.7.2 you get:

Trying...
Trying...
Trying...
Trying...
Trying...
Trying...
7
nil

That is, in 0.7.2, the body of the retry block is executed 6 times (the original one, plus the 5 configured retries). And because all the retries end up in an exception (which is the default condition for a failed block), the fallback is executed and we get the result produced by it: 6 produced by the retry-block body (including the retries), plus 1 added by the fallback.

While in 0.8.x you get:

Trying...
2
nil

While in 0.8.x, the body of the retry block is executed just one time and there are no retries at all (in spite of the 5 configured retries). The fallback is directly executed after the first block execution failure and we get the result produced by it: 1 produced by the retry-block body (as there are no retries), plus 1 added by the fallback.

This is even more evident if you attach listeners to :on-retry. On 0.7.2:

(require '[diehard.core :refer :all])
(let [counter (atom 0)]
   (with-retry {:fallback (fn [& _] (swap! counter inc))
                :on-retry (fn [v e] (println "Failed, gonna retry"))
                :max-retries 5}
     (println "Trying...")
     (swap! counter inc)
     (throw (Exception.)))
   (prn @counter))

you get:

Trying...
Failed, gonna retry
Trying...
Failed, gonna retry
Trying...
Failed, gonna retry
Trying...
Failed, gonna retry
Trying...
Failed, gonna retry
Trying...
7
nil

On 0.8.x you get:

Trying...
2
nil

So a it's a major change in behavior, as the fallback is executed instead of the retries stated by the retry policy.

As I said in my initial comment, this is due to the way the policies are composed in 0.8.x (fallback has become a regular policy in failsafe 2.0.x, instead of a special case). The fallback policy is executed before the retry policy. And as the fallback policy produces a value that is not considered a failed execution for the next policy (the retry policy in this case), then the whole retry block is considered as executed successfully. And thus there are no retries at all.

By swapping the way the policies are composed (as shown in my initial comment), the behaviour stays the same as in versions prior to 0.8.x

I'll address why your test doesn't fail in a separate comment.

sunng87 commented 5 years ago

@iarenaza Thanks! I can reproduce this issue and confirm this behaviour in 82b6b51

sunng87 commented 5 years ago

Released in 0.8.3