scalar-labs / scalar-jepsen

Jepsen tests for ScalarDB and ScalarDL

Improve retry for `scalardb` test #131

Open komamitsu opened 1 day ago


Description

Recently, scalardb.transfer$read_all_with_retry has often failed when more records need lazy recovery than the max retry (8) allows. To mitigate the failure, this PR increases the max retry. But simply increasing it makes the wait duration too long, because the wait doubles on every retry. For instance, with the max retry raised to 10, the 10th retry alone would wait 1024 seconds. So, this PR also introduces an upper limit on the wait duration of each retry (32 seconds), since waiting longer than that between retries is rarely useful.
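The capped backoff described above can be sketched in Ruby as follows. This is only an illustration under assumptions, not the code in this PR (the actual test code lives in this repository and is not Ruby): `wait_ms`, `with_retry`, and the `sleeper` parameter are hypothetical names, and the 1-second base, 32-second cap, and max retry of 20 are taken from this description.

```ruby
# Hypothetical helper: wait before the given retry attempt (1-based).
# Doubles on each attempt (2s, 4s, 8s, ...) and is capped at cap_ms.
def wait_ms(attempt, base_ms: 1000, cap_ms: 32_000)
  [base_ms * (2**attempt), cap_ms].min
end

# Hypothetical helper: runs the block, retrying up to max_retry times.
# `sleeper` is injectable so the sketch can be exercised without sleeping.
def with_retry(max_retry: 20, sleeper: ->(ms) { sleep(ms / 1000.0) })
  attempt = 0
  begin
    yield
  rescue StandardError
    attempt += 1
    raise if attempt > max_retry   # give up and re-raise the last error
    sleeper.call(wait_ms(attempt))
    retry                          # re-run the begin block
  end
end
```

For example, `with_retry { read_all }` (where `read_all` stands in for any operation that may fail) would wait 2, 4, 8, and 16 seconds after the first four failures, and 32 seconds after every later one.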

With the current retry logic and the max retry, the total wait duration until timeout is 510 seconds.

irb(main):006:0> 8.times.inject(0) {|acc, x| acc + 1000 * (2 ** (x + 1))}
=> 510000

So, this PR increases the max retry to 20 so that the total wait duration (542 seconds) is similar to the original one.

irb(main):017:0> 20.times.inject(0) {|acc, x| acc + [1000 * (2 ** (x + 1)), 32000].min}
=> 542000

(Actually, with a max retry of 19, the total wait duration is 510 seconds, exactly the same as the original. But 19 looks a bit odd to me, so I set it to 20. I don't have a strong opinion on this.)
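To make the choice between 19 and 20 easier to compare, here is a small Ruby sketch that tabulates the total wait for several max retry values. `total_wait_ms` is a hypothetical helper written for this comparison; it assumes the same 1-second base and 32-second cap as above, and treats `cap_ms: nil` as the current uncapped behavior.

```ruby
# Hypothetical helper: total wait in ms across max_retry attempts.
# cap_ms = nil means no upper limit (the current behavior).
def total_wait_ms(max_retry, cap_ms: nil, base_ms: 1000)
  (1..max_retry).sum do |attempt|
    wait = base_ms * (2**attempt)
    cap_ms ? [wait, cap_ms].min : wait
  end
end

puts format("current (max retry 8, no cap): %d s", total_wait_ms(8) / 1000)
(17..21).each do |n|
  puts format("max retry %2d, 32 s cap: %d s", n, total_wait_ms(n, cap_ms: 32_000) / 1000)
end
# current (max retry 8, no cap): 510 s
# max retry 17, 32 s cap: 446 s
# max retry 18, 32 s cap: 478 s
# max retry 19, 32 s cap: 510 s
# max retry 20, 32 s cap: 542 s
# max retry 21, 32 s cap: 574 s
```

Once the cap is reached, each extra retry adds a flat 32 seconds, so the total grows linearly instead of doubling.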

Related issues and/or PRs

https://github.com/scalar-labs/scalar-jepsen/pull/97

Changes made

Checklist

The following is a best-effort checklist. If any items in this checklist are not applicable to this PR or are dependent on other, unmerged PRs, please still mark the checkboxes after you have read and understood each item.

Additional notes (optional)

None