Closed vponomaryov closed 2 years ago
@vponomaryov, you think we take this for a ride in SCT ? change the install to be from your branch ? there are use cases we will want to see the impact of this.
@vponomaryov, you think we take this for a ride in SCT ? change the install to be from your branch ? there are use cases we will want to see the impact of this.
Why my branch? If no concerns, then merge PR, create new tag/version and update the scylla_bench_version
config option in SCT. That's all.
Then we will be able to use these new config options in the SCT.
@vponomaryov, you think we take this for a ride in SCT ? change the install to be from your branch ? there are use cases we will want to see the impact of this.
Why my branch? If no concerns, then merge PR, create new tag/version and update the
scylla_bench_version
config option in SCT. That's all. Then we will be able to use these new config options in the SCT.
Mainly cause i'm not an admin on this repo, plus we want to make sure it's would do the effect we anticipate for it.
try taking longevity-large-partition-8h
or longevity-large-partition-asymmetric-cluster-3h
for a run with this:
diff --git i/sdcm/cluster.py w/sdcm/cluster.py
index 0e2f349a..459e56ac 100644
--- i/sdcm/cluster.py
+++ w/sdcm/cluster.py
@@ -494,7 +494,7 @@ class BaseNode(AutoSshContainerMixin, WebDriverContainerMixin): # pylint: disab
echo 'export GOPATH=$HOME/go' >> $HOME/.bash_profile
echo 'export PATH=$PATH:/usr/local/go/bin' >> $HOME/.bash_profile
source $HOME/.bash_profile
- curl -Lo sb.zip https://github.com/scylladb/scylla-bench/archive/refs/{sb_version}.zip
+ curl -Lo sb.zip https://github.com/vponomaryov/scylla-bench/archive/refs/heads/add-retry.zip
unzip sb.zip
cd ./scylla-bench-*
GO111MODULE=on go install .
Test job where this PR was applied: https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/valerii/job/longevity-large-partition-asymmetric-cluster-3h/2/console
@vponomaryov if I understand the code correctly, it supports now the retry but it doesn't prevent the current errors we get for timeouts. What we want is that the current timeouts will be "warnings" but if all retries fails, we want a error (that SCT may treat it as a critical event - like we have in c-s).
@vponomaryov if I understand the code correctly, it supports now the retry but it doesn't prevent the current errors we get for timeouts. What we want is that the current timeouts will be "warnings" but if all retries fails, we want a error (that SCT may treat it as a critical event - like we have in c-s).
There is separate option for timeout -> --timeout
. Just configure values you need.
@vponomaryov if I understand the code correctly, it supports now the retry but it doesn't prevent the current errors we get for timeouts. What we want is that the current timeouts will be "warnings" but if all retries fails, we want a error (that SCT may treat it as a critical event - like we have in c-s).
There is separate option for timeout ->
--timeout
. Just configure values you need.
This is not enough to answer the requirement.
@vponomaryov if I understand the code correctly, it supports now the retry but it doesn't prevent the current errors we get for timeouts. What we want is that the current timeouts will be "warnings" but if all retries fails, we want a error (that SCT may treat it as a critical event - like we have in c-s).
There is separate option for timeout ->
--timeout
. Just configure values you need.This is not enough to answer the requirement.
If you mean non-0 exit of a loader than it is different kind of a task - add one option about behavior on such an error. Right now it just logs errors and moves forward.
I'm not sure I'm following, let's discuss it.
@vponomaryov if needed help with defining a simple test to test it, please let me know.
@vponomaryov if needed help with defining a simple test to test it, please let me know.
It was tested lots of times in multiple test runs.
The only nuance here is that there are no means to make sure it was really applied by gocql
.
The details about the nuance are here: https://github.com/scylladb/gocql/issues/101
From my side - we can merge this one, because here we utilize the gocql
functionality and if retry doesn't work then it is bug of gocql
and not scylla-bench
.
Recent update: First commit makes retry logic be working correctly. Second commit enables the possibility to use retry logic.
Full workability of the retry logic requires also following fix to the gocql
project: https://github.com/scylladb/gocql/pull/104
It means that, after it's merge, we should wait for the release with this fix and then switch scylla-bench to it.
Do we want for the gocql PR to be merged before we merge this one?
Do we want for the gocql PR to be merged before we merge this one?
No need in it. The fix of gocql
will be applied by the update of the gocql version used by scylla-bench
. And it will be separate change.
@roydahan, this is tested and confirmed working, and can be merged from my POV, both me and and @vponomaryov tested it.
@roydahan can you grant us permissions, we can move forward with some of the other fixes, and release/tag a new version ? so you can take them to SCT ?
For the moment we cannot tell 'scylla-bench' to make retries if some request fails. So, add such possibility reusing 'RetryPolicy' mechanism from the 'gocql' package.
Following new options are added: