Open kreuzerkrieg opened 5 days ago
Introduce the `test_backup_benchmark` test, which measures backup time under read stress conditions. This test performs multiple actions to consolidate all necessary data into a single table. Initially, it runs and measures the backup process, followed by the read stress test. Finally, it executes both processes asynchronously to observe how the performance of reading and backing up degrades.

Argus results: For [100GB run](https://argus.scylladb.com/tests/scylla-cluster-tests/6a18ccd0-1d6a-4f00-a473-
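
The three phases described above (backup alone, read stress alone, then both at once) can be sketched roughly as follows. This is only an illustration, not the code from this PR: `run_backup` and `run_read_stress` are hypothetical zero-argument callables standing in for the real Manager backup task and the read stress command, and a thread pool is just one possible way to overlap them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(action):
    """Run a zero-argument callable and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def run_benchmark(run_backup, run_read_stress):
    """Measure each workload alone, then both at the same time.

    run_backup and run_read_stress are hypothetical placeholders for the
    real backup task and read stress command.
    """
    _, backup_alone = timed(run_backup)
    _, read_alone = timed(run_read_stress)

    # Submit both workloads so they overlap, then collect their timings.
    with ThreadPoolExecutor(max_workers=2) as pool:
        backup_future = pool.submit(timed, run_backup)
        read_future = pool.submit(timed, run_read_stress)
        _, backup_concurrent = backup_future.result()
        _, read_concurrent = read_future.result()

    return {
        "backup_alone": backup_alone,
        "read_alone": read_alone,
        "backup_concurrent": backup_concurrent,
        "read_concurrent": read_concurrent,
    }
```

Comparing `backup_concurrent` with `backup_alone` (and likewise for the read side) gives the degradation that the test is meant to expose.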
I have to admit it looks super suspicious that the read did not degrade even slightly
@kreuzerkrieg There was some refactoring done for Manager tests. Please, consider rebasing to master and putting your new test to the right place or probably to a separate class for Backup benchmark.
rebased and moved the test to another class
The units in the issue should be ~[s]~ --> [h:m:s]
I think that is just the way it is presented: I do send seconds to it, and it presents them as a time.
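
To illustrate the mismatch between the `[s]` label and the displayed values, here is a minimal sketch of how a value sent as seconds ends up looking like a time. It assumes the results page converts seconds for display; `format_duration` is a hypothetical helper, not a function from SCT or Argus.

```python
def format_duration(seconds):
    """Render a duration given in whole seconds as hh:mm:ss."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(format_duration(585))  # 9 min 45 s -> 00:09:45
```

If the values are rendered this way, labelling the columns `[h:m:s]` rather than `[s]` would match what the reader actually sees.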
 | backup time [s] | upload time [s] | total [s] |
---|---|---|---|
Backup times | 00:00:12 | 00:09:45 | 00:09:11 |
Backup during read stress | 00:00:12 | 00:15:07 | 00:14:43 |
From the code I see that `backup time` is total backup time without upload, right? It would be nice to indicate it with a more descriptive column name.
Also, why is `total` smaller than `upload time`?
Should I name it "snapshot time"?
> Also, why is `total` smaller than `upload time`?

Good question. The total time is taken from backup task.duration, while the rest is measured in place, and it doesn't add up. Ideas?
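
For reference, the size of the mismatch can be worked out from the table values quoted in this thread. The sketch below only does the arithmetic; `to_seconds` and `gap` are illustrative helpers, and the numbers are copied from the table above.

```python
def to_seconds(hms):
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (backup time, upload time, total) as reported in the table.
rows = {
    "Backup times": ("00:00:12", "00:09:45", "00:09:11"),
    "Backup during read stress": ("00:00:12", "00:15:07", "00:14:43"),
}


def gap(row):
    """Seconds by which backup + upload exceeds the reported total."""
    backup, upload, total = (to_seconds(value) for value in row)
    return backup + upload - total


for name, row in rows.items():
    print(f"{name}: measured stages exceed total by {gap(row)} s")
```

The two in-place measurements sum to 46 s and 36 s more than the task duration, respectively, so the two clocks evidently do not cover the same interval.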
> Should I name it "snapshot time"?

It's not only about the snapshot (e.g. we also fetch the schema, create manifests, etc.). In general, I'm not sure why we need to report the time up to the upload stage. Is it useful? But I see that the upload time is also not only about upload (e.g. purging the backup location of unnecessary files). I will leave some comments in the changed files in a bit.
I created an issue about displaying backup upload bandwidth/duration; in the future it could make things easier for such benchmarks.
Now that I see it is a negligible slice of the whole process, I guess it is worth just dropping it.
Introduce the `test_backup_benchmark` test, which measures backup time under read stress conditions. This test performs multiple actions to consolidate all necessary data into a single table. Initially, it runs and measures the backup process, followed by the read stress test. Finally, it executes both processes asynchronously to observe how the performance of reading and backing up degrades.

fixes: #8752