scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

Add backup benchmarking under read stress #9307

Open kreuzerkrieg opened 5 days ago

kreuzerkrieg commented 5 days ago

Introduce the test_backup_benchmark test, which measures backup time under read stress. This test performs multiple actions to consolidate all necessary data into a single table. It first runs and measures the backup on its own, then the read stress test on its own. Finally, it runs both at the same time to observe how much read and backup performance degrade.
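The three phases described above (backup alone, read stress alone, then both together) can be sketched roughly as follows. This is only an illustration under stated assumptions: `timed`, `run_concurrently`, `fake_backup` and `fake_read_stress` are hypothetical names, the sleeps stand in for the real workloads, and none of this is the actual scylla-cluster-tests code.

```python
import threading
import time


def timed(fn):
    """Run fn() and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_concurrently(*fns):
    """Run each callable in its own thread and return per-callable durations."""
    durations = [None] * len(fns)

    def worker(index, fn):
        durations[index] = timed(fn)

    threads = [
        threading.Thread(target=worker, args=(i, fn)) for i, fn in enumerate(fns)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return durations


# Hypothetical stand-ins for the real backup task and read stress command.
def fake_backup():
    time.sleep(0.2)


def fake_read_stress():
    time.sleep(0.1)


# Phase 1 and 2: each workload on its own. Phase 3: both at the same time.
backup_alone = timed(fake_backup)
read_alone = timed(fake_read_stress)
backup_loaded, read_loaded = run_concurrently(fake_backup, fake_read_stress)

print(f"backup: {backup_alone:.2f}s alone, {backup_loaded:.2f}s under read stress")
print(f"read:   {read_alone:.2f}s alone, {read_loaded:.2f}s during backup")
```

Timing each workload inside its own thread keeps the concurrent numbers comparable with the standalone ones, since both are measured by the same `timed` helper.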

Argus results: For 100GB run

| | backup time [s] | upload time [s] | total [s] |
|---|---|---|---|
| Backup times | 00:00:12 | 00:09:45 | 00:09:11 |
| Backup during read stress | 00:00:12 | 00:15:07 | 00:14:43 |

| | read time [s] |
|---|---|
| Read stress | 00:08:06 |
| Read stress during backup | 00:07:59 |

fixes: #8752

kreuzerkrieg commented 5 days ago

> Introduce the test_backup_benchmark test, which measures backup time under read stress. This test performs multiple actions to consolidate all necessary data into a single table. It first runs and measures the backup on its own, then the read stress test on its own. Finally, it runs both at the same time to observe how much read and backup performance degrade.
>
> Argus results: For [100GB run](https://argus.scylladb.com/tests/scylla-cluster-tests/6a18ccd0-1d6a-4f00-a473-

I have to admit it looks super suspicious that the read did not degrade even slightly
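To put a number on that observation, the relative change between the standalone and the concurrent runs can be computed from the times reported in the description. A minimal sketch, where `to_seconds` and `relative_change` are hypothetical helper names and the inputs are the values from the results table:

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def relative_change(baseline, loaded):
    """Percentage change of loaded vs. baseline; positive means slower."""
    base = to_seconds(baseline)
    return (to_seconds(loaded) - base) / base * 100


# Values copied from the results table in the description.
upload = relative_change("00:09:45", "00:15:07")
read = relative_change("00:08:06", "00:07:59")

print(f"upload time: {upload:+.1f}%")  # +55.0%
print(f"read time:   {read:+.1f}%")    # -1.4%
```

On these numbers the upload slows down by about 55% while the read finishes about 1.4% faster, which is the asymmetry flagged above.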

mikliapko commented 5 days ago

@kreuzerkrieg There was some refactoring done for Manager tests. Please consider rebasing onto master and putting your new test in the right place, or perhaps in a separate class for the backup benchmark.

kreuzerkrieg commented 4 days ago

> @kreuzerkrieg There was some refactoring done for Manager tests. Please consider rebasing onto master and putting your new test in the right place, or perhaps in a separate class for the backup benchmark.

rebased and moved the test to another class

regevran commented 1 day ago

The units in the issue should be ~[s]~ --> [h:m:s]

kreuzerkrieg commented 1 day ago

> The units in the issue should be ~[s]~ --> [h:m:s]

I think that is just how it is presented: I send seconds to it and it renders them as a time.
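For illustration, this is how a raw value in seconds maps to the h:m:s form shown in the table. It is a sketch only: `format_hms` is a hypothetical helper, and how Argus actually renders the values is not confirmed here.

```python
def format_hms(total_seconds):
    """Render a duration in seconds as a zero-padded HH:MM:SS string."""
    minutes, seconds = divmod(int(total_seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Second counts that correspond to the first row of the results table.
for raw in (12, 585, 551):
    print(raw, "->", format_hms(raw))
# 12 -> 00:00:12
# 585 -> 00:09:45
# 551 -> 00:09:11
```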

Michal-Leszczynski commented 1 day ago

> | | backup time [s] | upload time [s] | total [s] |
> |---|---|---|---|
> | Backup times | 00:00:12 | 00:09:45 | 00:09:11 |
> | Backup during read stress | 00:00:12 | 00:15:07 | 00:14:43 |

From the code I see that backup time is total backup time without upload, right? It would be nice to indicate it with a more descriptive column name.

Also, why is the total smaller than the upload time?

kreuzerkrieg commented 1 day ago

> | | backup time [s] | upload time [s] | total [s] |
> |---|---|---|---|
> | Backup times | 00:00:12 | 00:09:45 | 00:09:11 |
> | Backup during read stress | 00:00:12 | 00:15:07 | 00:14:43 |
>
> From the code I see that backup time is total backup time without upload, right? It would be nice to indicate it with a more descriptive column name.

Should I name it "snapshot time"?

> Also, why is the total smaller than the upload time?

Good question. The total time is taken from backup task.duration; the rest is measured in place, and it doesn't add up. Ideas?
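The size of the mismatch can be made explicit by adding the two in-place measurements and comparing the sum with the reported total. A small sketch using the values from the table; the variable names are illustrative, and it only quantifies the gap without explaining where it comes from:

```python
from datetime import timedelta

# (time before upload, upload time, reported total) per row of the results table.
rows = {
    "Backup times": (
        timedelta(seconds=12),
        timedelta(minutes=9, seconds=45),
        timedelta(minutes=9, seconds=11),
    ),
    "Backup during read stress": (
        timedelta(seconds=12),
        timedelta(minutes=15, seconds=7),
        timedelta(minutes=14, seconds=43),
    ),
}

gaps = {}
for name, (pre_upload, upload, total) in rows.items():
    measured_sum = pre_upload + upload
    # Positive gap: the in-place measurements exceed the reported total.
    gaps[name] = measured_sum - total
    print(f"{name}: measured sum {measured_sum}, reported total {total}, gap {gaps[name]}")
```

In both rows the in-place measurements exceed the reported total, by 46 and 36 seconds respectively.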

Michal-Leszczynski commented 1 day ago

> Should I name it "snapshot time"?

It's not only about the snapshot (e.g. we also fetch the schema, create manifests, etc.). In general, I'm not sure why we need to report the time up to the upload stage. Is it useful? But I see that the upload time is also not only about upload (e.g. purging the backup location of unnecessary files). I will leave some comments in the changed files in a bit.

I created an issue about displaying backup upload bandwidth/duration; in the future it could make things easier for such benchmarks.

kreuzerkrieg commented 18 hours ago

> > Should I name it "snapshot time"?
>
> It's not only about the snapshot (e.g. we also fetch the schema, create manifests, etc.). In general, I'm not sure why we need to report the time up to the upload stage. Is it useful? But I see that the upload time is also not only about upload (e.g. purging the backup location of unnecessary files). I will leave some comments in the changed files in a bit.
>
> I created an issue about displaying backup upload bandwidth/duration; in the future it could make things easier for such benchmarks.

Now that I see it is a negligible slice of the whole process, I guess it is worth just dropping it.