Open lukaszsurfer opened 3 months ago
Restoring is obviously different from backing up and the two cannot be directly compared. We are working these days on improving the restore speed, with various changes to the restore strategy, parallelism and so forth. It'd be great if you could share logs, so we can check whether what we already suspect is indeed the issue: some shards get 'large' SSTables to work on while other shards get smaller ones, so they finish theirs quickly and then sit idle (as batches are per node, not per shard). From that perspective, a large batch size is not helpful at all. It would also be great to see which table is involved and how it is organized, to see whether we do have this imbalance.
Other areas we are looking into are disabling compaction during restore, increasing streaming performance (a small improvement - see https://github.com/scylladb/scylladb/pull/20187), and parallelizing download and restore (but I suspect that is also a small improvement).
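Until something like that ships in Manager itself, a rough manual workaround (an assumption on my side, not an official recommendation) would be to pause autocompaction on the restored keyspace for the duration of the restore and re-enable it afterwards:

```
# Assumed manual workaround, not a Scylla Manager feature:
# pause autocompaction on the keyspace being restored (here the 'dataforseo'
# keyspace from this thread), run the restore, then turn it back on.
nodetool disableautocompaction dataforseo

# ... run the restore ...

nodetool enableautocompaction dataforseo
```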
Sure @mykaul, will share the logs of a subsequent restore execution. Quick question: what `batch-size` and `parallel` values would you recommend for running the restore on the setup described above (4 CPU / 31 GB RAM / 5 nodes)?
@Michal-Leszczynski - I believe a `batch-size` of 100 or so is OK, and for `parallel`, what is optimal here?
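For reference, a restore invocation with those settings would look roughly like this (a sketch only: the cluster name, location and snapshot tag are the ones appearing later in this thread, and the right `parallel` value is exactly the open question above):

```
# Sketch - values mirror the suggestion above, not a verified recommendation.
sctool restore -c scylla/scylla-cluster \
  --location gcs:scylla-backup-staging \
  --snapshot-tag sm_20240819100817UTC \
  --restore-tables \
  --batch-size 100 \
  --parallel <N>   # value still to be decided, see the question above
```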
Here are the logs of another execution of restore, 1h 11 minutes for 49 GB of data:
```
scylla@scylla-manager-7fbf77594-4xtmk:/$ sctool backup list -c scylla/scylla-cluster --all-clusters --location gcs:scylla-backup-staging
Cluster: 45d3e040-c349-4e86-8893-e2aba7a037c6
backup/9ce3c718-5596-4cda-b02b-415c6dfb728b
Snapshots:
- sm_20240819100817UTC (49.271G, 5 nodes)
Keyspaces:
- dataforseo (1 table)
- system_schema (15 tables)
- system_traces (5 tables)
- system_distributed (4 tables)
- system_distributed_everywhere (1 table)
scylla@scylla-manager-7fbf77594-4xtmk:/$ sctool info restore/1ad04728-8c98-4474-bda0-9ca6f3887d04 -c scylla/scylla-cluster
Name: restore/1ad04728-8c98-4474-bda0-9ca6f3887d04
Cron: {"spec":"","start_date":"0001-01-01T00:00:00Z"} (no activations scheduled)
Tz: UTC
Retry: 3 (initial backoff 10m)
Properties:
- batch-size: 100
- location: 'gcs:scylla-backup-staging'
- parallel: 5
- restore-tables: true
- snapshot-tag: sm_20240819100817UTC
+--------------------------------------+------------------------+----------+--------+
| ID | Start time | Duration | Status |
+--------------------------------------+------------------------+----------+--------+
| 7c0b4faa-5e41-11ef-a03d-6a134b736877 | 19 Aug 24 15:41:21 UTC | 1h11m43s | DONE |
+--------------------------------------+------------------------+----------+--------+
```
The node5 log has way more content than the others:
Most likely the restore here was not utilizing all the nodes equally during the process.
Before https://github.com/scylladb/scylla-manager/issues/3981, only SSTables from a single node/table were divided into batches. This means that with a high `batch-size` value, there may be moments when no batches are available for some nodes, so those nodes sit idle during the restore.
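For illustration (the numbers are assumed, just to show the mechanics): if the snapshot data of one source node/table consists of ~300 SSTables and `batch-size` is 100, only 3 batches exist for that node/table at a time, so on a 5-node cluster 2 nodes have nothing to work on until batching moves to the next node/table.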
This will be improved in Scylla Manager 3.4, which will include the fix for the issue mentioned in this comment.
The 3.4 release (marked as 3.3.3 in GH) is the current focus for Manager.
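In the meantime, one thing worth trying (my assumption, based on the batching behaviour described above, not a verified recommendation) is re-running the restore with a much smaller `batch-size`, so that more batches exist at any given moment and all nodes stay busy:

```
# Sketch - same restore as in this thread, but with a small batch size
# (assumption: more, smaller batches keep all 5 nodes working on pre-3.4 Manager).
sctool restore -c scylla/scylla-cluster \
  --location gcs:scylla-backup-staging \
  --snapshot-tag sm_20240819100817UTC \
  --restore-tables \
  --batch-size 2
```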
Hello!
On the following setup (latest GKE Standard cluster), we have 5 nodes with 4 NVMe Local SSDs each:
A backup of ~120 GB of data was taken to a Google Cloud Storage bucket in the same region, with rate limit `0`, and it took less than 7 minutes to complete:
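The exact backup command is not shown above; it would have been along these lines (a sketch, assuming the bucket used elsewhere in this thread and no other non-default flags):

```
# Assumed backup invocation - rate limit 0 means no limit on upload bandwidth.
sctool backup -c scylla/scylla-cluster \
  -L gcs:scylla-backup-staging \
  --rate-limit 0
```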
The process of restoring the backup is very, very slow:
As we can see ☝️, `batch-size` and `parallel` are set to 1000, and it looks like it takes around 1 minute to restore 1 GB of data. At the same time, the average load of the cluster reported via the Prometheus/Grafana dashboard is only about ~25%. Confirmed with a simple kubectl check:
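The exact command is not shown here; checks along these lines (assuming metrics-server is available, and assuming the Scylla pods live in a `scylla` namespace, which is a guess based on the cluster name in this thread) show the same picture:

```
# Assumed checks - per-node and per-pod CPU/memory utilisation via metrics-server.
kubectl top nodes
kubectl top pods -n scylla
```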
Changing the `parallel` and `batch-size` values does not improve the restore speed or the average load of the cluster. As we aim to store multiple TB of data on GKE running Scylla Operator with Local SSD, with the current restore performance it would take days to restore those terabytes 🙅
Any ideas on how to improve the restore speed would be helpful, especially given that creating a backup is very fast.