Open tchaikov opened 3 months ago
@Michal-Leszczynski hi Michal, what do you think? I am working with Pavel on this project and want to help on the testing front. It would be great if we could work together to move this forward.
cc @bhalevy @xemul @regevran
@tchaikov - please contact @pehala for support on collecting the data on existing tests.
In the backup/restore process of scylla-manager, each <keyspace, table, snapshot_tag> tuple is mapped to a path like $keyspace/$table/snapshots/$snapshot_tag, which is located under the specified bucket. Under this "directory", it preserves:
manifest.json
schema.cql
scylla-manager also queries an external CQL database to track the backup progress.
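A minimal Go sketch of that mapping (Go because scylla-manager is written in it). The function names snapshotPrefix and metadataKeys are hypothetical; only the layout and the two file names come from the description above. It uses the path package rather than path/filepath because object keys always use forward slashes.

```go
package main

import "path"

// snapshotPrefix builds the object-key prefix for one
// <keyspace, table, snapshot_tag> tuple, following the layout
// $keyspace/$table/snapshots/$snapshot_tag. The bucket is kept
// separate because this prefix lives "under" it.
func snapshotPrefix(keyspace, table, snapshotTag string) string {
	return path.Join(keyspace, table, "snapshots", snapshotTag)
}

// metadataKeys lists the per-snapshot files preserved under that
// prefix (manifest.json and schema.cql).
func metadataKeys(keyspace, table, snapshotTag string) []string {
	prefix := snapshotPrefix(keyspace, table, snapshotTag)
	return []string{
		path.Join(prefix, "manifest.json"),
		path.Join(prefix, "schema.cql"),
	}
}
```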
some interesting findings:
$bucket/$table_name/$snapshot_name/$sstable_component
$dc:$provider:$location_path/backup/sst/cluster/$cluster_id/dc/$dc_id/node/$node_id/keyspace/$keyspace/table/$table_name/$snapshot_version
In the second path, $provider can be one of "s3", "gcs" and "azure". For "s3", $location_path should be the bucket name. Location describes a certain object storage endpoint; it should be mapped to the "endpoint" definition of scylladb when setting up a scylladb instance that supports backup/restore to/from object storage. But since scylla-manager will use scylladb's backup/restore APIs in place of RcloneMoveDir() in pkg/service/backup/worker_upload.go, these differences won't be visible to scylla-manager.
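To make the second layout concrete, here is a sketch that parses a location written as $dc:$provider:$location_path and joins the SSTable directory under it. Location, ParseLocation and RemoteSSTableDir are hypothetical names rather than scylla-manager's real types, and this version assumes all three location fields are present, exactly as written above.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Location is a hypothetical parsed form of "$dc:$provider:$location_path".
type Location struct {
	DC       string
	Provider string
	Path     string // for "s3" this is the bucket name
}

// knownProviders lists the provider values mentioned in this issue.
var knownProviders = map[string]bool{"s3": true, "gcs": true, "azure": true}

// ParseLocation splits "dc:provider:path" into its three parts and
// rejects providers other than s3, gcs and azure.
func ParseLocation(s string) (Location, error) {
	parts := strings.SplitN(s, ":", 3)
	if len(parts) != 3 {
		return Location{}, fmt.Errorf("location %q: want dc:provider:path", s)
	}
	loc := Location{DC: parts[0], Provider: parts[1], Path: parts[2]}
	if !knownProviders[loc.Provider] {
		return Location{}, fmt.Errorf("location %q: unsupported provider %q", s, loc.Provider)
	}
	if loc.Path == "" {
		return Location{}, fmt.Errorf("location %q: empty path", s)
	}
	return loc, nil
}

// RemoteSSTableDir builds the directory holding one table's SSTables for
// one snapshot version, rooted at the location path (the bucket for s3).
func RemoteSSTableDir(loc Location, clusterID, dcID, nodeID, keyspace, table, snapshotVersion string) string {
	return path.Join(loc.Path, "backup", "sst",
		"cluster", clusterID, "dc", dcID, "node", nodeID,
		"keyspace", keyspace, "table", table, snapshotVersion)
}
```

For an "s3" location, Path is the bucket name, which is the piece that would need to be paired with a scylladb "endpoint" definition.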
Unfortunately, we don't have a unit test for Worker.Upload() yet. There are some noticeable differences:
versioned file: scylla-manager uses the idea of a "versioned file", which encodes the version in the object name as a suffix. The purpose is to avoid conflicts between different SSTables with the same name from the same node, and the snapshot tag is used as the suffix. But we no longer use this technique when working with newer SSTable versions.
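The versioned-file idea can be sketched as a pair of helpers that add and strip a snapshot-tag suffix. The separator ".v_" and both function names are assumptions made for illustration; scylla-manager's actual suffix format may differ.

```go
package main

import "strings"

// versionSep is a hypothetical separator between an SSTable file name and
// its version suffix; it is not scylla-manager's real convention.
const versionSep = ".v_"

// versionedName appends the snapshot tag so that two SSTables with the same
// name from the same node map to different object names.
func versionedName(name, snapshotTag string) string {
	return name + versionSep + snapshotTag
}

// splitVersioned reverses versionedName; ok is false when the object name
// carries no version suffix.
func splitVersioned(object string) (name, snapshotTag string, ok bool) {
	i := strings.LastIndex(object, versionSep)
	if i < 0 {
		return object, "", false
	}
	return object[:i], object[i+len(versionSep):], true
}
```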
Currently we use an agent running on the scylladb node as a web server, which handles the requests from scylla-manager. This agent provides services like copy/move to/from object storage. I think we could extend scyllaclient/client_scylla.go so it supports the backup and restore APIs, and use them in uploadSnapshotDir() in service/backup/worker_upload.go instead.
backup testing: we don't have tests for the sync/copydir API yet; probably we could add them for rclone/rcserver? When it comes to scylla's backup/restore integration, I think we should perform end-to-end tests, as we shouldn't expose the directory arrangement of the backup to its callers, unless we believe it is part of its public interface.
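One way to read the end-to-end suggestion: assert only on what a restore returns, never on object keys, so the same check works whatever directory arrangement a flow uses. The sketch below fakes two flows with different, made-up layouts (flow, ObjectStore and roundTrip are all hypothetical) and runs the same round-trip check against both.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// ObjectStore is an in-memory stand-in for a bucket: object key -> content.
type ObjectStore map[string][]byte

// flow fakes one backup/restore path. keyFor is the flow's private object
// layout; roundTrip never looks at it.
type flow struct {
	name   string
	keyFor func(tag, file string) string
}

// manifestName is reserved by the fakes to remember a snapshot's files.
const manifestName = "manifest"

// Backup copies every file into the store and records their names.
func (f flow) Backup(store ObjectStore, tag string, files map[string][]byte) {
	names := make([]string, 0, len(files))
	for name, data := range files {
		store[f.keyFor(tag, name)] = append([]byte(nil), data...)
		names = append(names, name)
	}
	store[f.keyFor(tag, manifestName)] = []byte(strings.Join(names, "\n"))
}

// Restore reads the manifest and fetches every file it lists.
func (f flow) Restore(store ObjectStore, tag string) (map[string][]byte, error) {
	m, ok := store[f.keyFor(tag, manifestName)]
	if !ok {
		return nil, fmt.Errorf("%s: no manifest for %q", f.name, tag)
	}
	out := map[string][]byte{}
	if len(m) == 0 {
		return out, nil
	}
	for _, name := range strings.Split(string(m), "\n") {
		data, ok := store[f.keyFor(tag, name)]
		if !ok {
			return nil, fmt.Errorf("%s: missing %q", f.name, name)
		}
		out[name] = data
	}
	return out, nil
}

// roundTrip is the end-to-end check: back up, restore, compare contents.
func roundTrip(f flow, files map[string][]byte) error {
	store := ObjectStore{}
	f.Backup(store, "tag1", files)
	got, err := f.Restore(store, "tag1")
	if err != nil {
		return err
	}
	if len(got) != len(files) {
		return fmt.Errorf("%s: restored %d files, want %d", f.name, len(got), len(files))
	}
	for name, want := range files {
		data, ok := got[name]
		if !ok || !bytes.Equal(data, want) {
			return fmt.Errorf("%s: content of %q differs", f.name, name)
		}
	}
	return nil
}

// flows models two coexisting layouts, e.g. agent-based and native.
var flows = []flow{
	{name: "agent", keyFor: func(tag, file string) string { return "backup/sst/" + file + ".v_" + tag }},
	{name: "native", keyFor: func(tag, file string) string { return tag + "/" + file }},
}
```

Because the check never inspects keys, changing a flow's layout does not break it, which is the property wanted if the layout is not part of the public interface.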
restore implementation: we upload the sstables to scylla nodes in pkg/service/restore/tablesdir_worker.go, which uses the RcloneCopyFile() RPC call. This is completed in the StageData stage.
./storage_service/sstables/{keyspace} of scylla
restore schema testing: we have two test suites: pkg/service/restore/restore_integration_test.go and pkg/service/restore/service_restore_integration_test.go. Both of them use the same methodology, as below, and this end-to-end test still applies to the new scylla API:
What is the expected behavior from the user's point of view? What level of specification should one supply in order to back up? I.e. I guess $provider and $location_path are a must. What about $keyspace? $snapshot_version?
I am not sure I follow the existing API vs. the new API and how one should map between the two.
I guess that overall we'll need two sets of tests that run in parallel for the transition period - until all customers upgrade to the version that supports S3 backup from within Scylla. This is because we'll probably change/fix tests for the existing flow as well as for the new flow.
@regevran @tchaikov Correct me if I am wrong, but I believe the flow for the customers should stay the same.
@pehala yeah, you are correct.
@tchaikov sorry for such a late response, I've been busy with patching the current SM restore implementation, as it was the hottest priority from SM's POV. I will take a look at those two PRs tomorrow:
In case you have any questions about how SM restore/backup operates (or why it works like that), please ask me. We can even schedule a call if needed.
> And this agent utilizes rclone for upload/download, and it supports multiple backends, while scylla's backup/restore APIs only support S3 at the time of writing. Our initial focus is on supporting Amazon S3. Unlike third-party solutions such as rclone, this native implementation offers several advantages:
I think it's a really good choice to rely on AWS SDK instead of delegating it to 3rd party tools/libs like RClone. You will definitely have much better control over the whole copy/move/delete process. We were thinking about removing RClone and replacing it with pure SDK usage.
There are many tools that are compatible with the S3 API, like Minio (https://min.io/), so it may not be AWS only. There are some customers (or prospects) that use Minio already.
BTW, it's worth including it in integration tests. We do that in Scylla Manager already. https://github.com/scylladb/scylla-manager/tree/master/testing
> We are enhancing ScyllaDB with a native RESTful API to efficiently back up and restore SSTables to and from object storage services
@tchaikov Do you have a swagger definition of the API for backup and restore already that we could take a look at?
> And this agent utilizes rclone for upload/download, and it supports multiple backends, while scylla's backup/restore APIs only support S3 at the time of writing. Our initial focus is on supporting Amazon S3. Unlike third-party solutions such as rclone, this native implementation offers several advantages:
> I think it's a really good choice to rely on AWS SDK instead of delegating it to 3rd party tools/libs like RClone. You will definitely have much better control over the whole copy/move/delete process. We were thinking about removing RClone and replacing it with pure SDK usage.
Yeah, I agree. Probably you are talking about scylla's S3 implementation? If that's the case, the reason is that we need an implementation which uses the seastar framework, so we have to reinvent the wheel.
> There are many tools that are compatible with the S3 API, like Minio (https://min.io/), so it may not be AWS only. There are some customers (or prospects) that use Minio already.
Yeah, I know. By AWS S3, I meant the S3 API, not limited to AWS.
> BTW, it's worth including it in integration tests. We do that in Scylla Manager already. https://github.com/scylladb/scylla-manager/tree/master/testing
What do you mean by "it"?
> We are enhancing ScyllaDB with a native RESTful API to efficiently back up and restore SSTables to and from object storage services
> @tchaikov Do you have a swagger definition of the API for backup and restore already that we could take a look at?
sure.
> BTW, it's worth including it in integration tests. We do that in Scylla Manager already. https://github.com/scylladb/scylla-manager/tree/master/testing
> What do you mean by "it"?
I mean Minio.
> @tchaikov sorry for such a late response, I've been busy with patching the current SM restore implementation, as it was the hottest priority from SM's POV. I will take a look at those two PRs tomorrow:
> In case you have any questions about how SM restore/backup operates (or why it works like that), please ask me. We can even schedule a call if needed.
hi @Michal-Leszczynski thanks for your reply. i just sent a meeting invite to you. hopefully the time works for you so you can hop in and we can sync up with each other.
> BTW, it's worth including it in integration tests. We do that in Scylla Manager already. https://github.com/scylladb/scylla-manager/tree/master/testing
> What do you mean by "it"?
> I mean Minio.
yeah, we are already using minio for testing. see https://github.com/scylladb/scylladb/blob/e4b213f041b131f38a9d782c67152e1203bd3a7e/test/pylib/minio_server.py#L28
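For a Go-based test suite such as scylla-manager's, a MinIO server could be prepared in a similar spirit to the linked Python helper. `minio server <dir> --address <addr>` and the MINIO_ROOT_USER / MINIO_ROOT_PASSWORD environment variables are MinIO's own interface; the minioCommand helper is hypothetical and only builds the command, without starting it or waiting for readiness.

```go
package main

import (
	"os"
	"os/exec"
)

// minioCommand prepares (but does not start) a local MinIO server for an
// integration test. The helper name and signature are hypothetical.
func minioCommand(dataDir, addr, user, password string) *exec.Cmd {
	cmd := exec.Command("minio", "server", dataDir, "--address", addr)
	// Inherit the current environment and add the root credentials.
	cmd.Env = append(os.Environ(),
		"MINIO_ROOT_USER="+user,
		"MINIO_ROOT_PASSWORD="+password,
	)
	return cmd
}
```

A test would then call cmd.Start(), wait until the address accepts connections, and kill the process on cleanup.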
> hi @Michal-Leszczynski thanks for your reply. i just sent a meeting invite to you. hopefully the time works for you so you can hop in and we can sync up with each other.
May I join too?
> hi @Michal-Leszczynski thanks for your reply. i just sent a meeting invite to you. hopefully the time works for you so you can hop in and we can sync up with each other.
> May I join too?
it's the weekly backup and restore meeting. so you are already invited.
@tchaikov - I suggest we close this issue as we learnt a lot since it was opened. Some of the information here is not updated and even confusing.
We are enhancing ScyllaDB with a native RESTful API to efficiently back up and restore SSTables to and from object storage services. The existing backup process is documented at https://github.com/scylladb/scylla-manager/blob/master/docs/source/backup/index.rst#process , which is quite similar to the process using the native backup API. The only difference is that the existing implementation uses an agent running on the scylla instance. And this agent utilizes rclone for upload/download, and it supports multiple backends, while scylla's backup/restore APIs only support S3 at the time of writing. Our initial focus is on supporting Amazon S3. Unlike third-party solutions such as rclone, this native implementation offers several advantages:
Now that we've implemented these two APIs
Integration of the new API:
Enhancement of testing coverage:
It's important to note that these efforts extend beyond Scylla Manager. We should also:
This holistic approach will ensure that both scylla-manager and scylladb are fully aligned with the new native backup and restore functionality, providing a robust and efficient solution for users.
As the initial phase, I will