nnsgmsone opened 4 years ago
Zero-copy clones do share the S3 storage. But once the clone is created, any new files that it generates will be written to a new location in S3.
Maybe you can look at the example of a zero-copy clone here: https://github.com/rockset/rocksdb-cloud/blob/master/cloud/examples/clone_example.cc#L32
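Condensed from that example, the clone-open pattern looks roughly like this; the bucket names, prefixes, region, and local path below are placeholders, so check the linked file for the exact setup:

```cpp
#include "rocksdb/cloud/db_cloud.h"
#include "rocksdb/options.h"

using namespace rocksdb;

// Sketch: open a zero-copy clone of an existing cloud DB.
// The "src" bucket/prefix points at the parent DB's S3 path (sst files
// are read from there without copying); the "dest" bucket/prefix is a
// NEW location where any files the clone creates are written.
Status OpenClone(Env* base_env, DBCloud** clone_db) {
  CloudEnvOptions cloud_env_options;
  CloudEnv* cenv = nullptr;
  Status s = CloudEnv::NewAwsEnv(
      base_env,
      "parent-bucket", "parent-dbpath", "us-west-2",  // parent's sst files
      "clone-bucket",  "clone-dbpath",  "us-west-2",  // clone's new files
      cloud_env_options, nullptr /* info_log */, &cenv);
  if (!s.ok()) return s;

  Options options;
  options.env = cenv;  // route all file I/O through the cloud env
  options.create_if_missing = true;
  return DBCloud::Open(options, "/tmp/clone-dbpath",
                       "" /* persistent_cache_path */,
                       0 /* persistent_cache_size_gb */, clone_db);
}
```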
@dhruba How else can I ensure consistency through the WAL? Do I need to manually build a checkpoint and have the new node recover from that checkpoint?
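(For context, the "manually build a checkpoint" route would use stock RocksDB's Checkpoint utility; a minimal sketch, with a placeholder path:)

```cpp
#include "rocksdb/db.h"
#include "rocksdb/utilities/checkpoint.h"

// Sketch: the stock RocksDB checkpoint the question refers to.
// CreateCheckpoint produces a consistent, openable copy of the DB at
// the given (placeholder) path, from which a new node could recover.
rocksdb::Status MakeCheckpoint(rocksdb::DB* db) {
  rocksdb::Checkpoint* checkpoint = nullptr;
  rocksdb::Status s = rocksdb::Checkpoint::Create(db, &checkpoint);
  if (!s.ok()) return s;
  s = checkpoint->CreateCheckpoint("/tmp/my-checkpoint");
  delete checkpoint;
  return s;
}
```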
If you store the WAL in Kinesis or Kafka and the sst files in S3, then when you reopen the db on a different machine, you can replay the WAL. For example, if the WAL is on Kafka, you can follow this pattern: https://github.com/rockset/rocksdb-cloud/blob/master/cloud/db_cloud_test.cc#L797
If the WAL is on Kinesis: https://github.com/rockset/rocksdb-cloud/blob/master/cloud/db_cloud_test.cc#L836
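A sketch of the options wiring those tests use, as best I recall it; the broker address is a placeholder, and the exact option names should be verified against the linked test file:

```cpp
#include "rocksdb/cloud/cloud_env_options.h"

// Sketch, mirroring the linked db_cloud_test.cc setup: route the WAL to
// Kafka instead of local disk.
rocksdb::CloudEnvOptions MakeKafkaWalOptions() {
  rocksdb::CloudEnvOptions opts;
  opts.log_type = rocksdb::LogType::kLogKafka;  // kLogKinesis for Kinesis
  opts.keep_local_log_files = false;            // WAL lives in the cloud log
  opts.kafka_log_options
      .client_config_params["metadata.broker.list"] = "localhost:9092";
  // When the db is reopened elsewhere with these options, rocksdb-cloud
  // tails this log to replay writes not yet flushed to sst files in S3.
  return opts;
}
```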
Just an FYI: for Rockset's use case, we switch OFF rocksdb-cloud's WAL.
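For reference, with stock RocksDB the WAL is typically switched off per write via WriteOptions; a minimal sketch:

```cpp
#include "rocksdb/db.h"

// Sketch: what "switching OFF the WAL" looks like with stock RocksDB.
// The write skips the log entirely; durability then depends on flushes
// (or, in a setup like Rockset's, on an external replication log).
rocksdb::Status WriteWithoutWal(rocksdb::DB* db) {
  rocksdb::WriteOptions wo;
  wo.disableWAL = true;  // this write is not logged
  return db->Put(wo, "key", "value");
}
```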
@dhruba Well, there is a problem. If writes occur during the clone, should I block them or generate a checkpoint myself? I see that the example just flushes all sst files to S3.
In other words, should I control the timing of replaying the WAL?
If there is no checkpoint, then writes arriving during the clone seem to cause data inconsistency between the two nodes.
> In other words, should I control the timing of replaying the WAL?
Yes, that makes sense. Let me know if this works for you.
ok
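A minimal sketch of that sequencing, assuming writes are paused at the application level (PauseWrites/ResumeWrites are hypothetical application hooks, not RocksDB APIs):

```cpp
#include "rocksdb/db.h"

// Sketch of the agreed sequencing: stop accepting writes at the
// application level, flush the memtable so every committed write is in
// an sst file in S3, and only then create the clone.
rocksdb::Status PrepareForClone(rocksdb::DB* db) {
  // PauseWrites();                // application-level, hypothetical
  rocksdb::FlushOptions fo;
  fo.wait = true;                  // block until the flush completes
  rocksdb::Status s = db->Flush(fo);
  // ... create the zero-copy clone here, then ResumeWrites() ...
  return s;
}
```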
@dhruba Newbie question: when you say replay the WAL from Kafka, does that mean you replay the WAL from the start of time? If not, how does the replica know the highest watermark, aka the Kafka offset? To be more clear, the use case I am trying to tackle is migrating a RocksDB instance on failure, while avoiding replaying the log from the start of time because that takes hours. Sorry, I am still getting familiar with the RocksDB code base.
Hi @b-slim, thanks for your question.
You can use the zero-copy clone this way. Suppose you have a rocksdb-cloud database D1 that has a WAL and uploads its sst files to S3. Let's say that the WAL is in Kafka.
Now, if you want to make a zero-copy clone C1, you would do these steps in this order (a code sketch of the sequence follows the list):
1. Flush D1, so that everything committed so far moves from the memtable into sst files in S3, and record the current Kafka offset as the high watermark.
2. Create the zero-copy clone C1; it reads D1's existing sst files from S3 without copying them.
3. Replay the WAL from Kafka into C1 starting at the recorded offset, so only the tail of the log is replayed rather than the whole history.
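A sketch of this sequence; GetCurrentKafkaOffset and ReplayWalFrom are hypothetical application helpers (rocksdb-cloud drives the actual replay when the db is opened with the Kafka log configured), and OpenClone is the sketch from earlier in the thread:

```cpp
#include "rocksdb/cloud/db_cloud.h"

// Hypothetical application helpers, not rocksdb-cloud APIs; they stand
// in for whatever offset bookkeeping and replay-driving the application
// does. OpenClone is the earlier sketch in this thread.
int64_t GetCurrentKafkaOffset();
rocksdb::Status ReplayWalFrom(rocksdb::DBCloud* db, int64_t offset);
rocksdb::Status OpenClone(rocksdb::Env* base_env, rocksdb::DBCloud** clone_db);

// End-to-end sketch of the three steps above.
rocksdb::Status CloneWithBoundedReplay(rocksdb::DB* d1, rocksdb::Env* base_env,
                                       rocksdb::DBCloud** c1) {
  // Step 1: flush so every committed write is in an sst file in S3, and
  // record where the Kafka WAL stands (the high watermark).
  rocksdb::Status s = d1->Flush(rocksdb::FlushOptions());
  if (!s.ok()) return s;
  int64_t offset = GetCurrentKafkaOffset();

  // Step 2: zero-copy clone; D1's sst files are shared, not copied.
  s = OpenClone(base_env, c1);
  if (!s.ok()) return s;

  // Step 3: replay only the tail of the WAL from the recorded offset,
  // instead of from the start of time.
  return ReplayWalFrom(*c1, offset);
}
```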
Now both C1 and D1 are normal rocksdb-cloud databases and are not related to one another. Writes done to one do not show up in the other, which is the expected behaviour.
Some of the sst files in S3 are still shared, and you have to be careful to ensure that they do not get erased prematurely. Ensure that the purger is enabled (https://github.com/rockset/rocksdb-cloud/blob/master/include/rocksdb/cloud/cloud_env_options.h#L245) and disable file deletions via https://github.com/rockset/rocksdb-cloud/blob/master/include/rocksdb/db.h#L1095. Just a caveat: we at Rockset do not run the purger in our production cluster, so if you find bugs in the workings of the purger code, please do submit a pull request.
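A sketch of those two knobs, assuming the option at the linked line is `run_purger`:

```cpp
#include "rocksdb/cloud/cloud_env_options.h"
#include "rocksdb/cloud/db_cloud.h"

// Set before creating the CloudEnv (assumption: the option at the
// linked cloud_env_options.h line is `run_purger`): the purger only
// erases cloud files that no db or clone still references.
void ConfigureForSharedClones(rocksdb::CloudEnvOptions* opts) {
  opts->run_purger = true;
}

// Stop this db from deleting files a clone may still be reading;
// re-enable with db->EnableFileDeletions(false) once the clone is gone.
rocksdb::Status ProtectWhileCloned(rocksdb::DBCloud* db) {
  return db->DisableFileDeletions();
}
```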
@dhruba By the way, the above method should cause the merge of the lsm tree to fail. Assuming a merge (compaction) has occurred on the clone, what is the processing flow of rocksdb-cloud at that point? Because I am not very familiar with C++, I did not find the code that handles this situation. In addition, I think introducing a checkpoint could ensure data consistency (although this would require more code changes).
@nnsgmsone I do not understand this "By the way, the above method should cause the merge of the lsm tree to fail. "
How do I use zero-copy? Does the so-called zero-copy clone share S3 storage or not? I have seen slides describing both situations and am very confused. Hoping for an answer.