pingcap / tidb-lightning

This repository has been moved to https://github.com/pingcap/br

Apache License 2.0


streaming from cloud object storage #233

Open gregwebs opened 5 years ago

gregwebs commented 5 years ago

Feature Request

To restore a large amount of data without streaming, we must wait for the entire backup to be downloaded from object storage before the restore can begin. Our restore process should instead look like this, which would make it faster and reduce disk requirements (and therefore cost):

1) Download the metadata.
2) Download chunks of data.
3) Apply the downloaded chunks of data while downloading new ones. Backpressure from applying should limit the downloading.
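The steps above can be sketched with a bounded queue, which is one common way to get backpressure. This is only an illustration, not how lightning works: `stream_restore`, `download`, `apply`, and `max_buffered` are hypothetical names, and the list of chunk names is assumed to come from the metadata downloaded in step 1.

```python
import queue
import threading


def stream_restore(chunk_names, download, apply, max_buffered=2):
    """Download chunks in a background thread and apply them as they arrive.

    `download(name)` and `apply(name, data)` are hypothetical callables
    supplied by the caller; they stand in for object storage and the importer.
    """
    # The size bound is what creates backpressure: once `max_buffered`
    # chunks are waiting, the downloader blocks until one is applied.
    buffered = queue.Queue(maxsize=max_buffered)
    done = object()  # sentinel marking the end of the stream
    errors = []

    def downloader():
        try:
            for name in chunk_names:
                buffered.put((name, download(name)))  # blocks when full
        except Exception as exc:  # surface download failures to the caller
            errors.append(exc)
        finally:
            buffered.put(done)

    worker = threading.Thread(target=downloader, daemon=True)
    worker.start()

    applied = []
    while True:
        item = buffered.get()
        if item is done:
            break
        name, data = item
        apply(name, data)
        applied.append(name)

    worker.join()
    if errors:
        raise errors[0]
    return applied
```

With this shape, disk or memory use is bounded by roughly `max_buffered` chunks plus the one being applied, instead of the size of the whole backup.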

gregwebs commented 4 years ago

It is possible to attempt this by feeding data through stream-based tools to lightning. However, lightning currently expects a directory, and the data may be stored as a gzipped tarball.
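As a rough sketch of the stream-based approach, Python's `tarfile` module can read a gzipped tarball sequentially from a non-seekable stream using mode `"r|gz"`, so files can be handed on before the whole archive has arrived. The function name `iter_tar_members` is hypothetical, and nothing here reflects an existing lightning feature.

```python
import tarfile


def iter_tar_members(stream):
    """Yield (name, content) for each regular file in a gzipped tar stream.

    `stream` is any file-like object opened for reading bytes, for example
    the body of an object storage download.
    """
    # "r|gz" reads the archive strictly front to back without seeking,
    # so each member must be consumed before moving on to the next one.
    with tarfile.open(fileobj=stream, mode="r|gz") as archive:
        for member in archive:
            if member.isfile():
                yield member.name, archive.extractfile(member).read()
```

Each yielded file could then be written into the directory that lightning reads, or passed to whatever applies the data.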

So tidb-lightning needs to either handle the unpacking itself or keep watching the directory for new files until it receives a signal that everything has arrived. It may also be possible to achieve this by running lightning separately for each table, using table filters.
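The "keep looking for new files until signalled" option could look something like the sketch below. It is an assumption-heavy illustration: the marker file name `_DONE`, the polling interval, and the function `watch_directory` are all made up for this example, and lightning does not expose such a mechanism.

```python
import os
import time


def watch_directory(path, done_marker="_DONE", poll_seconds=0.5):
    """Yield each new file name in `path` until the marker file appears.

    The marker file is a hypothetical completion signal written by the
    downloader after the last data file.
    """
    seen = set()
    while True:
        # Check for the marker before listing, so that files written just
        # before the marker are still picked up by this final pass.
        finished = os.path.exists(os.path.join(path, done_marker))
        for name in sorted(os.listdir(path)):
            if name != done_marker and name not in seen:
                seen.add(name)
                yield name
        if finished:
            return
        time.sleep(poll_seconds)
```

A real implementation would also need to avoid picking up partially written files, for instance by having the downloader write to a temporary name and rename the file once it is complete.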

kennytm commented 4 years ago

🤔 I assume this is more than #69?

gregwebs commented 4 years ago

#69 does not require a streaming implementation to be completed.

Our new BR tool streams data to and from cloud storage nicely.