Closed sivukhin closed 1 month ago
This approach could lead to unbounded memory usage, or did I get it wrong? Another way to address the same issue: if we skip any number of frames, we drop the queue and send a new snapshot. It will increase our network usage, but memory-wise we should be safer.
@athoscouto, yes - this fix leads to unbounded memory usage in case of S3 issues.
I'm not sure about your snapshot idea, because if S3 is not working for frames, why would it work for snapshots?
afaik, bottomless won't perform a new snapshot until the previous one has completed
> I'm not sure about your snapshot idea, because if S3 is not working for frames, why would it work for snapshots?
The idea is that it will work eventually, and until then we don't need to keep track of frames. When it works, we just upload a new snapshot from the database file.
Context
The current implementation of `bottomless` skips frames which failed during upload to S3. This creates a significant issue with the restore process from such a backup, because `bottomless` can restore the DB only from consecutive frames, and the first gap in the frame sequence simply stops the restore process. So, in case of a slight fluctuation in S3 availability (like an sqld network issue or some degradation on the S3 side), the backup will be "frozen" in time, and all changes made to the DB after that fluctuation will not be restored from the current generation (the next generation will have a full snapshot, so the issue will disappear after the generation bump happens).
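To illustrate the constraint above, here is a minimal sketch (not the actual `bottomless` code; `restorable_prefix` is a hypothetical name) of why a single missing frame truncates everything after it during restore:

```rust
// Hypothetical sketch: restore can only apply a consecutive run of WAL
// frames, so the first missing frame number cuts off everything after it.
fn restorable_prefix(mut frame_nos: Vec<u64>) -> Vec<u64> {
    frame_nos.sort_unstable();
    let mut restored = Vec::new();
    let mut expected = 1u64; // WAL frame numbers start at 1
    for no in frame_nos {
        if no != expected {
            break; // first gap: the restore process stops here
        }
        restored.push(no);
        expected += 1;
    }
    restored
}

fn main() {
    // Frames 4 and 5 failed to upload, so frames 6..=8 are useless for
    // restore even though they are present in the bucket.
    let uploaded = vec![1, 2, 3, 6, 7, 8];
    let restored = restorable_prefix(uploaded);
    assert_eq!(restored, vec![1, 2, 3]);
    println!("{:?}", restored); // [1, 2, 3]
}
```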
This PR addresses this nuance of the `bottomless` implementation: instead of a bounded queue of frames for upload with fail-fast logic on S3 upload failure, it now uses an unbounded channel (so as not to block the local backup process) of frames for S3 upload, with retry logic on failure. During shutdown, `bottomless` will try to finish all pending S3 uploads but will not perform any retries (so, during shutdown, `bottomless` works in a "best-effort" mode and can still skip some frames in case of S3 issues).
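The producer/consumer shape described above can be sketched roughly as follows. This is an illustrative simulation, not the real `bottomless` code: the transient-failure counter stands in for S3 errors, and `drain_with_retry` is a hypothetical name.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

// Hypothetical sketch of the new upload loop: the consumer retries each
// frame until the (simulated) upload succeeds; no frame is ever skipped
// during normal operation.
fn drain_with_retry(rx: Receiver<u64>, mut transient_failures: u32) -> Vec<u64> {
    let mut uploaded = Vec::new();
    for frame_no in rx {
        loop {
            if transient_failures > 0 {
                transient_failures -= 1; // pretend S3 returned an error; retry
                continue;
            }
            uploaded.push(frame_no); // frame durably "uploaded"
            break;
        }
    }
    // Channel disconnected => shutdown. A real implementation would switch
    // to best-effort mode here: one attempt per remaining frame, no retries.
    uploaded
}

fn main() {
    // Unbounded channel: send() never blocks, so the local backup process
    // is never stalled by S3 slowness.
    let (tx, rx) = channel::<u64>();
    let uploader = thread::spawn(move || drain_with_retry(rx, 2));

    for frame_no in 1..=5 {
        tx.send(frame_no).unwrap();
    }
    drop(tx); // close the channel to signal shutdown

    let uploaded = uploader.join().unwrap();
    assert_eq!(uploaded, vec![1, 2, 3, 4, 5]); // despite two transient errors
    println!("{:?}", uploaded);
}
```

The trade-off, as discussed in the comments above, is that the unbounded channel can grow without limit during a prolonged S3 outage.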