josegonzalez / python-github-backup

backup a github user or organization
MIT License

Safer incremental backups #244

Open lukasbestle opened 7 months ago

lukasbestle commented 7 months ago

Status quo

At the moment, incremental backups always use the time the backup was last updated as the since date, even if that last backup failed. This is documented as a known issue.

Proposed feature

The behavior could be improved by storing the timestamps of the last successful updates in a file, e.g. /.successful-timestamps.json:

{
    "some-repo/issues": 1707118232,
    "some-repo/pulls": 1707118232,
    // ...
}

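A failed backup would then leave the stored timestamp untouched, so it would not break future incremental backups: the next run would start again from the last successful one.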
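
A minimal sketch of how such a file could be read and updated. The function names, the `backup_dir` parameter and the choice to record the start time of a run (so that items changed while the run was in progress are fetched again next time) are assumptions for illustration, not existing github-backup code:

```python
import json
import os
import tempfile

# Hypothetical file name, taken from the example above.
TIMESTAMPS_FILE = ".successful-timestamps.json"


def load_timestamps(backup_dir):
    """Return the stored {source_key: unix_timestamp} mapping, or {} if unusable."""
    path = os.path.join(backup_dir, TIMESTAMPS_FILE)
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # Missing, unreadable or corrupt file: behave as if nothing was recorded.
        return {}
    return data if isinstance(data, dict) else {}


def record_success(backup_dir, source_key, started_at):
    """Store the start time of a backup step that completed without error."""
    timestamps = load_timestamps(backup_dir)
    timestamps[source_key] = int(started_at)
    # Write to a temporary file first and rename it, so an interrupted
    # write cannot leave a truncated JSON file behind.
    fd, tmp_path = tempfile.mkstemp(dir=backup_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(timestamps, f, indent=4, sort_keys=True)
    os.replace(tmp_path, os.path.join(backup_dir, TIMESTAMPS_FILE))
```

`record_success` would only be called after a source (e.g. "some-repo/issues") has been backed up completely, so a failed run never advances its timestamp.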

Migration

The implementation would need to be backwards-compatible with existing backup stores that don't have this file yet or where the file is incomplete. It could work like this:

  1. If the file exists and contains the key for the backup source in question, use the timestamp as the since time.
  2. If the file doesn't exist or doesn't contain the relevant key, but an incremental backup was requested, use the old behavior.
  3. In any case (full or incremental backup), create or update the file with the new timestamps for the next run(s).
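
Steps 1 and 2 could be sketched roughly like this. `resolve_since` and the `legacy_since` hook are hypothetical names; the sketch assumes the timestamps file has already been loaded into a dict (empty if the file is missing) and that step 3 is handled separately after each source finishes successfully:

```python
def resolve_since(timestamps, source_key, incremental, legacy_since):
    """Pick the `since` value for one backup source.

    timestamps   -- dict loaded from the timestamps file ({} if absent)
    source_key   -- e.g. "some-repo/issues"
    incremental  -- whether an incremental backup was requested
    legacy_since -- zero-argument callable returning the value the
                    current behavior would use (hypothetical hook)
    """
    if not incremental:
        # Full backup: fetch everything, no lower bound.
        return None
    if source_key in timestamps:
        # Step 1: a successful run was recorded for this source.
        return timestamps[source_key]
    # Step 2: no file or no key yet, fall back to the old behavior.
    return legacy_since()
```

Passing the old behavior in as a callable keeps the fallback lazy, so it only runs for sources that have no recorded timestamp yet.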

josegonzalez commented 7 months ago

This project is considered feature complete for the primary maintainer @josegonzalez. If you would like a bugfix or enhancement, pull requests are welcome. Feel free to contact the maintainer for consulting estimates if you'd like to sponsor the work instead.