lucyparsons / OpenOversight

Police oversight and accountability through public data 👮
https://openoversight.com
GNU General Public License v3.0

backups are way too big #740

Open redshiftzero opened 4 years ago

redshiftzero commented 4 years ago

Our backup strategy at this point involves fetching gigabytes of data. We should do at least one of the following:

  1. Emit progress output so that backups running in CircleCI don't fail for lack of build output (if there is no output within a 10-minute window, the job fails)
  2. Prune trash images to reduce backup size: #205
  3. Incremental backups
  4. Something else?
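Options 1 and 3 can be combined. The sketch below assumes that the remote image listing is available as a map from object key to ETag, and that a local manifest with the same shape is stored next to the previous backup. Only new or changed keys are fetched, and a progress line is printed periodically so CI sees output well inside its 10-minute window. The function names, the `fetch` callback, and the 60-second interval are hypothetical choices, not existing OpenOversight code.

```python
import sys
import time
from typing import Callable, Dict, List, Optional, TextIO


def keys_to_fetch(remote: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """Incremental step: keys that are new remotely or whose ETag differs locally."""
    return sorted(key for key, etag in remote.items() if local.get(key) != etag)


def fetch_with_heartbeat(
    keys: List[str],
    fetch: Callable[[str], None],  # hypothetical: downloads one object by key
    interval: float = 60.0,
    out: Optional[TextIO] = None,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Fetch each key, printing a progress line at most once per `interval` seconds."""
    stream = out if out is not None else sys.stdout
    last = clock()
    done = 0
    for key in keys:
        fetch(key)
        done += 1
        now = clock()
        if now - last >= interval:
            # Progress is only reported between files, so a single file that takes
            # longer than the CI timeout would still trip it.
            print(f"backup progress: {done}/{len(keys)} files", file=stream, flush=True)
            last = now
    print(f"backup done: {done}/{len(keys)} files", file=stream, flush=True)
    return done
```

Passing `clock` and `out` as parameters keeps the heartbeat logic testable without real downloads or real time.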

redshiftzero commented 4 years ago

From the OO meeting just now, I learned (sorry, I don't remember who said it) that we can set up backups of the images on the AWS S3 side. We should look into doing that instead.

ghost commented 4 years ago

I think we should develop a backup procedure that works uniformly across multiple deployment strategies. Small enough deployments wouldn't even need an S3 bucket, but with a sufficiently large photo archive, S3-type buckets would be a good fit. It's worth noting that since AWS offers Glacier, it could be used for long-term backup storage on AWS deployments specifically. Versioning can also be enabled on the S3 buckets.
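A minimal sketch of the versioning plus Glacier idea, using the real boto3 calls `put_bucket_versioning` and `put_bucket_lifecycle_configuration`. It assumes AWS credentials are already configured; the bucket name, the rule ID, and the 30-day default are hypothetical. Only *noncurrent* versions (old copies of overwritten or deleted images) move to Glacier, because current objects in Glacier can't be served directly and would need a restore first.

```python
from typing import Any, Dict


def versioning_config() -> Dict[str, str]:
    """Keep previous versions of every object instead of overwriting them."""
    return {"Status": "Enabled"}


def glacier_lifecycle_config(noncurrent_days: int = 30) -> Dict[str, Any]:
    """Archive noncurrent versions to Glacier; current images stay servable."""
    return {
        "Rules": [
            {
                "ID": "archive-old-image-versions",  # hypothetical rule name
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Status": "Enabled",
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": noncurrent_days, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


def apply_backup_settings(bucket: str) -> None:
    import boto3  # imported here so the config helpers work without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(Bucket=bucket, VersioningConfiguration=versioning_config())
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=glacier_lifecycle_config()
    )
```

Usage would be something like `apply_backup_settings("example-oo-images")` with a hypothetical bucket name, run once per deployment.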

r4v5 commented 3 years ago

This is probably lower priority than it was, since the backup still fits on the prod machine, but once it no longer does it'll be a really annoying time for us, so it's still worth fixing :)

abandoned-prototype commented 3 years ago

Thanks @r4v5 for fixing the production backup! So the remaining task here is to find a better way to back up the image data in our S3 bucket, and then have the existing backup functionality deal only with the database, right?
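For the database-only half, a minimal sketch assuming the database is PostgreSQL and `pg_dump` is on the PATH. The function names, the output directory, and the timestamped file name are hypothetical; `--format=custom`, `--file`, and `--dbname` are standard `pg_dump` options.

```python
import os
import subprocess
from datetime import datetime, timezone
from typing import List, Optional


def db_backup_command(database_url: str, out_dir: str, now: Optional[datetime] = None) -> List[str]:
    """Build a pg_dump argv that writes a timestamped custom-format dump."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    path = os.path.join(out_dir, f"openoversight-db-{stamp}.dump")
    return [
        "pg_dump",
        "--format=custom",  # compressed archive; restore it with pg_restore
        f"--file={path}",
        f"--dbname={database_url}",  # a postgresql:// URI is accepted here
    ]


def run_db_backup(database_url: str, out_dir: str) -> None:
    # argv is passed as a list (no shell) and never logged: the URL may contain a password.
    subprocess.run(db_backup_command(database_url, out_dir), check=True)
```

Keeping the images out of this path means the dump stays small enough to fetch on every run, while the images rely on S3-side protection instead.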