philomena-dev / philomena

Next-generation imageboard
GNU Affero General Public License v3.0

Migrate to object storage #166

Closed liamwhite closed 1 year ago

liamwhite commented 2 years ago

Important note: for development instances using the default docker-compose.yml settings, no action is required to deploy this update. Both past and future uploads will continue to work without any custom configuration.

Decouples Philomena from storing uploaded files directly on the filesystem; files now go through an object storage server instead. The development instance includes S3Proxy, which can connect to arbitrary storage backends. This frontend has been tested with S3Proxy's filesystem storage and with Cloudflare R2 storage.

A replication mode is also implemented, to live-replicate all storage operations to a second backend for higher durability. This is currently expected to be used with Backblaze B2's S3 bindings (but could be used with any S3-compatible provider).

For users who want to migrate an existing site to an external object storage backend, a mix task is provided. It uploads all requested models that have been modified after the given time: mix upload_to_s3 --adverts --avatars --badges --tags --images --concurrency 100 1970-01-01T00:00:00Z

The S3 interface is configured via environment variables, which should be mostly self-explanatory:

- S3_REGION=us-east-1
- S3_SCHEME=http
- S3_HOST=files
- S3_PORT=80
- S3_BUCKET=philomena
- AWS_ACCESS_KEY_ID=local-identity
- AWS_SECRET_ACCESS_KEY=local-credential

All environment variables also have an ALT_-prefixed version (like ALT_S3_HOST) which specifies parameters for the replica instance, if present. A replica is not required and will not be used if these environment variables are not present.
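As a rough sketch of what a primary-plus-replica configuration might look like, the primary values below are the development defaults listed above. Every ALT_ value is a hypothetical placeholder (the Backblaze-style host, the region, the bucket name and the credentials are all assumptions, not values from this PR) and would need to be replaced with the details of your own replica.

```shell
# Primary backend: development defaults from the list above.
export S3_REGION=us-east-1
export S3_SCHEME=http
export S3_HOST=files
export S3_PORT=80
export S3_BUCKET=philomena
export AWS_ACCESS_KEY_ID=local-identity
export AWS_SECRET_ACCESS_KEY=local-credential

# Replica backend: same variable names with the ALT_ prefix.
# All values here are made-up placeholders for illustration only.
export ALT_S3_REGION=us-west-004
export ALT_S3_SCHEME=https
export ALT_S3_HOST=s3.us-west-004.backblazeb2.com
export ALT_S3_PORT=443
export ALT_S3_BUCKET=philomena-replica
export ALT_AWS_ACCESS_KEY_ID=replace-with-replica-key-id
export ALT_AWS_SECRET_ACCESS_KEY=replace-with-replica-secret
```

Leaving out the ALT_ block entirely should give a single-backend setup, since the replica is only used when those variables are present.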

Note 1: Earlier revisions of this change required modifying this regex to ~r/^(us|eu|af|ap|sa|ca|me)\-\w+\-\d$/ to use Backblaze B2, so that the Backblaze endpoint was not detected as AWS. With the latest changes, no explicit configuration is required any longer.

Meow commented 1 year ago

Migration instructions

  1. Install openresty and pick an object storage provider. We recommend Cloudflare R2; if you want to use your local filesystem instead, use s3proxy as shown in our docker-compose.yml. Optionally, also pick a backup object storage provider (we recommend Backblaze B2). It will be used for failover in case the main one goes offline, and all images written to the main one will also be replicated to it for backup purposes. Create buckets and S3 access tokens for each provider
  2. Clone the latest (S3-compatible) philomena into a separate temporary folder (do not deploy this to your main production server yet). Copy your old production environment variable config, add the object storage related variables, and make sure to give the node a different name and a different port
  3. In the temporary philomena folder, run mix upload_to_s3 --concurrency $(nproc) --adverts --avatars --badges --tags --images 2010-01-01T00:00:00Z (replace $(nproc) with the desired number if lower concurrency is desired)
  4. Run the task again, but replace the date with the date and time at which you started the previous upload
  5. Take the site offline and run the task again, now with the date and time of the previous run
  6. Stop the temporary philomena and delete its folder, then deploy the changes to production
  7. Copy the object storage related environment variables from the temporary environment config file to the main one
  8. (If you were using nginx before) Remove nginx but keep its configs, copy them to the openresty config folder, and edit them so that the file paths are correct
  9. Run opm install jkeys089/lua-resty-hmac
  10. Edit your nginx/openresty configs to resemble docker/web/nginx.conf and make sure that they are aware of your object storage key and host (the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_HOST, S3_BUCKET and S3_SCHEME env vars need to be set for the user which runs openresty), then restart openresty
  11. Take the site back online
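
The repeated upload passes in steps 3 to 5 could be scripted roughly as follows. This is only a sketch: the helper names now_utc and upload_pass and the CONCURRENCY variable are made up for illustration and are not part of philomena. It assumes it is run from the temporary philomena folder with the object storage variables already set.

```shell
#!/bin/sh
# Model flags copied from step 3.
MODELS="--adverts --avatars --badges --tags --images"

# Current UTC time in the same format as the timestamps used above.
now_utc() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

# Run one upload pass for everything modified after $1.
# Prints the time at which this pass started, so that it can be
# used as the cutoff for the next pass. Task output goes to stderr
# so that only the timestamp ends up on stdout.
upload_pass() {
  since="$1"
  started="$(now_utc)"
  # $MODELS is intentionally unquoted so that it splits into separate flags.
  mix upload_to_s3 --concurrency "${CONCURRENCY:-$(nproc)}" $MODELS "$since" >&2 || return 1
  printf '%s\n' "$started"
}
```

With these helpers, the passes chain together: first=$(upload_pass 2010-01-01T00:00:00Z), then second=$(upload_pass "$first"), and, once the site has been taken offline, upload_pass "$second". Recording the start time rather than the end time means anything modified while a pass was running is picked up by the next one.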