wellcomecollection / storage-service

🗃️ Managing the safe, long-term storage of our digital collections in the cloud

storage-service


This is the Wellcome Collection storage service. It manages the storage of our digital collections.

Requirements

The storage service is designed to:

High-level design

The user uploads a "bag" to the storage service. This bag should use the BagIt packaging format. The user could be a person, or an automated workflow system like Goobi or Archivematica.
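To make the packaging format concrete, here is a minimal sketch of what a BagIt bag looks like on disk: a `bagit.txt` declaration, a `data/` directory holding the payload, and a manifest of checksums. The function name `create_minimal_bag` and the choice of SHA-256 are assumptions made for illustration; this is not code from the storage service, and real bags usually carry extra tag files such as `bag-info.txt`.

```python
import hashlib
import tempfile
from pathlib import Path


def create_minimal_bag(bag_dir: Path, payload: dict[str, bytes]) -> None:
    """Write a minimal BagIt bag: declaration, payload files, SHA-256 manifest.

    Hypothetical helper for illustration; not part of the storage service.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # The bag declaration names the BagIt version and the tag-file encoding.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    manifest_lines = []
    for name, content in sorted(payload.items()):
        path = data_dir / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(content)
        # Each manifest line pairs a checksum with a path relative to the bag root.
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}\n")

    (bag_dir / "manifest-sha256.txt").write_text(
        "".join(manifest_lines), encoding="utf-8"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "example-bag"
        create_minimal_bag(bag, {"hello.txt": b"hello world\n"})
        for p in sorted(bag.rglob("*")):
            if p.is_file():
                print(p.relative_to(bag).as_posix())
```

Running the script prints the relative paths of the files in the example bag, which shows the split between tag files at the root and payload files under `data/`.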

The storage service verifies the fixity information in the bag (checksums, file sizes, filenames). If the fixity information is correct, it replicates the bag to multiple storage locations, split across different cloud providers and geographic locations.
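The fixity check described above can be sketched as follows. This is an illustrative approximation, not the storage service's own verifier: the function name `verify_bag` is hypothetical, it assumes a single `manifest-sha256.txt` manifest, and it covers checksums and filenames only (it does not check file sizes).

```python
import hashlib
from pathlib import Path


def verify_bag(bag_dir: Path) -> list[str]:
    """Return a list of fixity problems; an empty list means the bag verified.

    Hypothetical helper for illustration; not part of the storage service.
    """
    problems = []
    listed = set()

    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each line is "<checksum> <path relative to the bag root>".
        expected, rel_path = line.split(maxsplit=1)
        listed.add(rel_path)
        path = bag_dir / rel_path
        if not path.is_file():
            problems.append(f"missing file: {rel_path}")
            continue
        # Reads the whole file into memory; large files would need streaming.
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        if actual != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")

    # Every payload file must also appear in the manifest.
    for path in sorted((bag_dir / "data").rglob("*")):
        rel = path.relative_to(bag_dir).as_posix()
        if path.is_file() and rel not in listed:
            problems.append(f"unlisted file: {rel}")

    return problems
```

In this sketch, a caller would only go on to replicate the bag if `verify_bag` returned an empty list.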

The storage service stores exactly the bytes you give it; no more, no less. It does not do any introspection of the bag contents, or change its behaviour based on the files a bag contains.

The storage service runs entirely in AWS, with no on-premise infrastructure required.

For more detailed information about the design, see our documentation.

Documentation

We have documentation about the storage service.

Usage

We run two instances of the storage service at Wellcome.

Each instance of the storage service is completely separate. They don't share any files or storage.

If you want to store files in the storage service, you should run your own instance -- the instances we run are only for use at Wellcome. We publish our Docker images and infrastructure code, to allow other people to run the storage service.

For instructions, see our documentation.

Getting started: use Terraform and AWS to run a storage service demo

We have a Terraform configuration that spins up an instance of the storage service. You can use this to try the storage service in your own AWS account.

License

MIT.