blob storage backend

The most expensive portion of the current ptpb.pw deployment is mongo. The mongo instances are relatively huge. I've tried several times to downgrade, and each attempt caused service-disrupting performance issues.

Mongo was only used for historical reasons (the original pb implementation used a database server, and moving to mongo was just a natural evolution). However, pb is really just blob storage, and a database server is neither required nor appropriate for that.

By contrast, here are the reasons for switching to blob storage backends like S3 (production) and pairtree (development):

- easier deployment (no persistent block/filesystem storage requirements)
- better performance (no indexes to maintain, no queries to execute)
- vastly improved cost (doesn't require 5 instances for a HA deployment, or massive instance sizes for sane IO performance)

Metadata for features like sunset and mimetype is still supported by blob storage backends. The only incompatibility is lookup. Currently, pb is excessively flexible in the way it allows pastes to be referenced. This is mainly due to poor planning and backwards compatibility. Luckily, some obsolete lookup methods have already been removed as of f8ccb96a29829d6d371ff7c83d3b1d792294e745.
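As a rough illustration of how sunset and mimetype could travel with the blob itself, here is a minimal sketch. `Blob` and `MemoryStore` are hypothetical names, not part of pb; the store is an in-memory stand-in for a real backend, and storing sunset as an ISO 8601 string is an assumption (S3 user metadata values are plain strings).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


@dataclass
class Blob:
    body: bytes
    # Metadata is kept as string key/value pairs, like S3 user metadata.
    metadata: Dict[str, str] = field(default_factory=dict)


class MemoryStore:
    """Hypothetical in-memory stand-in for a blob storage backend."""

    def __init__(self) -> None:
        self._objects: Dict[str, Blob] = {}

    def put(self, key: str, body: bytes, mimetype: str,
            sunset: Optional[datetime] = None) -> None:
        metadata = {"mimetype": mimetype}
        if sunset is not None:
            # Assumption: sunset is a timezone-aware datetime.
            metadata["sunset"] = sunset.isoformat()
        self._objects[key] = Blob(body, metadata)

    def get(self, key: str, now: Optional[datetime] = None) -> Optional[Blob]:
        blob = self._objects.get(key)
        if blob is None:
            return None
        now = now or datetime.now(timezone.utc)
        sunset = blob.metadata.get("sunset")
        if sunset is not None and datetime.fromisoformat(sunset) <= now:
            # The paste has expired: drop it and report it as missing.
            del self._objects[key]
            return None
        return blob
```

No index or query is involved here: everything needed to serve or expire a paste comes back with the object.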

The biggest problem is going to be shortid lookups. There are at least two options:

1. Use the bucket list API with a prefix to find matching objects, then GET any matching object.
   a) fully compatible behavior, ~~but probable (linear?) performance issues~~
   b) requires at least two backend requests instead of one for each GET
2. Only support shortid lookups on normal pastes, and only support longid lookups on private pastes.
   a) this breaks support for other alternatives like digest lookups
   b) this also means that a colliding shortid completely replaces the old paste, instead of just masking it

Option 2 is easier to implement, but after reading what I just wrote, I dislike the idea a lot.

According to Amazon:

> List performance is not substantially affected by the total number of keys in your bucket, nor by the presence or absence of the prefix, marker, maxkeys, or delimiter arguments
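The first option (list by prefix, then GET a match) can be sketched roughly as follows. This is only an illustration under stated assumptions: `MemoryBucket` and `lookup_shortid` are hypothetical names, the bucket is an in-memory stand-in for a real list-by-prefix API, object keys are assumed to start with the shortid, and picking the newest match when shortids collide is my assumption about what "masking" should mean.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class StoredObject:
    key: str
    body: bytes
    created: int  # sequence number standing in for a last-modified time


class MemoryBucket:
    """Hypothetical in-memory stand-in for a bucket with list-by-prefix."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}
        self._clock = 0
        self.requests = 0  # counts simulated backend requests

    def put(self, key: str, body: bytes) -> None:
        self._clock += 1
        self._objects[key] = StoredObject(key, body, self._clock)

    def list_keys(self, prefix: str) -> List[Tuple[str, int]]:
        # Returns (key, created) pairs for every key with the given prefix.
        self.requests += 1
        return sorted((o.key, o.created) for o in self._objects.values()
                      if o.key.startswith(prefix))

    def get(self, key: str) -> StoredObject:
        self.requests += 1
        return self._objects[key]


def lookup_shortid(bucket: MemoryBucket, shortid: str) -> Optional[StoredObject]:
    matches = bucket.list_keys(shortid)  # backend request 1: list by prefix
    if not matches:
        return None
    # Assumption: on a shortid collision, the newest paste masks older ones.
    newest_key, _ = max(matches, key=lambda pair: pair[1])
    return bucket.get(newest_key)  # backend request 2: fetch the object
```

With this layout a colliding shortid only masks the older paste, which stays reachable by its full key, at the cost of two backend requests per shortid GET.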

ptpb / pb

blob storage backend #177