snarfed / arroba

Python implementation of Bluesky PDS and AT Protocol, including repo, MST, and sync XRPC methods
https://arroba.readthedocs.io
Creative Commons Zero v1.0 Universal

Store blobs #13

Open snarfed opened 1 year ago

snarfed commented 1 year ago

Not a priority for Bridgy Fed or me otherwise personally, but we should probably implement blob storage, uploadBlob/getBlob, etc.

snarfed commented 12 months ago

Deprioritizing. Shipped remote blobs w/datastore_storage.AtpRemoteBlob for generating blobs for externally hosted files, which is working well enough for my needs.

snarfed commented 2 months ago

This came up again recently: Bridgy Fed hit a case where (our best guess is) an image URL originally served one format, image/webp, and then later switched to serving image/jpeg. We fetched the first image, saw image/webp, stored that and the URL and image CID in an AtpRemoteBlob, and populated that CID and mime type into a blob in a record. Then the URL switched to serving an image/jpeg, the Bluesky team's blob scanning fetched it, saw the type mismatch, and complained.

Not storing/hosting media has been convenient for us, for Reasons, but it's technically not ATProto compliant, since we can't guarantee that blobs are immutable, i.e. the URL we redirect getBlob requests to could serve a different image that doesn't match the CID and type we originally created the blob with.
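
To make the immutability problem concrete, here's a rough sketch of the check that a remotely hosted blob can fail. This is not arroba's code: `blob_cid` and `blob_still_matches` are hypothetical helpers, and the sketch assumes blob CIDs are CIDv1 with the raw codec and a SHA-256 multihash, base32 encoded.

```python
import base64
import hashlib

# Assumed CID layout for blobs: CIDv1, raw codec, sha2-256 multihash.
CID_V1 = 0x01
RAW_CODEC = 0x55
SHA2_256 = 0x12
DIGEST_LEN = 0x20  # 32 bytes


def blob_cid(data: bytes) -> str:
    """Hypothetical helper: compute a base32 CIDv1 string for raw bytes."""
    digest = hashlib.sha256(data).digest()
    cid_bytes = bytes([CID_V1, RAW_CODEC, SHA2_256, DIGEST_LEN]) + digest
    # Multibase base32: lowercase, no padding, 'b' prefix.
    encoded = base64.b32encode(cid_bytes).decode('ascii').lower().rstrip('=')
    return 'b' + encoded


def blob_still_matches(stored_cid: str, stored_mime: str,
                       fetched: bytes, fetched_mime: str) -> bool:
    """Hypothetical helper: does a re-fetched file match the stored blob?"""
    # Drop any parameters, e.g. 'image/jpeg; charset=binary'.
    fetched_type = fetched_mime.split(';')[0].strip().lower()
    return (fetched_type == stored_mime.lower()
            and blob_cid(fetched) == stored_cid)


# Placeholder bytes standing in for the two versions of the image.
original = b'placeholder webp bytes'
replaced = b'placeholder jpeg bytes'
stored_cid = blob_cid(original)

same = blob_still_matches(stored_cid, 'image/webp', original, 'image/webp')
changed = blob_still_matches(stored_cid, 'image/webp', replaced, 'image/jpeg')
```

Here `same` is true and `changed` is false: once the remote URL serves different bytes or a different type, nothing we stored can make the check pass again, which is why only hosting the bytes ourselves guarantees immutability.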

cc @ericvolp12

snarfed commented 2 months ago

Specifically, the post that triggered this was:

Maybe the original image here was WEBP, and the JPEG is a downstream transcoding, and the CMS does that transcoding in the background, after the article is published, and serves the original image until the JPEG is ready? Maybe a bit of a stretch, but not too much? I dunno.

Here's our code for this:

https://github.com/snarfed/arroba/blob/b4e6911c4fefb3744f5ca19ec0614b7133194691/arroba/datastore_storage.py#L297-L331