Open snarfed opened 1 year ago
Deprioritizing. Shipped remote blobs with `datastore_storage.AtpRemoteBlob` for generating blobs for externally hosted files, which is working well enough for my needs.
This came up again recently: Bridgy Fed hit a case where (our best guess is) an image URL originally served one format, `image/webp`, and then later switched to serving another, `image/jpeg`. We fetched the first image, saw `image/webp`, stored that and the URL and image CID in an `AtpRemoteBlob`, and populated that CID and mime type into a blob in a record. Then the URL switched to serving `image/jpeg`, the Bluesky team's blob scanning fetched it, saw the type mismatch, and complained.
Not storing/hosting media has been convenient for us, for Reasons etc, but it's technically not ATProto compliant, since we can't guarantee that blobs are immutable, ie the URL we redirect `getBlob` requests to could serve a different image that doesn't match the CID and type we originally created the blob with.
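To make the mismatch concrete, here's a rough sketch of the check a consumer could run against a redirected blob. It assumes blob CIDs are CIDv1 with the raw codec and a sha2-256 multihash, base32 encoded, which is consistent with the `bafkrei` prefix on the CIDs in this issue. `blob_cid` and `blob_still_matches` are hypothetical names for illustration, not Bridgy Fed or Bluesky code.

```python
import base64
import hashlib


def blob_cid(data: bytes) -> str:
    """Return a CIDv1 (raw codec, sha2-256, base32) string for data."""
    digest = hashlib.sha256(data).digest()
    # 0x01 = CIDv1, 0x55 = raw codec, 0x12 = sha2-256, 0x20 = 32-byte digest
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    # multibase base32: 'b' prefix, lowercase, no padding
    encoded = base64.b32encode(cid_bytes).decode('ascii').lower().rstrip('=')
    return 'b' + encoded


def blob_still_matches(stored_cid: str, stored_type: str,
                       fetched_bytes: bytes, fetched_type: str) -> bool:
    """True only if the fetched content has the same type and CID as stored."""
    return fetched_type == stored_type and blob_cid(fetched_bytes) == stored_cid


# Hypothetical stand-ins for the two responses from the same URL.
original = b'pretend these are WebP bytes'
replacement = b'pretend these are JPEG bytes'
cid = blob_cid(original)

print(blob_still_matches(cid, 'image/webp', original, 'image/webp'))
print(blob_still_matches(cid, 'image/webp', replacement, 'image/jpeg'))
```

With remote blobs, the second case is exactly what we can't rule out, since the bytes behind the URL are outside our control.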
cc @ericvolp12
Specifically, the post that triggered this was: `at://did:plc:kxwjbod4cqrkrk5okcgw7gx6/app.bsky.feed.post/3l2qy6dloi2w2`, commit `bafyreifwwquqtdmmlfasatrjksudpfnryplidjwunvtl5g56if3rtk56bi`
Content-Type: image/webp
Content-Length: 33092
bafkreihns5gebrgfbqlmzaono4fi3yrpaem3fjmtkhdx4qc2f5b6adbmjq
Content-Type: image/jpeg
Content-Length: 48225
bafkreib5qeer5wmetmazb4tbzuk6afb2a6djhgsqt6rvayiuszlppc3pi4
Maybe the original image here was WEBP, and the JPEG is a downstream transcoding, and the CMS does that transcoding in the background, after the article is published, and serves the original image until the JPEG is ready? Maybe a bit of a stretch, but not too much? I dunno.
Here's our code for this:
Not a priority for Bridgy Fed or me otherwise personally, but we should probably implement blob storage, `uploadBlob`/`getBlob`, etc.
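As a very rough sketch of what that might look like: a content-addressed store that keeps the bytes and mime type under the CID at upload time, so later reads can't drift. `BlobStore` and its methods are hypothetical and in-memory only; a real version would sit on durable storage behind the `uploadBlob` and `getBlob` XRPC handlers. It again assumes CIDv1 with the raw codec and sha2-256.

```python
import base64
import hashlib


class BlobStore:
    """Hypothetical in-memory, content-addressed blob store."""

    def __init__(self):
        self._blobs = {}  # maps CID string to (bytes, mime type)

    def upload_blob(self, data: bytes, mime_type: str) -> str:
        """Store data under its CID and return that CID."""
        digest = hashlib.sha256(data).digest()
        # 0x01 = CIDv1, 0x55 = raw codec, 0x12 = sha2-256, 0x20 = digest length
        cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
        cid = 'b' + base64.b32encode(cid_bytes).decode('ascii').lower().rstrip('=')
        # First write wins, so the stored type never changes for a given CID.
        self._blobs.setdefault(cid, (data, mime_type))
        return cid

    def get_blob(self, cid: str):
        """Return the (bytes, mime type) originally stored; KeyError if unknown."""
        return self._blobs[cid]


store = BlobStore()
cid = store.upload_blob(b'pretend these are WebP bytes', 'image/webp')
data, mime_type = store.get_blob(cid)
```

Since `get_blob` only ever returns what was stored at upload time, the CID and type in the record stay valid no matter what the original URL serves later.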