[feature] cache a media attachment only after any local user has requested to load it

SmallPatatas commented 6 months ago

Is your feature request related to a problem?

One of the most appealing things about GTS (to me at least) is how light on resources it can be.

However I haven't seen many people mention total transfer volume. Based on the current average tx rate, my single-user glitch-soc instance will transfer a total of ~150GB/month. Nowadays, that's fine where I live, but might make running an instance very costly in other parts of the world, or in rural/remote places.

Now, I definitely recognize that simply not caching any media is rude, and that it's bad for the network overall.

But I also think it's unfair to expect every other server to use up bandwidth to receive images that no one explicitly asked for, and might never even be seen by their users at all!

Describe the solution you'd like.

So, a proposal:

A server-level setting where remote attachments are not cached when the remote post is made, but are cached as soon as any local account chooses to view the attachment.

(I'll be honest, I don't know if it's possible to create the 'click-to-cache' function that I'm imagining, because I don't have expertise when it comes to APIs or compatibility considerations etc. So, apologies if this is all very naive!)

Perhaps it could be done a bit like akkoma's strategy, where attachments are fetched when a local account is actively loading their feed.

But unlike akkoma's method, where remote attachments are fetched directly to a user's browser/app, I'm proposing that the gts server would fetch the attachment from the remote server, forward it to the local user, and cache the media.
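To make the 'fetch, forward, and cache' idea concrete, here is a minimal Go sketch of a cache that only downloads a remote attachment the first time a local user asks for it. Everything in it (`LazyMediaCache`, `Fetcher`, `Get`) is a hypothetical name made up for illustration, not GoToSocial's actual API, and the remote download is abstracted behind a function so the caching behaviour is easy to see.

```go
package main

import "sync"

// Fetcher downloads the bytes of a remote attachment.
// Hypothetical: it stands in for a real HTTP call to the remote server.
type Fetcher func(url string) ([]byte, error)

// LazyMediaCache stores a remote attachment only after a local user
// has requested it at least once.
type LazyMediaCache struct {
	mu    sync.Mutex
	data  map[string][]byte
	fetch Fetcher
}

// NewLazyMediaCache returns an empty cache. Nothing is fetched up front.
func NewLazyMediaCache(fetch Fetcher) *LazyMediaCache {
	return &LazyMediaCache{data: make(map[string][]byte), fetch: fetch}
}

// Get returns the attachment for url. On the first request it fetches
// the media from the remote server and caches it; later requests are
// served from the local cache without touching the network.
func (c *LazyMediaCache) Get(url string) ([]byte, error) {
	// The lock is held during the fetch to keep the sketch short.
	// This serializes all downloads; a real server would want
	// per-URL deduplication instead.
	c.mu.Lock()
	defer c.mu.Unlock()

	if b, ok := c.data[url]; ok {
		return b, nil
	}
	b, err := c.fetch(url)
	if err != nil {
		// Failed fetches are not cached, so a later request retries.
		return nil, err
	}
	c.data[url] = b
	return b, nil
}
```

With this shape, a post arriving from a remote server costs no media bandwidth at all; the transfer only happens when `Get` is first called on behalf of a local user.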

Describe alternatives you've considered.

Another hypothetical solution: could a 'click to cache' feature be done by using the 'view sensitive media' button on existing apps and front ends?

So,

1. user scrolls feed
2. all media attachments are labeled 'sensitive'
3. user clicks to show attachment
4. media is fetched by the local server and cached

(the local gts server would obviously need to add an additional warning for attachments that are actually 'sensitive media').

Additional context.

Thanks very much for all your work :)

daenney commented 6 months ago

I think this would be a nice change to implement. It indeed seems a bit unnecessary to go out and fetch things that may never get displayed and it avoids the associated bandwidth and compute costs for both sides.

However, I think there's a few problems with what's being described here that we'll want to look at solving differently.

I'm not keen on overloading the meaning of "sensitive". It has a clear and understood meaning currently, and changing that to mean "or maybe it's just media we haven't fetched yet" under certain operating conditions is likely to cause confusion. We may be able to do something else, like a static placeholder image that we replace on-demand. That needs some thought to also make sure we don't run afoul of any caching of media the clients may do. When we fail to get remote media attachments we already include a notice in the timeline, which may be another way of doing this.

Fetching the attachments on demand as we serve a timeline risks blocking the timeline endpoint. There are potentially tens of attachments we need to fetch, which could be reasonably big, from servers reasonably far away that are in that moment bandwidth or resource constrained. We probably want to do this in an asynchronous manner, and instead use something like the streaming endpoint to update statuses as media gets fetched (assuming that's something we can do with the streaming endpoint in the first place). Here again we need to be mindful of any caching of the timeline the clients themselves may do, or it'll result in a very confusing experience.
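As a rough illustration of the asynchronous approach, the Go sketch below answers the timeline request immediately (cached or placeholder) and does the download in a background goroutine. All names (`AsyncFetcher`, `Resolve`, `Updates`) are hypothetical, and the `Updates` channel is only a stand-in for whatever mechanism would notify clients; whether the streaming endpoint can carry such updates is still the open question raised above.

```go
package main

import "sync"

// AsyncFetcher decides, without blocking, whether an attachment can be
// served from cache or must be shown as a placeholder for now.
type AsyncFetcher struct {
	mu      sync.Mutex
	cached  map[string]bool
	pending map[string]bool
	fetch   func(url string) error // hypothetical remote download
	wg      sync.WaitGroup

	// Updates receives the URL of each attachment once it is cached.
	// Assumption: something else forwards these to connected clients.
	Updates chan string
}

// NewAsyncFetcher creates a fetcher with a small buffered update channel.
func NewAsyncFetcher(fetch func(string) error) *AsyncFetcher {
	return &AsyncFetcher{
		cached:  make(map[string]bool),
		pending: make(map[string]bool),
		fetch:   fetch,
		Updates: make(chan string, 64),
	}
}

// Resolve returns true if the attachment is already cached. Otherwise it
// schedules a single background fetch and returns false right away, so
// the timeline can be served with a placeholder.
func (f *AsyncFetcher) Resolve(url string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.cached[url] {
		return true
	}
	if !f.pending[url] {
		f.pending[url] = true
		f.wg.Add(1)
		go f.fetchInBackground(url)
	}
	return false
}

func (f *AsyncFetcher) fetchInBackground(url string) {
	defer f.wg.Done()
	err := f.fetch(url)

	f.mu.Lock()
	delete(f.pending, url)
	if err == nil {
		f.cached[url] = true
	}
	f.mu.Unlock()

	if err == nil {
		// Blocks if the buffer is full and nobody is reading.
		f.Updates <- url
	}
}

// Wait blocks until all scheduled background fetches have finished.
func (f *AsyncFetcher) Wait() { f.wg.Wait() }
```

The timeline handler only ever calls `Resolve`, so a slow or distant remote server delays the arrival of one image rather than the whole response.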

SmallPatatas commented 6 months ago

Thanks for the insightful response.

Anecdotally: I've been experimenting with setting the Fedilab app (still w/ glitch-soc, but planning to move to gts) to not display images unless I click on them. And while it's obviously not the same experience as the 'click-to-cache' feature we're discussing, since the attachments are already on my server, I will say that my overall experience doesn't feel degraded. My feed might even feel a bit calmer, in fact. And I don't find myself clicking on the majority of 'load image' buttons.

If it helps keep the gears turning, here's a few other potential use cases for this type of feature, beyond the bandwidth considerations:

- admins could use this during a spam wave, perhaps along with the existing gts spam filter, to avoid large quantities of image data being pushed onto their servers;
- this could cut down on the impact of bad actors pushing violent or disturbing imagery into the DMs of racialized and/or queer fedizens;
- what if 'click-to-cache' was part of an 'incremental' federation model, where a server that is not yet on an allow or deny list could only interact in certain limited ways, as opposed to simply following a standard allowlist/denylist dichotomy?

Again, my technical understanding is really, really lacking, so apologies in advance if any of this is completely unrealistic! Looking forward to hearing more of people's thoughts.

SmallPatatas commented 3 months ago

Having looked through more of the feature requests, it occurred to me that a partial solution to this would be allowing admins to set max sizes for each type of incoming media, as discussed in #2308, and use the media attachment placeholders implemented in #2331 for attachments over the max sizes.
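To show what a per-type size cap with a placeholder fallback could look like, here is a small Go sketch. It is only an illustration of the idea, not what #2308 or #2331 actually implement; `MaxSizes`, `ReadLimited`, `FetchOrPlaceholder` and `fakeBody` are hypothetical names, and the limits are arbitrary.

```go
package main

import (
	"bytes"
	"errors"
	"io"
)

// ErrTooLarge signals that a remote body exceeded the configured cap.
var ErrTooLarge = errors.New("attachment exceeds configured maximum size")

// MaxSizes maps a media type (e.g. "image", "video") to a byte limit.
// Hypothetical config shape; types without an entry are unlimited.
type MaxSizes map[string]int64

// ReadLimited reads at most max bytes from r and returns ErrTooLarge
// if the body turns out to be bigger.
func ReadLimited(r io.Reader, max int64) ([]byte, error) {
	// Read one byte past the limit: enough to detect an oversized
	// body without downloading the rest of it.
	b, err := io.ReadAll(io.LimitReader(r, max+1))
	if err != nil {
		return nil, err
	}
	if int64(len(b)) > max {
		return nil, ErrTooLarge
	}
	return b, nil
}

// FetchOrPlaceholder returns the media bytes, or placeholder == true
// when the attachment is over the limit for its media type.
func FetchOrPlaceholder(r io.Reader, mediaType string, limits MaxSizes) (data []byte, placeholder bool, err error) {
	max, ok := limits[mediaType]
	if !ok {
		data, err = io.ReadAll(r)
		return data, false, err
	}
	data, err = ReadLimited(r, max)
	if errors.Is(err, ErrTooLarge) {
		return nil, true, nil
	}
	return data, false, err
}

// fakeBody simulates a remote response body of n bytes (demo helper).
func fakeBody(n int) io.Reader {
	return bytes.NewReader(make([]byte, n))
}
```

Capping the read itself, rather than trusting a declared length, means an oversized file costs at most the limit plus one byte of transfer.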

If it were entirely up to me, I'd also love to be able to limit the size of remote avatars and headers, which can be surprisingly (and in my opinion, unnecessarily) large. Plus, I assume they're also being downloaded over and over again, if I'm clearing out media older than x days?

The major difference with the approach #2308 proposes, if I understand it correctly, is that an attachment larger than the maximum couldn't be viewed at all when it's attached to a DM or other non-public post.

daenney commented 3 months ago

> Plus, I assume they're also being downloaded over and over again, if I'm clearing out media older than x days?

That's an incorrect assumption. When deleting media we check if it's an avatar/header image and if that belongs to an account this instance has seen during the last X days. If that's the case, the data is not deleted even if the avatar/header image itself is older than X days.

SmallPatatas commented 3 months ago

Ah ok, good to know! Thanks for the correction.

superseriousbusiness / gotosocial