hometown-fork / hometown

A supported fork of Mastodon that provides local posting and a wider range of content types.
GNU Affero General Public License v3.0

Allow tootctl to list all local media for smaller backups #1217

Closed rscmbbng closed 1 year ago

rscmbbng commented 1 year ago

Pitch

The official docs advise backing up public/system/cache in order to restore successfully from a backup. Doing so takes a lot of disk space. The upstream project's stance is essentially to get more S3. However, this folder contains both local data and remote data. Remote data can in theory be fetched again, but local data can't. Having tootctl output just a list of local data would in theory allow for backups of just those parts (see https://github.com/mastodon/mastodon/issues/12910).

Combining that with a function where tootctl refetches remote data would allow for a drastically reduced cache to back up (see https://github.com/mastodon/mastodon/issues/16456).

Motivation

While it might be manageable for some now, just wait until something like Tumblr or Flickr joins the fediverse and you start caching all of that! Additionally, doing this would allow people to save on server or storage rent.

dariusk commented 1 year ago

Can you link the statement in the official docs for reference?

rscmbbng commented 1 year ago

If you are using an external object storage provider such as Amazon S3, Google Cloud or Wasabi, then you don’t need to worry about backing those up. The respective companies are responsible for handling hardware failures.

If you are using local file storage, then it’s up to you to make copies of the sizeable public/system directory, where uploaded files are stored by default.

It is implicit that if you use S3 you don't need to worry (and just pay the growing bill).

https://docs.joinmastodon.org/admin/backups/#media

dariusk commented 1 year ago

Ah okay. I checked in with some other Mastodon project contributors and here's the thing:

According to one of the project devs over on the Mastodon Discord:

what's in cache is in theory recoverable, but we have very poor tooling to recover it

So! We don't need the "list all local media" feature you propose, because you already know the answer: it's everything not in cache (local media that you store the canonical versions of lives in public/system/media_attachments, public/system/accounts, etc.). It is "safe" to just not back up cache, as long as an acceptable disaster recovery scenario for you is that remote media will be broken links for a while. You should be able to run

tootctl media refresh --force

and that will then refetch all remote media ever, but... that's going to be a lot of media and probably not what you want. Ideally, Mastodon would include a --days=N option on that command so you could reload the last N days of cache, matching what your tootctl media remove does. You definitely want to do a

tootctl accounts refresh

in order to restore remote profile pics, and this might take a while but won't be a huge hard drive burden compared to media.

Anyway. Closing this issue because it wouldn't be a helpful feature (it would just tell us that everything remote that we don't care about lives in cache, and everything local that we need to back up lives in the non-cache directories). Feel free to keep commenting here if you have further questions though.
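
For readers who want something concrete, the "back up everything except cache" approach described above could be sketched roughly as follows. This is only an illustration, not anything shipped with Mastodon or Hometown: the function name backup_local_media, the example paths in the comments, and the archive name are all assumptions, and it assumes a tar that supports --exclude (GNU tar does).

```shell
#!/bin/sh
# Sketch: archive everything under public/system except the cache/ directory.
# ASSUMPTIONS: the function name, the example paths, and the archive name are
# illustrative only; adjust them to your own install.

backup_local_media() {
    src=$1    # e.g. /home/mastodon/live/public/system (assumed path)
    dest=$2   # e.g. /backups/mastodon-local-media.tar.gz (assumed path)

    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi

    # -C switches into $src so archive members are relative (./accounts, ...).
    # --exclude=./cache skips the remote-media cache and everything below it.
    # Keep $dest outside of $src so the archive does not try to include itself.
    tar -czf "$dest" -C "$src" --exclude=./cache .
}

# Example call (paths are assumptions):
# backup_local_media /home/mastodon/live/public/system /backups/local-media.tar.gz
```

After restoring such an archive, remote media and remote avatars would still be missing until they are refetched with the tootctl commands mentioned above.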

rscmbbng commented 1 year ago

Hi Darius, thanks for the extensive answer! Great to know the difference between cache and the other folders. It isn't obvious from the docs. I will take this into account in my backup strategy!

dariusk commented 1 year ago

@rscmbbng Reading over my comment, I think I accidentally wrote the opposite of what I meant in some places, but not all, so I clarified. The point is that cache contains the remote content you are storing, so technically only the stuff outside cache is critical: if you delete that, it can never be refetched.

rscmbbng commented 1 year ago

That was clear to me the first time around, but thank you for clarifying and checking back!

jbenguira commented 1 year ago

@dariusk

I did exactly this: I deleted that folder and then ran tootctl media refresh --force and tootctl accounts refresh.

Both ran without errors... but I still don't have any headers/avatars. Is there anything else I should check or try?