superseriousbusiness / gotosocial

Fast, fun, small ActivityPub server.
https://docs.gotosocial.org
GNU Affero General Public License v3.0

[feature] Database Migration Path for Mastodon => GoToSocial #128

Open tsmethurst opened 2 years ago

tsmethurst commented 2 years ago

It should be possible for a Mastodon database to be migrated to a GoToSocial one, since the underlying models are not really dissimilar (the GtS database model is based loosely on the Mastodon one after all).

A plan for this (imo, but I'm very open to debate) might look something like this:

Mastodon => GoToSocial Migration Plan

The most direct way of migrating a whole instance from Mastodon => GoToSocial would be to convert the Mastodon database into a format that GoToSocial can understand + use. So this plan focuses on that. The plan assumes that some kind of command like gotosocial database migrate ... would be added to the GtS command line tool.

Requirements

What do we need to keep, and what can we afford to lose?

Crucial

  1. Keep host name and account domain of the instance the same.
  2. Migrate local account data and user data from masto => gts.
  3. Migrate following and followers from masto => gts.
  4. Migrate account public + private keys from masto => gts so remote servers aren't confused as hell when the keys change.
  5. Migrate domain blocks, suspensions, and personal blocks from masto => gts.

Nice-to-have but not sure if crucial

  1. Migrate local account posts from masto => gts.
  2. Migrate local account media (images, videos, etc) from masto => gts.
  3. Migrate instance settings from masto => gts.
  4. Migrate account preferences from masto => gts.

Probably not necessary

  1. Migrate remote account entries, posts, media. This is probably not necessary because the server can just populate them again moving onwards, assuming they're still available on the remote servers.

Possible fucker-uppers

Obviously in a big migration like this there are gonna be problems we can foresee and problems we can't. Here are some possible issues.

URLs + Paths

ActivityPub IDs are URLs. GoToSocial uses a slightly different template from Mastodon for generating URLs for everything from follow requests, to posts, to account URLs (inbox, following, followers, likes, etc). We have to go through these one by one and make a comparison of what would change and what would stay the same, and figure out how remote servers would cope with that. Otherwise, there's a real risk of totally breaking federation for an account, if the remote servers are still trying to access an account's resources on paths that GoToSocial can't recognize or serve from.
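As a rough illustration of the one-by-one comparison described above, here is a minimal sketch that diffs two sets of path templates and sorts each resource kind into unchanged, changed, or missing. The template values are invented placeholders, not the real Mastodon or GoToSocial templates; those would have to be taken from each project's source. The function name compare_templates is also hypothetical.

```python
# NOTE: every template below is an invented placeholder for illustration.
# The real path templates must be read from each project's source code.
OLD_TEMPLATES = {
    "actor": "/users/{username}",
    "inbox": "/users/{username}/inbox",
    "status": "/users/{username}/statuses/{id}",
    "follow": "/users/{username}/follows/{id}",
}
NEW_TEMPLATES = {
    "actor": "/users/{username}",
    "inbox": "/users/{username}/inbox",
    "status": "/users/{username}/posts/{id}",
}


def compare_templates(old, new):
    """Classify each resource kind in `old` by how `new` would serve it."""
    report = {"unchanged": [], "changed": [], "missing": []}
    for kind, old_path in sorted(old.items()):
        if kind not in new:
            # The new server has no route for this kind at all.
            report["missing"].append(kind)
        elif new[kind] == old_path:
            # Remote servers can keep using the URLs they already know.
            report["unchanged"].append(kind)
        else:
            # Existing IDs would need a redirect or a compatibility route.
            report["changed"].append(kind)
    return report


report = compare_templates(OLD_TEMPLATES, NEW_TEMPLATES)
```

Anything that lands in "changed" or "missing" is a candidate for breaking federation, so those are the kinds that would need a compatibility route or a decision to drop them.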

Implementation Ideas

Some ideas for how to actually implement this monster.

Parallel Databases

Let's say Mastodon is running with a Postgres server, with a database called mastodon.

One way of doing the migration would be to have the gotosocial database migrate ... command create a new database on the server called gotosocial.

Then, the tool would parse through entries in mastodon, convert them, and stick them in gotosocial. This would leave the mastodon database intact, in case the database admin wanted to archive it or whatever.
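A minimal sketch of that read, convert, write loop, under some loud assumptions: in-memory SQLite stands in for the two Postgres databases so the example is runnable, the accounts table and its columns are hypothetical, and the ID mapping in convert_account is a placeholder rather than how GoToSocial actually generates IDs. The source connection is only ever read from.

```python
import sqlite3


def convert_account(row):
    """Map one source row to the target shape (hypothetical mapping)."""
    source_id, username, public_key, private_key = row
    # Placeholder ID scheme; keys are copied over unchanged.
    return (f"migrated-{source_id}", username, public_key, private_key)


def migrate_accounts(source, target, batch_size=500):
    """Copy accounts from source to target in batches; return rows copied."""
    cursor = source.execute(
        "SELECT id, username, public_key, private_key FROM accounts ORDER BY id"
    )
    copied = 0
    while True:
        batch = cursor.fetchmany(batch_size)  # bounds memory use per step
        if not batch:
            break
        with target:  # one transaction per batch on the target only
            target.executemany(
                "INSERT INTO accounts (id, username, public_key, private_key) "
                "VALUES (?, ?, ?, ?)",
                [convert_account(row) for row in batch],
            )
        copied += len(batch)
    return copied


def demo():
    """Build two throwaway databases and run the migration between them."""
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, username TEXT, "
        "public_key TEXT, private_key TEXT)"
    )
    source.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?, ?)",
        [(1, "alice", "pub-a", "priv-a"), (2, "bob", "pub-b", "priv-b")],
    )
    source.commit()
    target.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, username TEXT, "
        "public_key TEXT, private_key TEXT)"
    )
    copied = migrate_accounts(source, target, batch_size=1)
    rows = target.execute("SELECT id, username FROM accounts ORDER BY id").fetchall()
    return copied, rows
```

Working in batches keeps memory bounded, but the disk cost noted under Drawbacks remains, since both databases exist side by side until the old one is archived or dropped.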

After the migration, Mastodon could be stopped, and GoToSocial could be started.

Advantages

Preserve info in the Mastodon database in case something goes wrong during migration, or the instance admin just changes their mind and wants to go back to Mastodon.

Drawbacks

This is gonna be really intensive on the machine running the migration -- we'd have to throw a lot of CPU and memory at the problem, not to mention disk space.

To be continued....

tsmethurst commented 1 year ago

See also https://github.com/superseriousbusiness/gotosocial/issues/928

jwbjnwolf commented 1 year ago

With Mastodon 4.0 completely doing away with the static HTML public UI, and the web UI for logged-out users being COMPLETELY useless with authorized fetch enabled, I would absolutely like to be able to migrate to GoToSocial asap.

If it were all just public accounts, I would just use wget to download and store a static archive of the instance, start fresh, and move followers, but I run a private account circle on it, so that would all be lost.

To show how ridiculous this new Mastodon update is.. here's my profile on my instance, and here's my profile on my test instance.. I'm just so mad and upset.. and I can't stay on 3.5.3 forever, especially since I'm on glitch, which always tracks the latest changes, so if I need to reinstall.. I'm done for...

And yea the posts and media are crucial, maybe not for everyone but 100% for me.

Edit: Got a fork setup now for 3.5.3 of Glitch with the most recent changes before switching to v4.0, so I'm not done for upon a reinstall now, but yea.. as soon as GoToSocial is where it needs to be to provide migration functionality, I'm jumping right over.

progval commented 1 year ago

@jwbjnwolf For what it's worth, Mastodon is going to fix this issue by allowing public API access despite AUTHORIZED_FETCH in the next RC: https://github.com/mastodon/mastodon/pull/19803

norayr commented 7 months ago

i am refreshing this page from time to time because i am eager to replace mastodon with gotosocial, and i am waiting for a solution. i will migrate even if it only becomes possible years from now.

i think i am not alone. many mastodon administrators are fed up with different js and ruby issues, plus they have lots of performance issues.

we need something as simple and fast as gotosocial, so if migration becomes possible one day, i expect many instances to move away from mastodon.

at least my friends who run instances will be happy to migrate.

ShadowJonathan commented 7 months ago

a note: GtS will probably need to extrapolate certain things from the Mastodon database entries, such as scheduled posts, poll closure notices, and other scheduled events, since Mastodon stores these in its Redis queues
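To make the idea concrete, here is a minimal sketch of rebuilding one kind of scheduled event (poll closure) from database rows instead of from a job queue. The row shape (id, expires_at) and the job dictionary are assumptions for illustration, not the actual Mastodon schema or any GoToSocial scheduler API.

```python
from datetime import datetime, timezone


def pending_poll_jobs(polls, now):
    """Return one closure job per poll that has not expired yet.

    `polls` is an iterable of dicts with hypothetical keys "id" and
    "expires_at" (a timezone-aware datetime, or None for no expiry).
    """
    jobs = []
    for poll in polls:
        expires_at = poll["expires_at"]
        # Polls with no expiry or already expired need no job.
        if expires_at is not None and expires_at > now:
            jobs.append(
                {"kind": "poll_expiry", "poll_id": poll["id"], "run_at": expires_at}
            )
    # Earliest deadline first, so a scheduler can enqueue in order.
    return sorted(jobs, key=lambda job: job["run_at"])


now = datetime(2023, 1, 1, tzinfo=timezone.utc)
polls = [
    {"id": 1, "expires_at": datetime(2022, 12, 31, tzinfo=timezone.utc)},
    {"id": 2, "expires_at": datetime(2023, 1, 3, tzinfo=timezone.utc)},
    {"id": 3, "expires_at": None},
    {"id": 4, "expires_at": datetime(2023, 1, 2, tzinfo=timezone.utc)},
]
jobs = pending_poll_jobs(polls, now)
```

The same pattern would apply to scheduled posts or any other event whose deadline is stored in a database column, with the queue contents themselves left behind.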

breuxi commented 3 months ago

I would also be happy to migrate my instance which I share with a few friends as soon as possible, because it is growing indefinitely thanks to their database and media cache choices. 😭

ShadowJonathan commented 3 months ago

you can run database pruning tasks:

bundle exec bin/tootctl media remove --days 30 --concurrency 100
bundle exec bin/tootctl media remove-orphans
bundle exec bin/tootctl media remove --prune-profiles -c 25
bundle exec bin/tootctl cache clear

breuxi commented 3 months ago

@ShadowJonathan Yeah, thank you! Since running these (as a cron job) and reducing the number of days down to 14, I am finally down to about 10GB in file and 3GB in database storage. I still don't know if that's optimal for about 4 accounts with only one of them quite active, but at least better than the 60GB it took once before. 😂

tsmethurst commented 3 months ago

Please try to keep comments in this thread on-topic re: technical implementation of database migration path. I'm aware people are frustrated with Mastodon for various reasons, but this isn't the place to vent that.

tsmethurst commented 3 months ago

Also worth mentioning that Move (i.e., account migration) will be finalized in version 0.15.0 of GoToSocial, which is currently in the release candidate stage. It's not a database migration path, but with Move you will be able to Move your account's followers to an account on a different instance (e.g., a GoToSocial account), much as you can now Move between Mastodon instances (and other software types).