superseriousbusiness / gotosocial

Fast, fun, small ActivityPub server.
https://docs.gotosocial.org
GNU Affero General Public License v3.0

[feature] Import tweets from twitter-archive.zip #1381

Closed ynakao closed 8 months ago

ynakao commented 1 year ago

Is your feature request related to a problem?

I want to migrate from Twitter to the Fediverse completely, but at the same time I don't want to lose my Twitter history.

Describe the solution you'd like.

Twitter lets users download their Twitter archive as twitter-archive.zip, which includes tweets, uploaded media, account info, and so on. Please consider implementing a mechanism for importing tweets from the archive file, so that users can migrate from Twitter seamlessly.

Describe alternatives you've considered.

honk has this import feature, but it didn't work well, at least in my local test: only a few tweets were imported. Also, honk is very bare-bones and opinionated (e.g. no pagination in the frontend, no API support, etc.). And after all, I want to use GoToSocial! :)

There is a mirroring tool called pleroma-bot. It usually runs as a daemon that syncs posts between a Twitter account and a Fediverse account, and it also has a feature for importing tweets from the archive. I tried this too, but it didn't work well either, as it does not support GoToSocial yet.

pleroma-bot may fully support GoToSocial in the future. However, since it posts tweets one by one using the Mastodon API, importing all of them takes a very long time. So it would be nice if GoToSocial could handle twitter-archive.zip natively to speed up the import.

Additional context.

It seems honk retains the original tweet date, but pleroma-bot and other sync tools that use the Mastodon API don't. As discussed in the Mastodon repository (see also the linked issues in the first comment), a new API may be needed to import tweets as they are.
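To make the date point concrete: a tool that wants to keep original dates would first have to turn each tweet's timestamp into something a server could accept. Below is a minimal Python sketch, assuming the archive stores timestamps in the classic Twitter API form (e.g. `Wed Oct 10 20:19:24 +0000 2018`). `tweet_time_to_iso` is a hypothetical helper, not part of any existing tool, and nothing here claims that a current Mastodon API call accepts the result as a backdated creation time.

```python
from datetime import datetime

# Assumed timestamp layout: "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_time_to_iso(created_at):
    """Convert an (assumed) Twitter timestamp string to ISO 8601."""
    return datetime.strptime(created_at, TWITTER_TIME_FORMAT).isoformat()
```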

Also, it would be necessary to consider how to handle retweets, replies, threads, and other Twitter-specific features when representing those tweets on the GoToSocial side.

mirabilos commented 9 months ago

Do you want to backfill your account with old posts, or would it be sufficient to post all those anew? (RTs may be tricky or even unlawful, but reposting your own tweets and reconstructing threads might be possible.) For posting these anew, maybe someone could hack something external up that uses the normal client API to post (I already use a (hacked-up to support GtS better) python3-mastodon to bridge e.g. my CVS commit mails to Fedi).
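The "post anew and reconstruct threads" idea could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the keys `id`, `text` and `reply_to` are hypothetical names for data already extracted from the archive, `repost_in_order` is a made-up helper, and `post` is any callable that publishes a status and returns the new status id. Tweet ids are assumed to increase over time, so sorting by id posts a parent before its replies.

```python
def repost_in_order(tweets, post):
    """Post tweets oldest-first, re-linking replies to the new statuses.

    `tweets`: list of dicts with hypothetical keys `id`, `text` and
    `reply_to` (id of the parent tweet, or None).
    `post(text, in_reply_to_id)`: must return the id of the new status.
    """
    new_ids = {}  # old tweet id -> new status id
    # Assumption: ids grow over time, so parents sort before their replies.
    for tweet in sorted(tweets, key=lambda t: int(t["id"])):
        # Replies to tweets we did not repost get no parent (None).
        parent = new_ids.get(tweet["reply_to"])
        new_ids[tweet["id"]] = post(tweet["text"], parent)
    return new_ids
```

With python3-mastodon, `post` might be a small wrapper such as `lambda text, parent: client.status_post(text, in_reply_to_id=parent)["id"]`, where `client` is an already authenticated `Mastodon` instance. Statuses created this way carry the current time, not the original tweet date.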

Is that twitter-archive.zip still the thing with data/js/tweets/2014_09.js etc. in addition to a top-level tweets.csv? If so, I have one of these at hand and might have a look…

ynakao commented 9 months ago

I really appreciate you taking a look at this issue!

> Do you want to backfill your account with old posts, or would it be sufficient to post all those anew?

I'd like to backfill tweets as old posts if possible, because posting them as new ones can be done by pleroma-bot and other tools.

> Is that twitter-archive.zip still the thing with data/js/tweets/2014_09.js etc. in addition to a top-level tweets.csv?

I just looked into my old Twitter archives, which were downloaded in 2020, 2022, and 2023, but none of them has a data/js directory or a tweets.csv file. It seems the Twitter archive changed its internal structure at some point?
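One way an external importer could cope with layout changes like this is to probe the zip for whichever tweet file is present and strip the JavaScript assignment in front of the JSON array. The Python sketch below is only that, a sketch: the candidate paths (`data/tweets.js`, `data/tweet.js`) and the `some.name = [...]` wrapper are assumptions about how archives tend to look, not something confirmed in this thread, and `parse_js_array` and `load_tweets` are hypothetical helpers.

```python
import json
import zipfile

# Assumed candidate locations; real archives may differ by export date.
CANDIDATE_PATHS = ("data/tweets.js", "data/tweet.js")


def parse_js_array(text):
    """Parse a file of the assumed form `some.name = [ ...JSON... ]`."""
    # Drop the assignment target; what follows "=" is assumed to be plain JSON.
    _, _, payload = text.partition("=")
    return json.loads(payload)


def load_tweets(archive):
    """Return the tweet list from the first candidate path found in the zip."""
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
        for path in CANDIDATE_PATHS:
            if path in names:
                return parse_js_array(zf.read(path).decode("utf-8"))
    raise FileNotFoundError("no known tweet file in archive")
```

Keeping the path list in one place means that a new layout only needs one more entry, which limits the maintenance cost to the external tool.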

However, since other Twitter caveats like this might exist, I'm starting to think this feature request would become a maintenance burden for GtS developers in the future...

mirabilos commented 9 months ago

Yuji Nakao dixit:

> I really appreciate you taking a look at this issue!

Just a user, but thought, maybe I could help out.

>> Do you want to backfill your account with old posts, or would it be sufficient to post all those anew?

> I'd like to backfill tweets as old posts if possible, because posting them as new ones can be done by pleroma-bot and other tools.

OK, makes sense.

That requires database fiddling and I’ve been told to not do that ;-)

It would, on the other hand, be great if I could send backdated posts with the python3-mastodon Mastodon().status_post(…) method. This would be useful for my RSS feed, CVS commit eMail, etc. to Fediverse bridges, too, to have matching timestamps. (The posts should still be announced to followers, though.)

>> Is that twitter-archive.zip still the thing with data/js/tweets/2014_09.js etc. in addition to a top-level tweets.csv?

> I just looked into my old Twitter archives, which were downloaded in 2020, 2022, and 2023, but none of them has a data/js directory or a tweets.csv file. It seems the Twitter archive changed its internal structure at some point?

Oh, ouch. That’s even more annoying then.

> However, since other Twitter caveats like this might exist, I'm starting to think this feature request would become a maintenance burden for GtS developers in the future...

I think it’s pretty much out of GtS’ scope and should be an external utility. (But if GtS can help by providing APIs to do what needs to be done, then it’d be really good.)

I guess I maybe should request a more up-to-date archive and then see, but if you already have other nōn-backdating methods, I don’t think I should write anything that fiddles with the DB directly, so… did I not file a feature request for the post backdating? Should I?

bye, //mirabilos

tsmethurst commented 8 months ago

I'm gonna close this because I don't think importing twitter posts from an archive is something we're really interested in doing, and it brings a lot of technical headaches for not much payoff.

Can definitely understand wanting to host your old tweets somewhere, but it might be better to look around and see if there's some kind of static website generator that lets you create a mockup twitter feed on your website based on the contents of your archive.