sqiouyilu / twitter-nest

Tools for creating a decentralized Twitter clone on WordPress.
GNU General Public License v3.0

Investigate Twitter data scraping options to recover own data #30

Open sqiouyilu opened 1 year ago

sqiouyilu commented 1 year ago

Your Twitter archive includes your own content, but it may become harder to access as the platform grows less stable.

Your archive DOES NOT INCLUDE a list of the accounts you’re following or that are following you. The only way to retrieve that list is with paid services (unless I can find a free or command-line option, but those are likely to be far from user-friendly for the average person trying to set up a nest).

Ethical concerns

It is possible to pay for someone else’s data, as long as their account is public. But although their Tweets may be public, they are still content that the person authored and has not consented to having archived.

However, there is no way to prevent users from doing this. Any attempt to verify that a person is uploading their own Tweets, such as forcing an OAuth login and checking the uploaded Tweets against that account, would also prevent people from merging their own data across multiple usernames or from a deleted account.

The ethics of scraping someone’s publicly available profile data (display name, handle, ID, bio, URL) are also a bit gray. You could go and copy and paste everyone’s data manually into your own address book, and people generally wouldn’t have a problem with that. There’s really no way to control what people will scrape and import (or even to require that it comes from Twitter). But the data should, as much as possible, be restricted to the backend, to prevent people from using nests as directories for targeted harassment.
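As a rough illustration of keeping imported profile data in the backend, here is a minimal Python sketch. It is not how twitter-nest is implemented (the project targets WordPress). The `ImportedProfile` class, the `public_view` function, and the field names are all hypothetical, chosen only to mirror the profile fields listed above. The idea is an allowlist that is empty by default, so nothing is exposed unless the nest owner opts in.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ImportedProfile:
    """Hypothetical record for one scraped or imported profile."""
    display_name: str
    handle: str
    user_id: str
    bio: str
    url: str


def public_view(profile, allowed_fields=frozenset()):
    """Return only the fields explicitly allowed on the front end.

    With the default empty allowlist, every field stays backend-only.
    """
    return {
        name: value
        for name, value in asdict(profile).items()
        if name in allowed_fields
    }
```

An allowlist is used here rather than a blocklist so that any field added later stays private until someone deliberately exposes it.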

Used

ExportData.io

FollowersAnalysis

Vicinitas

Not yet used

Scrapy

sqiouyilu commented 1 year ago

Investigate whether OAuth-based data scraping tools allow you to scrape data from protected accounts, and if so, how much (test with own protected alt ONLY). This is a big security concern if such a tool can scrape more than just the public profile header information.