micahflee closed this 1 year ago
Thanks for this Micah. There's already a merged PR (#23) to import a downloaded Twitter archive, if you're unaware.
Also, I don't know if it's still the case, but about a year ago it was still possible to request access to the 1.1 API, even if they only gave you access to 2.0 by default.
I think it's maybe worth testing the archive workflow (though we'd need to verify whether there's a deletion limit). I don't think 1,500 tweets really helps anyone. But also, maybe instead of downloading all tweets, we could download only the tweets that match the deletion settings. 🤔 A part of me also wonders if there is some way to web scrape the tweets instead of using the API, but I feel Twitter will stop that very quickly.
The other option is basically to convert the code to use API v2 and then more or less leave the project, mentioning that anyone who wants to can still use it (in case someone thinks the 1,500 limit is fine for them), but that we're not adding any new features or maintaining it. We could probably still review PRs if others want to contribute changes to keep up with the API definitions.
I feel that if API v2 is the only way ahead, then unless the v2 code is overly different and complicated, we can just do that, merge it, and leave the project (probably archive it?). But if downloading the archive, importing it into the platform, and then doing the deletion works, I think that's probably a fine workflow as well.
I could test with an archive of about 17k tweets. I want to delete most of it anyway, but I've never worked with Python and I'm on Windows.
@micahflee will re-visit this soon to help all of us get a working copy running locally. I'll learn Python as we go. FYI, the old standalone version was working on the old API, but could only unretweet -- I couldn't debug deleting tweets. I want to contribute, but I'd need to work with you / pair for maximum effectiveness.
I'm working on completely reworking the open source semiphemeral project to have all of the functionality of the hosted app Semiphemeral.com. Rather than refactoring the open source version, I'm basically starting over and copying chunks of the Semiphemeral.com code into it and then updating it to work for just a single user.
This is a work-in-progress PR.
Set up a dev environment
There's now a BUILD.md with instructions on getting started. You need Python 3 and Node.js installed; install the dependencies as described there.

All the frontend code is in `semiphemeral/frontend`, and you need to re-build it each time you edit any frontend code. During development I just run `poetry run build && poetry run semiphemeral`, which builds the frontend and then starts the Semiphemeral server. When I make changes to any of the code I press CTRL-C and just run the same thing again.

All data is stored in `~/.semiphemeral`. The settings (including Twitter API creds) are stored in `settings.json`, and everything else is in a sqlite3 database called `data.db`.

How it works so far
When you first run it, before you've configured it with your API credentials, it gives you step-by-step instructions on how to get those credentials from the Twitter Developer Portal. You can't proceed until you provide the correct API creds and verify that they work:
After you give valid API credentials, the rest of the app is unlocked and it redirects you to the Settings page:
You can then choose your settings, and go to the Dashboard.
I've started implementing logic that requires you to download your Twitter data first and delete later. I'm not done with this part yet, and this is where I want help from others. First, let me explain a few things.
How jobs work
This is a Flask app (with socketio), but there's also a background thread that runs the `run_jobs` function:

`run_jobs` is an infinite loop that selects all of the pending jobs from the database and runs them one at a time. During a job, it pushes progress updates to the client via socketio.

When you click "Download Twitter Data", the Flask route just adds a pending download job to the database. The job runner thread then selects that job and runs it. Here's the `download` job function: https://github.com/micahflee/semiphemeral/blob/modernize/semiphemeral/jobs.py#L78-L290

It starts by creating a Twitter API v1.1 client and verifying that the Twitter creds are valid, then goes on to download the entire history of tweets and likes, pushing updates to the client the whole time:
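For anyone new to the codebase, the loop described above has roughly this shape (a sketch only -- the `jobs` table schema and the `handlers` wiring here are made up for illustration; the real implementation is in `semiphemeral/jobs.py`):

```python
import sqlite3
import time

def run_pending_jobs(db, handlers):
    # One pass: select all pending jobs and run them one at a time.
    # handlers maps a job type (e.g. "download") to its job function,
    # which pushes progress updates to the client via socketio.
    rows = db.execute(
        "SELECT id, job_type FROM jobs WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for job_id, job_type in rows:
        db.execute("UPDATE jobs SET status = 'active' WHERE id = ?", (job_id,))
        db.commit()
        handlers[job_type](job_id)
        db.execute("UPDATE jobs SET status = 'finished' WHERE id = ?", (job_id,))
        db.commit()

def run_jobs(db_path, handlers):
    # Entry point for the background thread: poll the database forever
    db = sqlite3.connect(db_path)
    while True:
        run_pending_jobs(db, handlers)
        time.sleep(10)
```

The "Download Twitter Data" route then only has to insert a row with `status = 'pending'` and let this thread pick it up.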
However, when I run it, it hits an exception in `tweepy.API.user_timeline`:
To explain this, lemme go into the difference between the Twitter API v1.1 and v2.
Twitter API v1.1 and v2
Twitter has two different APIs: the modern v2 API and the older legacy v1.1 API. At the moment Semiphemeral only uses the v1.1 API; however, there's code in `common.py` that lets you create an API client for either API of your choice. `create_tweepy_client_v2` makes a v2 client, and `create_tweepy_client_v1_1` makes a v1.1 client.

All the Twitter API code uses Tweepy. Check out the Tweepy docs -- on the left there are two separate sections, "Twitter API v1.1 Reference" and "Twitter API v2 Reference". Make sure you look at the correct docs for the API you're using.
API v2 is a bit nicer and more modern than v1.1; however, Twitter started aggressively rate limiting everything in the v2 API. A while back I had actually updated all of the Semiphemeral.com code to use the v2 API, but then quickly hit my limit on the number of tweets I could download, so I switched it all back to the v1.1 API since that had no such limit.
However, it looks like when you create a brand new Twitter app and get new free API credentials, these creds don't support the API v1.1 endpoints that we need to download tweets. (I think the Semiphemeral.com API creds are kind of grandfathered in and still somewhat work, but not new ones.) So the only way forward is to use API v2.
This is where I need help, and also is it worth it at all?
I've already deleted all of my tweets, and none of my burner Twitter accounts have any tweets in them. So I actually don't have access to an account to very thoroughly test the download code even if I did rewrite it to use API v2.
And when I go to https://developer.twitter.com/en/portal/product (logged into the Twitter account I made my API creds with) I can see that API v2 has a limit of pulling 1,500 tweets a month. That's like, a very small number.
Let's say @torproject wanted to delete its old tweets. That account has 12.7K tweets, which honestly is on the low end for a Twitter account that's been around for a long time. That would take over 8 months just to download the tweets in order to delete them, lol.
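To spell out that estimate:

```python
tweets = 12_700      # @torproject's tweet count
monthly_cap = 1_500  # free v2 tier: tweets you can pull per month
months = tweets / monthly_cap
print(round(months, 1))  # → 8.5
```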
Is this project, or making a self-hosted only Semiphemeral, even worth it anymore?
Anyway, I think this is where I've hit my limit on what I can do. I'd like help with maybe trying to convert this download job code from API v1.1 to API v2 and test it to make sure it works. But with such a small monthly cap, maybe this is all a waste of time.
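For whoever picks this up, a rough starting point for the v2 version of the download loop, written long-hand so the pagination is visible (a sketch; it assumes `client` is a user-context `tweepy.Client`, field selection and error handling are omitted, `tweepy.Paginator` wraps this same loop, and it would still be subject to the monthly cap discussed above):

```python
def download_all_tweets_v2(client, user_id):
    # Manual pagination over the v2 "get user's tweets" endpoint
    # (tweepy.Client.get_users_tweets). Each page returns up to 100
    # tweets plus a meta["next_token"] pointing at the next page.
    # Likes would use client.get_liked_tweets the same way.
    tweets = []
    token = None
    while True:
        resp = client.get_users_tweets(
            user_id,
            max_results=100,
            pagination_token=token,
            tweet_fields=["created_at"],
        )
        if resp.data:
            tweets.extend(resp.data)
        token = resp.meta.get("next_token")
        if token is None:
            return tweets
```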
My only other idea is relying on the downloadable Twitter archive -- the user can download their archive and then upload it to Semiphemeral to delete those tweets and likes. I don't know if API v2 would hit other deletion caps though, and again this isn't something I can easily test without access to a real Twitter account full of data.
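If we go the archive route, the parsing half is at least straightforward (a sketch; the function names are hypothetical, it assumes the archive's `data/tweets.js` format, which wraps JSON in a `window.YTD.tweets.part0 = [...]` assignment, and the `delete_tweet` calls via a v2 `tweepy.Client` would still be subject to whatever deletion caps exist):

```python
import json

def parse_archive_tweets(tweets_js):
    # data/tweets.js is JSON wrapped in a JavaScript assignment:
    # strip everything up to the first "=", then parse the rest
    payload = tweets_js[tweets_js.index("=") + 1:]
    return [entry["tweet"] for entry in json.loads(payload)]

def delete_from_archive(client, tweets_js):
    # client would be a tweepy.Client (API v2);
    # delete_tweet is one API call per tweet
    for tweet in parse_archive_tweets(tweets_js):
        client.delete_tweet(tweet["id"])
```

The upside is that downloading stops counting against the 1,500/month read cap entirely; only the deletes touch the API.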