Investigate Twitter Support

Background

Twitter's HTTP APIs intended for integration, v1 and v2, feature documentation and OAuth federated access control. After the summer 2023 policy shock, these APIs are no longer workable for Gobo. The OAuth access control mechanism is used to throttle whole application integrations; the free access level is restrictive; the paid tiers are beyond the resources of the Gobo project.

The twitter-api-client Python library offers a workaround. Instead of using the integration APIs, this library accesses the API that Twitter set up for its own first-party client. This is a GraphQL interface that prioritizes the information architecture of the Twitter client. It mixes a bundled graph of HTTP resources with directives meant for application code to find and hydrate. The twitter-api-client authors reverse engineered this API and mapped it into an interface that resembles the Twitter v1 and v2 APIs.

Feasibility

Based on reading the twitter-api-client library documentation and code and looking over the GraphQL responses, what the ticket asks for appears possible. In theory, we would have the necessary components to support feed construction, cross-posting, and notifications.

Based on the patterns we've established with the other platform worker tasks, I can mostly see how to slot Twitter into Gobo's worker task plumbing. However, there are several issues that complicate the stability of an integration. The most serious of these is identity management.

Issues

HX for Twitter Identity Onboarding and Management

Because we cannot use OAuth federated access control, we need to request broad credentials from people who wish to use Twitter with Gobo.

One form this could take would be username + password:

This is the broadest form of access, so these credentials are dangerous to keep around.
twitter-api-client indicates that their ability to use username + password authentication is unstable. I'm not sure what that means, but they don't recommend this strategy.

Instead, twitter-api-client recommends using session cookies:

These still have broad access permissions, so they're still dangerous to keep around.
They're hard to access. Based on my understanding of JavaScript interfaces for cookies, we cannot retrieve these values for Gobo members. Instead, we'd need to instruct members to access the cookies from the browser's dev console and provide them to us.
Fortunately, it appears the session cookies are valid for about a year. I'd need to confirm, but I saw the expiration date jump when I logged out and back in as a test. We might need to instruct potential integrators to do the same.
There does not appear to be a refresh mechanism. So we'd need some strategy for manually re-upping sessions. If the session cookies really last a year, this is less of a concern than I originally feared.
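
To make the cookie handling above a bit more concrete, here is a minimal sketch of how Gobo could validate what a member pastes in and decide when to ask them to re-up. The cookie names `auth_token` and `ct0` are my assumption about which values matter, the 30-day margin is arbitrary, and every function name here is hypothetical. None of this touches twitter-api-client itself.

```python
from datetime import datetime, timedelta, timezone

# Assumption: these are the session cookies we would need from a member.
REQUIRED_COOKIES = ("auth_token", "ct0")


def parse_cookie_header(raw):
    """Parse a pasted 'name=value; name=value' string into a dict."""
    cookies = {}
    for part in raw.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def missing_cookies(cookies):
    """Return the required cookie names that are absent or empty."""
    return [name for name in REQUIRED_COOKIES if not cookies.get(name)]


def needs_renewal(expires_at, now, margin=timedelta(days=30)):
    """True when the session expires within `margin` of `now`."""
    return expires_at - now <= margin


if __name__ == "__main__":
    pasted = parse_cookie_header("auth_token=abc; ct0=def; lang=en")
    print(missing_cookies(pasted))
    now = datetime.now(timezone.utc)
    print(needs_renewal(now + timedelta(days=10), now))
```

Since there is no refresh mechanism, a check like `needs_renewal` would only drive a reminder to the member. It could not renew anything on its own.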

Rate Limits

Using the GraphQL interface intended for the client bypasses the most severe rate limits, but there are still limitations. Responses from this API include x-rate-limit headers, which we already have established patterns to manage.
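
As a rough illustration of the header-driven pattern, the sketch below computes how long a worker should wait before its next request. I'm assuming the headers follow the usual `x-rate-limit-remaining` / `x-rate-limit-reset` convention, with the reset given as a Unix timestamp in seconds. The function name is hypothetical and this is not Gobo's existing implementation.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, in seconds.

    Returns 0.0 when requests remain in the window or when the
    rate-limit headers are missing or malformed.
    """
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalize them.
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        remaining = int(lowered["x-rate-limit-remaining"])
        reset_at = int(lowered["x-rate-limit-reset"])
    except (KeyError, ValueError):
        return 0.0
    if remaining > 0:
        return 0.0
    return max(0.0, reset_at - now)


if __name__ == "__main__":
    example = {"x-rate-limit-remaining": "0", "x-rate-limit-reset": "1100"}
    print(seconds_until_reset(example, now=1000))
```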

However, even for logged-in accounts, Twitter places limits on the total number of tweets. They indicate there's a limit of 2,400 per day, broken into smaller limits in effect over a number of hours. They are unfortunately vague on that last point.

Ideally, we'd want to respect this limit with some dynamic behavior from the worker. However, unlike the x-rate-limit header protocol, I don't see any indication of quota bookkeeping that we can react to. Instead, we would need to maintain that bookkeeping ourselves.
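
One way that self-maintained bookkeeping could work is a rolling-window counter per Twitter identity. The sketch below is only an assumption about the shape of it: the `ViewQuota` name is hypothetical, the 2,400 figure is the daily limit mentioned above, and it models a single 24-hour window because the smaller sub-limits are not documented clearly enough to encode.

```python
from collections import deque


class ViewQuota:
    """Rolling-window count of tweets consumed for one account."""

    def __init__(self, limit=2400, window_seconds=24 * 60 * 60):
        self.limit = limit
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, count) pairs, oldest first
        self._used = 0

    def _expire(self, now):
        # Drop events that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, count = self._events.popleft()
            self._used -= count

    def remaining(self, now):
        """Tweets still available in the current window."""
        self._expire(now)
        return max(0, self.limit - self._used)

    def record(self, count, now):
        """Note that `count` tweets were consumed at time `now`."""
        self._expire(now)
        self._events.append((now, count))
        self._used += count


if __name__ == "__main__":
    quota = ViewQuota(limit=100, window_seconds=60)
    quota.record(40, now=0)
    print(quota.remaining(now=10))
```

A worker would check `remaining` before each fetch and call `record` with the number of tweets actually returned. In practice the state would need to live in the database rather than in memory so it survives worker restarts.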

It also appears that twitter-api-client is overly aggressive in fetching tweet feeds. Its design and defaults have scraping efficiency in mind, but given the above, we need something slightly different. So where the library would make a maximal request to Twitter (its default count is set at 1,000), we'd want to ask for smaller pages. That balances the HTTP rate limit with the tweet viewing limit.
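
The smaller-pages idea could look roughly like the loop below. It is a sketch under assumptions: `fetch_page` is a hypothetical callable standing in for whatever request function we end up with (it is not part of twitter-api-client), and it is assumed to take a cursor and a count and return a list of items plus the next cursor. The page size and budget values are placeholders.

```python
def fetch_in_small_pages(fetch_page, page_size=40, budget=200):
    """Collect up to `budget` items by requesting small pages.

    `fetch_page(cursor, count)` must return `(items, next_cursor)`;
    a `next_cursor` of None means there is nothing more to fetch.
    The budget is a target that assumes the server honors `count`.
    """
    collected = []
    cursor = None
    while len(collected) < budget:
        # Never ask for more than what is left in the budget.
        count = min(page_size, budget - len(collected))
        items, cursor = fetch_page(cursor, count)
        collected.extend(items)
        if not items or cursor is None:
            break
    return collected


if __name__ == "__main__":
    data = [{"id": i} for i in range(95)]

    def fake_fetch(cursor, count):
        # Stand-in for a real request: the cursor is just a list offset.
        start = int(cursor) if cursor else 0
        end = start + count
        return data[start:end], (str(end) if end < len(data) else None)

    print(len(fetch_in_small_pages(fake_fetch)))
```

Each call costs one HTTP request against the x-rate-limit window and up to `page_size` tweets against the daily limit, so these two parameters are the knobs for trading one off against the other.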

But there are big tuning questions here; ones more complicated than the other platforms. Sometimes people on Twitter follow thousands of accounts. We can reduce the frequency of source fetches and spread the pull load across the quota of various accounts. But it's a balance we'd have to figure out.
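
To get a feel for the tuning problem, here is some back-of-the-envelope arithmetic as code. Everything in it is an assumption for illustration: that each followed source is fetched separately, that every fetch consumes a full page against the 2,400-per-day limit, and that Gobo only spends a share of that limit (`gobo_share`, a made-up parameter) so the person can still use Twitter directly.

```python
def min_fetch_interval_hours(sources, page_size, daily_quota=2400, gobo_share=0.5):
    """Hours between fetches of each source that keep Gobo within its share."""
    if sources <= 0 or page_size <= 0:
        raise ValueError("sources and page_size must be positive")
    if not 0 < gobo_share <= 1:
        raise ValueError("gobo_share must be in (0, 1]")
    budget = daily_quota * gobo_share  # tweets per day Gobo may consume
    fetches_per_source_per_day = budget / (sources * page_size)
    return 24 / fetches_per_source_per_day


if __name__ == "__main__":
    for sources in (10, 100, 1000):
        hours = min_fetch_interval_hours(sources, page_size=20)
        print(f"{sources} sources: one fetch per source every {hours:.1f} hours")
```

Under these assumptions, an account following 10 sources could refresh each one every 4 hours, while an account following 1,000 could only refresh each source about once every 400 hours. That gap is why the load would have to be spread across the quotas of multiple accounts.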

The biggest takeaway is that we can't use twitter-api-client as is for the medium term. I'd need to either modify an instance of twitter-api-client after we load the library or use its design as a starting point to write something custom for Gobo.

This is perhaps less relevant, but I'll mention it in this section: Because Twitter imposes this per-account limit on viewed tweets each day, as Gobo reads tweets into its system on behalf of a person, we'd be eating into their quota. So unlike other platforms, we'd be limiting an integrator's ability to use the main Twitter client, which is unfortunate.

Structural Risk

The short version is that Twitter could unilaterally and summarily kill this integration. They could modify the session expiration. They could IP ban the Gobo worker. They could detect and reject requests that appear to not come from a client they control.

We can tread carefully around the rate limit restrictions and do our best to make requests that look sorta like they're coming from a client. But they can end it all and there won't be much to do about it. So, try to be detached. 😅
