Open freeformflow opened 9 months ago
Twitter's HTTP APIs intended for integration, v1 and v2, feature documentation and OAuth federated access control. After the summer 2023 policy shock, these APIs are no longer workable for Gobo. The OAuth access control mechanism is used to throttle whole application integrations; the free access level is restrictive; the paid tiers are beyond the resources of the Gobo project.
The twitter-api-client Python library offers a workaround. Instead of using the integration APIs, this library accesses the API that Twitter set up for its own first-class client. This is a GraphQL interface that prioritizes the information architecture of the Twitter client. It mixes a bundled graph of HTTP resources with directives meant for application code to find and hydrate. twitter-api-client reverse engineered this API and mapped it into an interface that resembles the Twitter v1 and v2 APIs.
Based on reading the twitter-api-client library documentation and code and looking over the GraphQL response, what the ticket asks is possible. In theory, it appears we would have the necessary components to support feed construction, cross-posting, and notifications.
Based on the patterns we've established with the other platform worker tasks, I can mostly see how to slot Twitter into Gobo's worker task plumbing. However, there are several issues that complicate the stability of an integration. The most serious of these is identity management.
Because we cannot use OAuth federated access control, we need to request broad credentials from people who wish to use Twitter with Gobo.
One form this could take would be username + password:
twitter-api-client indicates that their ability to use username + password authentication is unstable. I'm not sure what that means, but they don't recommend this strategy.
Instead, twitter-api-client recommends using session cookies:
Using the GraphQL interface intended for the client bypasses the most severe rate limits, but there are still limitations. Responses from this API include x-rate-limit headers, which we have established patterns to manage.
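As a reminder of what that header handling looks like, here is a minimal sketch. It assumes the GraphQL responses carry x-rate-limit-remaining and x-rate-limit-reset with the same meaning as in the documented Twitter APIs (reset given as a Unix timestamp in seconds); I have not confirmed that for every endpoint. The function name is hypothetical and not part of twitter-api-client.

```python
import time


def seconds_until_allowed(headers, now=None):
    """Return how many seconds to wait before the next request.

    Assumes (unverified for the client GraphQL API) that
    x-rate-limit-reset is a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {key.lower(): value for key, value in headers.items()}
    remaining = normalized.get("x-rate-limit-remaining")
    reset = normalized.get("x-rate-limit-reset")
    if remaining is None or reset is None:
        # No rate limit information: nothing to wait for.
        return 0.0
    if int(remaining) > 0:
        return 0.0
    # Window exhausted: wait until the reset time (never a negative wait).
    return max(0.0, float(reset) - now)
```

A worker task would call this after each response and delay the next request for that account by the returned amount.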
However, even for logged-in accounts, Twitter places limits on the total number of tweets. They indicate there's a limit of 2,400 per day, broken into smaller limits in effect over a number of hours. They are unfortunately vague on that last point.
Ideally, we'd want to respect this limit with some dynamic behavior from the worker. However, unlike the x-rate-limit header protocol, I don't see any indication of quota bookkeeping that we can react to. Instead, we would need to maintain that bookkeeping ourselves.
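A rough sketch of what that bookkeeping could look like, assuming a rolling 24-hour window and the 2,400 figure above. The class and its methods are hypothetical, and Gobo would need to persist this per account rather than hold it in memory.

```python
import time
from collections import deque


class TweetQuota:
    """Rolling-window ledger of tweets counted against one account.

    The default limit and window are assumptions taken from the
    2,400 per day figure; the shorter sub-limits are not modeled.
    """

    def __init__(self, limit=2400, window_seconds=86400):
        self.limit = limit
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, tweet_count), oldest first
        self._used = 0

    def _expire(self, now):
        # Drop entries that have aged out of the rolling window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, count = self._events.popleft()
            self._used -= count

    def remaining(self, now=None):
        now = time.time() if now is None else now
        self._expire(now)
        return max(0, self.limit - self._used)

    def record(self, count, now=None):
        # Call after each fetch with the number of tweets returned.
        now = time.time() if now is None else now
        self._expire(now)
        self._events.append((now, count))
        self._used += count
```

Since the shorter limits are vague, one option would be to layer a second instance with a smaller window and a conservative guess at its limit, and require both to have room before fetching.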
It also appears that twitter-api-client is overly aggressive in fetching tweet feeds. Its design and defaults have scraping efficiency in mind, but given the above, we need something slightly different. So where the library would make a maximal request to Twitter (its default count is set at 1,000), we'd want to ask for smaller pages. That balances the HTTP rate limit with the tweet viewing limit.
But there are big tuning questions here, more complicated ones than on the other platforms. Sometimes people on Twitter follow thousands of accounts. We can reduce the frequency of source fetches and spread the pull load across the quota of various accounts. But it's a balance we'd have to figure out.
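To get a feel for the scale of that tuning problem, here is some back-of-the-envelope arithmetic as a sketch. It assumes the worst case where every fetch returns a full page and every returned tweet counts against the quota, and that sources are fetched one at a time. The function name and the example numbers are made up for illustration.

```python
SECONDS_PER_DAY = 86400


def min_fetch_interval_seconds(daily_budget, page_size, source_count):
    """Shortest per-source refresh interval that stays within the budget.

    daily_budget: tweets per day we allow Gobo to read for one account
    page_size:    tweets requested per fetch (worst case: all returned)
    source_count: number of followed sources fetched individually
    """
    if daily_budget <= 0 or page_size <= 0 or source_count <= 0:
        raise ValueError("all arguments must be positive")
    fetches_per_day = daily_budget // page_size
    if fetches_per_day == 0:
        raise ValueError("page_size exceeds the daily budget")
    # Each source gets an equal share of the day's fetches.
    return SECONDS_PER_DAY * source_count / fetches_per_day


# Half of a 2,400 quota, pages of 20 tweets, 100 followed sources.
interval = min_fetch_interval_seconds(1200, 20, 100)
print(interval / 3600, "hours between fetches of the same source")
```

With those example numbers each source can only be refreshed about every 40 hours, and the interval grows linearly with the number of followed accounts, which is why spreading load across several accounts' quotas comes up at all.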
The biggest takeaway is that we can't use twitter-api-client as is for the medium term. I'd need to either modify an instance of twitter-api-client after we load the library or use its design as a starting point to write something custom for Gobo.
This is perhaps less relevant, but I'll mention it in this section: Because Twitter imposes this per-account limit on viewed tweets each day, as Gobo reads tweets into its system on behalf of a person, we'd be eating into their quota. So unlike other platforms, we'd be precluding an integrator's ability to use the main Twitter client. Which is unfortunate.
The short version is that Twitter could unilaterally and summarily kill this integration. They could modify the session expiration. They could IP ban the Gobo worker. They could detect and reject requests that appear to not come from a client they control.
We can tread carefully around the rate limit restrictions and do our best to make requests that look sorta like they're coming from a client. But they can end it all and there won't be much to do about it. So, try to be detached. 😅
Investigate whether we can use this approach described by a UMass colleague: