sugarcane29 closed this issue 1 year ago.
Hi, I am also trying to use rtweet with the V2 API from Twitter (with a developer account), and for some reason I am having a difficult time getting historical data from my timeline. I would like to pull the last 30 days from my timeline, but I still get the same results I would get without the developer account.
There is no 30-day search or user timeline option in the Twitter API V2 yet, but both are coming soon. We are as excited as you are to see rtweet expand to support V2!
I would be very happy to contribute v2 API functionality, especially now that rtweet seems to be back under active development. Can @mkearney, @llrs, and @hadley perhaps comment on the current governance strategy for rtweet? Are y'all the right people to talk to about this?
Also, it seems like the internals are undergoing a fairly extensive rewrite at the moment, largely by @hadley. @hadley, do you plan on implementing v2 API infrastructure, or will you have to switch your attention to other projects soon? If not, is there any developer-facing documentation about the new internals to read to get oriented, or is the code itself the appropriate place to start?
Good questions @alexpghayes. I tried reaching @mkearney via several channels (Twitter, GitHub, email) but got no response over several months. I asked rOpenSci to let me maintain the package because I was interested in new functionality and there were lots of open issues and pending PRs. @sckott gave me permissions about 2 weeks ago (2021/02/15), but we haven't talked about a governance strategy; if anyone wants to take over maintenance from me, that's fine.
Initially I wanted to resolve the open issues and later start making more profound changes without breaking changes (see #471). But since @hadley contributed a new internal process for calling Twitter's API and rewrote some functions, this will no longer be possible. He is now working (temporarily?) on other projects (see comments on #526 and #523).
I haven't read much about API v2 itself yet. As far as I know, the endpoint paths are different, and you can now select which fields you want to get when making GET requests. You'll probably need to start by modifying TWIT_method to accept v2 endpoints and work from there (TWIT_get and the new user-facing function). There is no documentation of the internals. Hope this helps.
I've only looked into the v2 API briefly, but I wonder if it might be better to pull the hard-coded version number out of TWIT_get() so that it has to be specified in the call, e.g. TWIT_get(token, "1/tweets/search") (or whatever the endpoint is). If we go down that path, it would be good to decide now and fix all calls in a PR before starting more substantial work.
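A minimal sketch of that idea in R. It assumes a hypothetical helper named twitter_api_url(), which is not rtweet's TWIT_get(); its name and signature are made up for illustration. The caller supplies the version as part of the path, and the helper refuses paths without one. It only recognizes the "1.1/" and "2/" prefixes.

```r
# Hypothetical helper (not rtweet's TWIT_get()): the caller passes the
# API version as part of the path, so one function can build URLs for
# both v1.1 and v2 endpoints.
twitter_api_url <- function(path, base = "https://api.twitter.com") {
  # Refuse paths without an explicit version prefix such as "1.1/" or "2/"
  if (!grepl("^(1\\.1|2)/", path)) {
    stop("`path` must start with an API version, e.g. '1.1/' or '2/'",
         call. = FALSE)
  }
  paste0(base, "/", path)
}

twitter_api_url("1.1/search/tweets.json")
twitter_api_url("2/tweets/search/recent")
```

Making the version mandatory means every existing call has to be touched once. That is the "fix all calls in a PR" step, and after it a v1.1 call can no longer silently hit a v2 URL or vice versa.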
My impression from a day of playing with v2 is that it is dramatically better and well worth investing in internal infrastructure to support. From a research perspective, it enables a lot. However, the return objects also seem to be fundamentally different. I will read the v2 documentation more thoroughly tonight, but based on my current understanding we'd probably want entirely separate interfaces to the v2 endpoints and the v1.1 endpoints. I'm not sure how to approach this.
@hadley I can make a PR with a revision of TWIT_get() to support version specification and request a review when it's ready?
@alexpghayes yeah, that’d be great.
@alexpghayes once the PR is merged, I'd suggest starting by making a list of v1.1 vs v2 endpoints to figure out where we need to add behaviour vs change existing behaviour. It'll probably be easiest to start by adding functions for capabilities that are not available in v1.1, to get a sense of the API and what additional parameters the functions will need. I'd suggest picking one function to implement (maybe #363) and then doing a PR to lay a solid foundation for future work.
Two key resources appear to be the migration guide: https://developer.twitter.com/en/docs/twitter-api/migrate and the data dictionary: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/using-fields-and-expansions. In particular, the data returned by v2 looks very different from the data returned by v1. Presumably (but not definitively) we'll want to figure out a mapping from the new, more structured and nested JSON objects to a single row in a tibble, for several different varieties of JSON object. Additionally, it looks like the API itself is moving from a "you get everything" model to an "explicitly request the data you want" model, with a whole new conceptual model for requesting that data through fancy arguments.
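Here is a base-R sketch of the "nested JSON to one row" idea. It makes three assumptions: the response has already been parsed into lists (for example with jsonlite::fromJSON(..., simplifyVector = FALSE)), the request asked for expansions=author_id, and both flatten_tweets() and the toy resp object are hypothetical. In v2, expanded objects such as users come back in a separate includes section instead of being nested in each tweet, so flattening means joining them back onto the tweets.

```r
# Hypothetical parsed v2 response; real responses carry more fields.
resp <- list(
  data = list(
    list(id = "1", text = "hello", author_id = "10"),
    list(id = "2", text = "world", author_id = "11")
  ),
  includes = list(users = list(
    list(id = "10", username = "alice"),
    list(id = "11", username = "bob")
  ))
)

# One row per tweet, with the expanded author joined back in
flatten_tweets <- function(resp) {
  # Tweets are the top-level `data` array
  tweets <- do.call(rbind, lapply(resp$data, function(t) {
    data.frame(id = t$id, text = t$text, author_id = t$author_id,
               stringsAsFactors = FALSE)
  }))
  # Expanded users arrive separately under `includes`
  users <- resp$includes$users
  user_ids <- vapply(users, function(u) u$id, character(1))
  user_names <- vapply(users, function(u) u$username, character(1))
  # match() preserves tweet order; tweets without an included author get NA
  tweets$username <- user_names[match(tweets$author_id, user_ids)]
  tweets
}

flatten_tweets(resp)
```

Each additional expansion (media, places, referenced tweets) would need its own join of this kind, which is one reason the v2 output is harder to squeeze into the old single-tibble shape.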
Based on the level of enthusiasm for https://github.com/cjbarrie/academictwitteR, the full-archive search seems like a good place to begin. I'll start on a PR this week.
From the migration guide, the new API endpoints are:
Resource | Endpoint group | How it can be used | Implemented |
---|---|---|---|
Tweets | Tweets lookup | Returns information about a Tweet or group of Tweets. | [ ] |
-- | Recent search | Returns Tweets over the last seven to nine days that match your query criteria. | [ ] |
-- | Full-archive search | Query the complete archive of public Tweets created since the first Tweet in March 2006. This endpoint is currently only available with the Academic Research product track | [ ] |
-- | User Tweet timeline | Returns the Tweets composed by, or mentioning, a specified Twitter account. | [ ] |
-- | User mention timeline | Returns the Tweets mentioning a specified Twitter account. | [ ] |
-- | Filtered stream | Delivers Tweets which match your rules through a persistent HTTPS streaming connection. | [ ] |
-- | Sampled stream | Delivers about a 1% sample of all new public Tweets as they happen through a persistent HTTP streaming connection. | [ ] |
-- | Hide replies | Hides or unhides replies to Tweets that you or other authenticated users publish. | [ ] |
Users | Users lookup | Returns the profile information for a given user with the newly added ability to specify fields to be returned. | [ ] |
-- | Follows lookup | Retrieve an account’s followers and who they are following using their user ID. | [ ] |
-- | Manage follows | Follow or unfollow users using their user ID. | [ ] |
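For orientation, here is a lookup table that pairs each endpoint group above with its v2 path. The paths reflect my reading of Twitter's v2 API reference and should be checked against the migration guide. v2_paths and v2_endpoint() are hypothetical names. Several paths contain an :id placeholder for the user or tweet id, and some groups share a path but differ by HTTP method.

```r
# Hypothetical lookup table: v2 paths for the endpoint groups listed above
v2_paths <- c(
  tweets_lookup         = "2/tweets",
  recent_search         = "2/tweets/search/recent",
  full_archive_search   = "2/tweets/search/all",
  user_tweet_timeline   = "2/users/:id/tweets",
  user_mention_timeline = "2/users/:id/mentions",
  filtered_stream       = "2/tweets/search/stream",
  sampled_stream        = "2/tweets/sample/stream",
  hide_replies          = "2/tweets/:id/hidden",     # PUT
  users_lookup          = "2/users",
  followers_lookup      = "2/users/:id/followers",
  following_lookup      = "2/users/:id/following",
  manage_follows        = "2/users/:id/following"    # POST to follow
)

# Build the full URL, substituting the :id placeholder when the path has one
v2_endpoint <- function(name, id = NULL) {
  path <- v2_paths[[name]]
  if (grepl(":id", path, fixed = TRUE)) {
    if (is.null(id)) stop("Endpoint '", name, "' needs an `id`", call. = FALSE)
    path <- sub(":id", id, path, fixed = TRUE)
  }
  paste0("https://api.twitter.com/", path)
}

v2_endpoint("user_tweet_timeline", id = "2244994945")
```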
I started on this at https://github.com/alexpghayes/rtweet/tree/v2 but the current bearer token interface is challenging enough to work with that I'm putting this on hold until #469 is resolved.
Dear All, thanks a lot for your work!!
Do you have a rough estimate of when V2 API support will be available? I would be particularly interested in searching by conversation_id.
I'll probably do a sprint on rtweet this summer (if v2 API support isn't included by then, I'll add it). Then I would like to get some feedback and leave some time before finally submitting to CRAN. My best estimate for a CRAN release with v2 support is before the end of the year.
Note that the rtweet development version has a function to retrieve threads: tweet_threading.
If tweet_threading doesn't work for you and you want support from rtweet sooner, you can send pull requests and we can work on them together. I think @alexpghayes was also interested and started working on this.
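The v2 search endpoints accept a conversation_id: query operator, and conversation_id can also be requested as a tweet field. The sketch below builds the query parameters, assuming the recent search endpoint, which only covers roughly the last week; older threads would need full-archive search. conversation_params() is a hypothetical helper. Ids are kept as character strings because tweet ids are too large to be stored exactly as doubles in R.

```r
# Hypothetical helper: query parameters for fetching the tweets in one
# conversation via the v2 recent search endpoint
conversation_params <- function(conversation_id, max_results = 100) {
  # Conversation ids are tweet ids: digits only, kept as character
  stopifnot(grepl("^[0-9]+$", conversation_id),
            max_results >= 10, max_results <= 100)
  list(
    # `conversation_id:` is a v2 search operator
    query = paste0("conversation_id:", conversation_id),
    tweet.fields = "author_id,conversation_id,created_at,in_reply_to_user_id",
    max_results = max_results
  )
}

params <- conversation_params("1234567890")
# e.g. httr::GET("https://api.twitter.com/2/tweets/search/recent",
#                httr::add_headers(Authorization = paste("Bearer", token)),
#                query = params)
```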
Now that #542 is merged, v2 support is once again on my radar, but in classic academic fashion it is but one of many competing priorities at the moment.
I'm sorry if these are elementary questions; I'm only asking because I'd like to help in any way I can.
Last year, after getting a response that V2 wasn't supported, I made a few changes to the rtweet internals and got it working. (I had just changed the URLs that were being called from within rtweet.) I wasn't making complicated queries, but I didn't face any issues with search or trends (and, though I don't remember exactly, maybe also tweet ids). The results were the same as with the Node package I switched to for V2.
I've gone through the thread a couple of times, as well as the various other issues mentioned here, but there are a few things I couldn't understand. It would be really nice if someone could clarify:
Thanks. Once again, apologies for asking some really basic questions.
@sugarcane29 it's not a one-to-one mapping from v1 to v2 because more has changed than just the URL (e.g. you can now select which fields to include in the results). So this is an opportunity to revisit rtweet's API and make bigger changes. I think it would be a missed opportunity to just change the URLs and not reconsider the larger API.
Hi @sugarcane29, many thanks for opening the issue and wanting to help. Help maintaining this package is always welcomed!
Hope this helps to clarify your doubts.
Just in case it wasn't abundantly clear from me being totally AWOL, v2 support is not on any critical research path for me at the moment, and I am unlikely to find any significant time to develop v2 support in the near future. So please don't hold back on implementing things if you're excited about making this happen and were holding off because of my comments above!
Just coming into this discussion: is anyone taking this up?
@billmclellan I'm currently doing other things, but this would certainly be very welcome. If you or anyone else wants to start working on this, I'll support and advise them in getting API v2 supported in rtweet.
I've not created a package before, but I've written lots of functions and scripts for my own analysis. Here's what I've started for myself using httr instead of rtweet, as suggested by Twitter for their v2 API. Am I on the right track? Snippet (attached as pragmantics.txt):

```r
get_user <- function(username, headers, params = list()) {
  # Pass usernames through `query` together with any extra params (e.g. user.fields)
  response <- httr::GET(
    url = "https://api.twitter.com/2/users/by",
    httr::add_headers(.headers = headers),
    query = c(list(usernames = paste(username, collapse = ",")), params)
  )
  # Fail loudly on 4xx/5xx instead of parsing an error body
  httr::stop_for_status(response)
  obj <- httr::content(response, as = "text", encoding = "UTF-8")
  # The user objects live under the top-level `data` element
  tibble::as_tibble(jsonlite::fromJSON(obj, flatten = TRUE)$data)
}
```
@billmclellan This is a first step. The attached code is fine for a script but not for a package. To be added, code using Twitter's API v2 should live inside the package (you'll need to fork it and modify/add files) and use rtweet's internal functions to ensure consistency for users; for instance, this code doesn't handle API rate limits or check user input. Look at how it is done for API v1 and try to emulate that for v2.
Thanks @llrs, that's what I thought. Thanks for confirming.
Link to Twitter's documentation on Getting started with R and v2 of the Twitter API (note, though, that the expansion parameter needs to be changed to expansions, or you will get a 400 error). This looks like what @billmclellan may have used as a starting point.
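Here is a small sketch of guarding against that mistake when building v2 query parameters. v2_query() is a hypothetical helper, not part of rtweet or httr. It collapses R character vectors into the comma-separated strings that the v2 API expects for fields and expansions, and it stops early on the singular expansion spelling.

```r
# Hypothetical helper: turn R vectors into comma-separated v2 query values
v2_query <- function(...) {
  params <- list(...)
  # Catch the singular spelling; the v2 parameter is `expansions`
  if ("expansion" %in% names(params)) {
    stop("Use `expansions` (plural); `expansion` gets a 400 error from the API",
         call. = FALSE)
  }
  lapply(params, function(x) paste(x, collapse = ","))
}

v2_query(
  usernames = c("TwitterDev", "TwitterAPI"),
  expansions = "pinned_tweet_id",
  user.fields = c("created_at", "description")
)
```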
@brshallo This was fixed a little while back on our end.
Thanks, Jessica, for checking in and reporting the updates.
To everybody here: I hope to resume development of rtweet in a couple of weeks (in case you don't see any changes, I've moved to the devel branch).
Update to all interested in this issue!
The latest rtweet release, 1.1.0, supports the streaming endpoints. The next release will support archive and recent tweet search, among others, because I can easily support all the endpoints that require OAuth 2.0 App Only (with the bearer token).
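App-only authentication comes down to sending the bearer token in an Authorization header. The sketch below assumes the token is stored in an environment variable called TWITTER_BEARER; that name and bearer_headers() itself are hypothetical. rtweet handles authentication through its own functions, so this only shows what goes over the wire.

```r
# Hypothetical helper: Authorization header for OAuth 2.0 App Only requests.
# The environment variable name TWITTER_BEARER is an assumption.
bearer_headers <- function(token = Sys.getenv("TWITTER_BEARER")) {
  if (!nzchar(token)) {
    stop("No bearer token: pass `token` or set TWITTER_BEARER", call. = FALSE)
  }
  c(Authorization = paste("Bearer", token))
}

# Usage (needs a real token and network access):
# httr::GET("https://api.twitter.com/2/tweets/search/recent",
#           httr::add_headers(.headers = bearer_headers()),
#           query = list(query = "#rstats"))
```

The endpoints that need user context (OAuth 2.0 with PKCE, or OAuth 1.0a) cannot be reached with this header alone.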
Unfortunately, I have some problems authenticating via OAuth 2.0 with PKCE (see this comment for a brief summary), which means there is no support yet for bookmarks (#344) or other endpoints that require it. I don't want to add support for endpoints that could work via OAuth 1.0; I expect its support might end soon, although I might find a way to support it easily. See this table of endpoints and authentication methods to find other endpoints affected by this roadblock.
Some decisions I am facing in case someone wants to add their opinion:
Would you prefer to get all the data, as the previous API did? With the new API, the functions by default provide only the bare minimum of information requested. Currently rtweet returns the minimal data, with an easy way to get all the data via the fields and expansions arguments. rtweet's interface for setting expansions and fields might change, as it isn't intuitive or idiomatic.
Do you prefer to mimic the old data or to get new output structures? The API output is different and more flexible, so parsing it will also be different, and this parsing is not currently performed. I will try to keep the new helpers (ids, rbind, entity, ...) working even if the new endpoints return a different output.
Do you have preferences for an endpoint? rtweet currently supports the API v2 streaming endpoints because they stopped working with API v1. Besides the bookmark endpoint, I will focus on supporting the compliance jobs, so that any user can check whether they need to delete stored tweets and user info. But there are a lot more; see this guide for mappings between API versions. There is already support for searching the archive in the devel branch: if you're ready for a wild rodeo and have academic research access, you can use search_archive (the name might change; see the discussion in #480).
Thanks, all, for your patience.
Hello, thank you for polling our preferences!
> Would you prefer to get all the data, as the previous API did? With the new API, the functions by default provide only the bare minimum of information requested. Currently rtweet returns the minimal data, with an easy way to get all the data via the fields and expansions arguments. rtweet's interface for setting expansions and fields might change, as it isn't intuitive or idiomatic.
If possible, I would prefer an option (e.g., an argument complete = TRUE) that returns the same fields as the previous API, which would simplify code on my end, with perhaps the minimal information as the default.
> Do you prefer to mimic the old data or to get new output structures? The API output is different and more flexible, so parsing it will also be different, and this parsing is not currently performed. I will try to keep the new helpers (ids, rbind, entity, ...) working even if the new endpoints return a different output.
Ideally it would be good to have both options (the current and the more flexible structure).
> Do you have preferences for an endpoint? rtweet currently supports the API v2 streaming endpoints because they stopped working with API v1. Besides the bookmark endpoint, I will focus on supporting the compliance jobs, so that any user can check whether they need to delete stored tweets and user info. But there are a lot more; see this guide for mappings between API versions. There is already support for searching the archive in the devel branch: if you're ready for a wild rodeo and have academic research access, you can use search_archive (the name might change; see the discussion in https://github.com/ropensci/rtweet/issues/480).
My priority would be an update to the search_tweets endpoint. The timelines would also be of interest. Best wishes, Eric
Thanks for sharing your preferences @erima2020 .
There are the expansions and fields arguments to control this; users will need to set them to get all the available data, since by default the functions only provide the minimal information requested. I'm thinking about how to make this easier and more intuitive: currently there is a way to get all the data and a way to get just the default. I might need to experiment a bit more with what the API v2 accepts.
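Here is one possible shape for the complete = TRUE option suggested above, written as a hypothetical tweet_fields() helper. By default it sends no fields, so the API returns its minimal defaults (essentially id and text). With complete = TRUE it requests a fixed set of tweet and user fields plus the author expansion. The field list is a hand-picked subset of the v2 data dictionary rather than an exhaustive one, and this is not how rtweet currently implements fields and expansions.

```r
# Hypothetical helper: minimal API defaults vs a fuller, fixed field set
tweet_fields <- function(complete = FALSE) {
  # Empty list: the API falls back to its defaults (essentially id and text)
  if (!complete) return(list())
  list(
    tweet.fields = paste(c("attachments", "author_id", "conversation_id",
                           "created_at", "entities", "in_reply_to_user_id",
                           "lang", "public_metrics", "referenced_tweets"),
                         collapse = ","),
    # Expanding author_id makes the API return user objects under `includes`
    expansions = "author_id",
    user.fields = paste(c("created_at", "description", "location",
                          "public_metrics", "verified"), collapse = ",")
  )
}

str(tweet_fields(complete = TRUE))
```

Keeping the minimal request as the default matches the API's own behaviour, while the flag gives old code a single switch for "give me roughly what v1.1 gave me".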
While I agree that maintaining the old output for old code is always nice (which I broke in the 1.0.2 release), the current parsing of the data to generate the output is slow (imho) and forces, for instance, a user interested only in media to retrieve everything. Given that old code will need to be updated anyway, I think this is a good opportunity to provide a faster and better interface for users (or to abandon the package completely :/).
As discussed privately, search_tweets is not at risk of being deprecated by the API. But I expect it will be easy to support, and it might be available soon via v2 in rtweet. I only mentioned the endpoints that were not available with API v1.1 (although I'm missing some; see https://github.com/ropensci/rtweet/labels/API%20v2), as I will focus on at least maintaining current functionality via v2.
Now (since version 1.1 for the streaming endpoints) it is possible to use the API v2. You can retrieve data from all the endpoints, but rtweet currently only allows managing tweets (POST and DELETE), not likes, lists, or similar actions.
Is it possible to use this package with the Twitter V2 API? I'm trying to do a historical search, and it appears that the V2 API allows it in the free version.
Ref: start_time and end_time in the api: https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent
I tried changing the query parameters in the source and rebuilding the package, but it throws lots of errors.
Thanks.
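The v2 recent search endpoint takes start_time and end_time as UTC timestamps in ISO 8601 form (YYYY-MM-DDTHH:mm:ssZ), and start_time cannot reach further back than the recent-search window. Below is a base-R sketch of building those parameters outside rtweet. as_twitter_time() and recent_search_params() are hypothetical helpers. The returned list would be passed as the query of an httr GET request to the recent search endpoint.

```r
# Hypothetical helper: format a time as the UTC timestamp v2 search expects
as_twitter_time <- function(x) {
  format(as.POSIXct(x, tz = "UTC"), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

# Hypothetical helper: recent search parameters with an optional time window
recent_search_params <- function(query, start_time = NULL, end_time = NULL) {
  params <- list(query = query)
  if (!is.null(start_time)) params$start_time <- as_twitter_time(start_time)
  if (!is.null(end_time)) params$end_time <- as_twitter_time(end_time)
  params
}

# e.g. tweets from the last three days
recent_search_params("#rstats", start_time = Sys.time() - 3 * 24 * 3600)
```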