ropensci / rtweet

🐦 R client for interacting with Twitter's [stream and REST] APIs
https://docs.ropensci.org/rtweet

Use of V2 API #445

Closed sugarcane29 closed 1 year ago

sugarcane29 commented 4 years ago

Is it possible to use this package with the Twitter V2 API? I'm trying to do some historical search, and it appears that the V2 API allows for it in the free version.

Ref: start_time and end_time in the api: https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent
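For readers following along, here is a minimal sketch of how the `start_time` and `end_time` parameters mentioned above could be assembled for the recent search endpoint. `recent_search_query()` is a hypothetical helper, not an rtweet function; the only things taken from the API reference are the parameter names and the ISO 8601 UTC timestamp format.

```r
# Hypothetical helper (not part of rtweet): build the query parameters for
# GET /2/tweets/search/recent with an optional time window.
recent_search_query <- function(query, start_time = NULL, end_time = NULL) {
  stopifnot(is.character(query), length(query) == 1, nzchar(query))

  # The v2 API expects ISO 8601 timestamps in UTC, e.g. 2021-03-01T00:00:00Z
  to_iso <- function(x) {
    if (is.null(x)) return(NULL)
    format(as.POSIXct(x, tz = "UTC"), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  }

  params <- list(
    query = query,
    start_time = to_iso(start_time),
    end_time = to_iso(end_time)
  )
  # Drop parameters that were not supplied
  params[!vapply(params, is.null, logical(1))]
}

q <- recent_search_query(
  "#rstats",
  start_time = "2021-03-01",
  end_time = "2021-03-03 12:00:00"
)
str(q)
```

The resulting list could be passed as the `query` argument of an HTTP client such as `httr::GET()`, together with a bearer token header.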

I tried changing the query parameters in the source and rebuilding the package, but it is throwing lots of errors.

Thanks.

JNavelski commented 4 years ago

Hi, I am also trying to use rtweet with the V2 API from Twitter (with a developer account), and for some reason I am having a difficult time getting historical data from my timeline. I would like to pull the last 30 days from my timeline, but I am still just getting the results I would get if I pulled without the developer account.

andypiper commented 4 years ago

There is no 30 day search or user timeline option in the Twitter API V2 yet but these are both coming soon. We are as excited as you are to see rtweet expand to support V2!

alexpghayes commented 3 years ago

I would be very happy to contribute v2 API functionality, especially now since it seems rtweet is back under active development. Can @mkearney, @llrs, and @hadley perhaps comment on the current governance strategy for rtweet? Are y'all the right people to talk to about this?

Also, it seems like the internals are undergoing a fairly extensive re-write at the moment, largely by @hadley. @hadley, do you plan on implementing v2 API infrastructure, or will you have to switch your attention to other projects soon? If not, is there any dev-facing documentation about the new internals to read to get oriented, or is the appropriate place to start in the code itself?

llrs commented 3 years ago

Good questions @alexpghayes. I tried reaching @mkearney via several methods (twitter, github, email) but got no responses over several months. I asked rOpenSci to let me maintain the package because I had some interest in new functionality and there were lots of issues and pending PRs. @sckott gave me permissions about 2 weeks ago (2021/02/15), but we haven't talked about governance strategy. If anyone wants to take over the maintenance from me, that is OK.

Initially I wanted to solve the issues and later on start making more profound changes without breaking changes (see #471). But since @hadley contributed a new internal process to call Twitter's API and has rewritten some functions, this will no longer be possible. He is now working (temporarily?) on other projects (see comments on #526 and #523).

About the API v2 itself I haven't read much about it yet. As far as I know the path to the endpoint is different and now you can select which fields you want to get when making GET requests. You'll probably need to start modifying the TWIT_method to accept v2 endpoints and work from there (TWIT_GET and the new user facing function). There is no documentation of the internals. Hope this helps

hadley commented 3 years ago

I looked into the v2 api very little but I wondered if it might be better to pull the hard-coded version number out of TWIT_get() so it has to be specified in the call, e.g. TWIT_get(token, "1/tweets/search") (or whatever the end point is). If we go down that path, it would be good to decide now and fix all calls in a PR before starting more material work.
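To illustrate the idea of moving the version number into the call, here is a rough sketch. `api_url()` is a hypothetical stand-in and not the actual TWIT_get() internals; it only shows how a caller-supplied version prefix could be validated and turned into a full URL.

```r
# Hypothetical sketch; api_url() is not an rtweet function.
api_url <- function(path, base = "https://api.twitter.com") {
  stopifnot(is.character(path), length(path) == 1)

  # Require the caller to state the API version as the first path component
  if (!grepl("^(1\\.1|2)/", path)) {
    stop("`path` must start with an API version, e.g. '1.1/' or '2/'",
         call. = FALSE)
  }

  # v1.1 endpoints end in .json; v2 endpoints do not
  if (startsWith(path, "1.1/") && !endsWith(path, ".json")) {
    path <- paste0(path, ".json")
  }

  paste0(base, "/", path)
}

api_url("1.1/search/tweets")
# "https://api.twitter.com/1.1/search/tweets.json"
api_url("2/tweets/search/recent")
# "https://api.twitter.com/2/tweets/search/recent"
```

With something like this, every call site states its version explicitly, so v1.1 and v2 functions can live side by side during a migration.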

alexpghayes commented 3 years ago

My impression from a day of playing with v2 is that it is dramatically better and well worth investing in internal infrastructure to support v2. From a research perspective, it enables a lot. However, the return objects also seem to be fundamentally different. I will read the v2 documentation more thoroughly tonight, but based on my current understanding we'd probably want entirely separate interfaces to the v2 endpoints and the v1.1 endpoints. I'm not sure how to approach this.

@hadley I can make a PR with a revision of TWIT_get() to support version specification and request a review when it's ready?

hadley commented 3 years ago

@alexpghayes yeah, that’d be great.

hadley commented 3 years ago

@alexpghayes once the PR is merged, I'd suggest making a list of 1.1 vs 2 endpoints to figure out where we need to add behaviour vs change existing behaviour. It'll probably be easiest to start by adding functions for capabilities that are not available in v1.1, in order to get a sense of the API and what additional parameters the functions will need. I'd suggest picking one function to implement (maybe #363) then doing a PR to make a solid foundation for future work.

alexpghayes commented 3 years ago

Two key resources appear to be the migration guide: https://developer.twitter.com/en/docs/twitter-api/migrate and the data dictionary: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/using-fields-and-expansions. In particular, the data returned by v2 looks very different from the data returned by v1. Presumably (but not definitively) we'll want to figure out a map from the new, more structured and nested JSON objects to a single line in a tibble for several different varieties of JSON object. Additionally, it looks like the API itself is moving from a "you get everything" model to an "explicitly request the data you want" model. There is then a whole new conceptual model for explicitly requesting the data you want with fancy arguments.
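As a rough illustration of mapping a nested object to a single row, here is a minimal sketch in base R. `flatten_tweet()` is a hypothetical helper and the sample tweet is made-up data; only the `public_metrics` field names come from the v2 data dictionary. It handles named sub-objects only; arrays such as entities would need a separate strategy (for example, list-columns).

```r
# Hypothetical helper: recursively flatten named sub-objects so that
# public_metrics$like_count becomes the column public_metrics_like_count.
flatten_tweet <- function(x, prefix = NULL) {
  out <- list()
  for (name in names(x)) {
    key <- if (is.null(prefix)) name else paste(prefix, name, sep = "_")
    value <- x[[name]]
    if (is.list(value) && !is.null(names(value))) {
      # Named sub-object: recurse and carry the parent name as a prefix
      out <- c(out, flatten_tweet(value, key))
    } else {
      out[[key]] <- value
    }
  }
  out
}

# Made-up example of a parsed v2 tweet object
tweet <- list(
  id = "1",
  text = "hello",
  public_metrics = list(retweet_count = 2L, like_count = 5L)
)

row <- as.data.frame(flatten_tweet(tweet), stringsAsFactors = FALSE)
names(row)
```

Applying the helper to each tweet and row-binding the results would give one row per tweet, at the cost of deciding up front how to treat missing and repeated fields.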

Based on the level of enthusiasm for https://github.com/cjbarrie/academictwitteR it seems like a good place to begin would be with the full archive search. I'll start on a PR this week.

alexpghayes commented 3 years ago

From the migration guide, the new API endpoints are:

| Resource | Endpoint group | How it can be used | Implemented |
|---|---|---|---|
| Tweets | Tweets lookup | Returns information about a Tweet or group of Tweets. | [ ] |
| -- | Recent search | Returns Tweets over the last seven to nine days that match your query criteria. | [ ] |
| -- | Full-archive search | Query the complete archive of public Tweets created since the first Tweet in March 2006. This endpoint is currently only available with the Academic Research product track. | [ ] |
| -- | User Tweet timeline | Returns the Tweets composed by, or mentioning, a specified Twitter account. | [ ] |
| -- | User mention timeline | Returns the Tweets mentioning a specified Twitter account. | [ ] |
| -- | Filtered stream | Delivers Tweets which match your rules through a persistent HTTPS streaming connection. | [ ] |
| -- | Sampled stream | Delivers about a 1% sample of all new public Tweets as they happen through a persistent HTTP streaming connection. | [ ] |
| -- | Hide replies | Hides or unhides replies to Tweets that you or other authenticated users publish. | [ ] |
| Users | Users lookup | Returns the profile information for a given user with the newly added ability to specify fields to be returned. | [ ] |
| -- | Follows lookup | Retrieve an account’s followers and who they are following using their user ID. | [ ] |
| -- | Manage follows | Follow or unfollow users using their user ID. | [ ] |

alexpghayes commented 3 years ago

I started on this at https://github.com/alexpghayes/rtweet/tree/v2 but the current bearer token interface is challenging enough to work with that I'm putting this on hold until #469 is resolved.

nikolassch commented 3 years ago

Dear All, thanks a lot for your work!!

Do you have a broad estimate when the usage of V2 API will be available? I would be particularly interested in searching by conversation_ids.

llrs commented 3 years ago

Probably this summer I'll make a sprint on rtweet (if it is not included by then, I would include v2 API). Then I would like to get some feedback and would leave some time before finally submitting to CRAN. The best estimate I have of a CRAN release with v2 support is before the end of the year.

Note that rtweet development version has a function to retrieve threads: tweet_threading.

If tweet_threading doesn't work and you want to have sooner support from rtweet, you can send pull requests and we can work on it. I think that @alexpghayes was also interested and started working on this.

alexpghayes commented 3 years ago

Now that #542 is merged v2 support is once again on my radar, but in classic academic fashion it is but one of many competing priorities at the moment.

sugarcane29 commented 3 years ago

I'm sorry if these are elementary questions; I'm only asking because I'd like to help in any way I can.

Last year, after getting a response that V2 wasn't supported, I made a few changes to the rtweet internals and I got it working. (I had just changed the URLs that were being called from within rtweet.) I wasn't making complicated queries, but I didn't face any issues with search, trends (and I don't remember exactly, but maybe also tweet ids). The results were the same with the Node package that I shifted to for V2.

I've gone through the thread a couple of times and also the various other issues mentioned here, but I couldn't get a few things. Would be really nice if someone could clarify:

  1. Will just changing the URLs not be enough?
  2. Are we aiming for a more thorough overhaul?
  3. Will there be any significant differences between this and the academictwitteR package?
  4. How can I contribute? (I'm comfortable with R coding, but have never built any packages.) (If my earlier code, the one mentioned above, can be of help, I'll dig through my history to find it.)

Thanks. Once again, apologies for asking some really basic questions.

hadley commented 3 years ago

@sugarcane29 it's not a one-to-one mapping from v1 to v2 because more has changed than just the url (e.g. you can now select which fields to include in the results). So this makes it an opportunity to revisit rtweet's API to make bigger changes. I think it would be a missed opportunity to just change the URLs and not reconsider the large API.

llrs commented 3 years ago

Hi @sugarcane29, many thanks for opening the issue and wanting to help. Help maintaining this package is always welcomed!

  1. Adding to Hadley's comments: Changing the URL will not be enough because API v2 does not provide all the functionality of API v1 (yet), and where it does, it does not return the same data or work the same way.
  2. Yes, after 2 years without any maintenance or improvement, there were many bugs, and rtweet is now undergoing a big overhaul spearheaded by Hadley. Besides, code for API v1 might not work well for v2 for the reasons mentioned above.
  3. I think that academictwitteR is meant to be a temporary solution until rtweet supports v2.
  4. Perhaps you have one endpoint that you wish rtweet supported. You can contribute code with a pull request to add it. If it meets the overall design and quality of rtweet I will include the code in rtweet. However, due to recent work on rtweet, I don't think the changes you made last year would be up to date with the current code.

Hope this helps to clarify your doubts.

alexpghayes commented 2 years ago

Just in case it wasn't abundantly clear from me being totally AWOL, v2 support is not on any critical research path for me at the moment and I am unlikely to be able to find any significant time to develop to v2 in the near future -- so please don't hold back on implementing stuff if you're excited about making this happen and were holding off on the basis of my comments above!

billmclellan commented 2 years ago

just coming into this discussion - anyone taking this up?

llrs commented 2 years ago

@billmclellan I'm currently doing other things. But certainly this would be very much welcomed. If you or anyone else wants to start working on this, I'll support and advise them to get API v2 supported in rtweet.

billmclellan commented 2 years ago

I've not created a package before, but I've done lots of functions and scripts for my own analysis. Here's what I've started for myself using httr instead of rtweet, as suggested by Twitter for their v2 api. Am I on the right track? Snippet:

    get_user <- function(username, headers, params) {
      # Send usernames through `query`: httr replaces any query string
      # already present in the URL when `query` is supplied
      response <- httr::GET(
        url = "https://api.twitter.com/2/users/by",
        httr::add_headers(.headers = headers),
        query = c(list(usernames = username), params)
      )
      obj <- httr::content(response, as = "text", encoding = "UTF-8")
      # Parse the JSON body and return it explicitly as a tibble
      tibble::as_tibble(jsonlite::fromJSON(obj, flatten = TRUE))
    }

pragmantics.txt

llrs commented 2 years ago

@billmclellan This is a first step. The attached code is fine for a script but not for a package. The code to use Twitter's API v2 should live inside the package (you'll need to fork it and modify/add files) and use rtweet's internal functions in order to be added. This ensures consistency for users; for instance, here you don't handle API rate limits or check the user's input. Look at how it is done for API v1 and try to emulate this for v2.
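To make the two gaps named above concrete, here is a minimal sketch of an input check and a rate-limit check. Both helpers are hypothetical and are not rtweet internals. The username pattern (1 to 15 letters, digits, or underscores) is an assumption about Twitter handles; the `x-rate-limit-remaining` and `x-rate-limit-reset` names are the response headers the Twitter API uses to report rate limits.

```r
# Hypothetical input check: assumes handles are 1-15 letters, digits, or "_"
check_username <- function(username) {
  valid <- is.character(username) && length(username) == 1 &&
    grepl("^[A-Za-z0-9_]{1,15}$", username)
  if (!valid) {
    stop("`username` must be a single valid Twitter handle without '@'",
         call. = FALSE)
  }
  invisible(username)
}

# Hypothetical rate-limit check: how long to wait before the next request,
# given the response headers as a named list of strings
seconds_until_reset <- function(headers, now = Sys.time()) {
  remaining <- as.numeric(headers[["x-rate-limit-remaining"]])
  reset <- as.numeric(headers[["x-rate-limit-reset"]])  # epoch seconds
  # No rate-limit information, or requests still available: no need to wait
  if (length(remaining) == 0 || length(reset) == 0 || remaining > 0) {
    return(0)
  }
  max(0, reset - as.numeric(now))
}

check_username("rOpenSci")
exhausted <- list(`x-rate-limit-remaining` = "0", `x-rate-limit-reset` = "1000")
wait <- seconds_until_reset(
  exhausted,
  now = as.POSIXct(940, origin = "1970-01-01", tz = "UTC")
)
```

A package function would call the input check before building the request and sleep for the returned number of seconds before retrying.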

billmclellan commented 2 years ago

thanks @llrs that's what I thought - thanks for confirming

brshallo commented 2 years ago

Link to Twitter's documentation on Getting started with R and v2 of the Twitter API (though note that expansion parameter needs to be changed to expansions or will get a 400 error). This looks like what @billmclellan may have used as a starting point.

JessicaGarson commented 2 years ago

@brshallo This was fixed a little while back on our end.

llrs commented 2 years ago

Thanks Jessica, for checking in and reporting the updates.

To everybody here: I hope to resume the development of rtweet in a couple of weeks (I've moved to the devel branch, in case you don't see any changes).

llrs commented 1 year ago

Update to all interested in this issue!

The latest rtweet release 1.1.0 supports the streaming endpoints. The next release will support the archive and recent tweets endpoints, among others, because I can easily support all the endpoints requiring OAuth 2.0 App Only (with the bearer token).

Unfortunately, I have some problems authenticating via OAuth 2.0 with PKCE, see this comment for a brief summary, which means no support for bookmarks yet (#344) and other endpoints requiring it. I don't want to add support for endpoints that could work via OAuth 1.0; I expect its support might end soon, although I might find a way to support it easily. See this table of endpoints and authentications to know other endpoints affected by this roadblock.

Some decisions I am facing in case someone wants to add their opinion:

Thanks all for your patience.

erima2020 commented 1 year ago

Hello, thank you for this polling of preferences!

llrs commented 1 year ago

Thanks for sharing your preferences @erima2020 .

There are the expansions and the fields arguments to control this; users will need to set those to get all the data available. By default the endpoints only return the minimal information requested. I'm thinking about how to make it easier and more intuitive: currently there is a way to get all the data and a way to get just the default. I might need to experiment a bit more with what is accepted by the API v2.
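For context, here is a minimal sketch of how requested fields and expansions end up as v2 query parameters. `fields_query()` is a hypothetical helper and does not reflect rtweet's actual argument handling; `expansions`, `tweet.fields`, and `user.fields` are the parameter names from the v2 documentation, each taking a comma-separated list.

```r
# Hypothetical helper: collapse requested fields into v2 query parameters.
fields_query <- function(expansions = NULL, tweet_fields = NULL,
                         user_fields = NULL) {
  params <- list(
    expansions = expansions,
    `tweet.fields` = tweet_fields,
    `user.fields` = user_fields
  )
  # Keep only what was requested; the API then returns its minimal default
  params <- params[lengths(params) > 0]
  # The API expects each parameter as a single comma-separated string
  lapply(params, paste, collapse = ",")
}

fq <- fields_query(
  expansions = "author_id",
  tweet_fields = c("created_at", "public_metrics")
)
str(fq)
```

Calling `fields_query()` with no arguments returns an empty list, which corresponds to asking the API for its default, minimal response.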

While I agree that maintaining the old output for old code is always nice (which I broke in the 1.0.2 release), the current parsing of the data to generate the output is slow (imho) and forces, for instance, a user interested only in media to retrieve everything. Given that old code will need to be updated anyway, I think it is a good opportunity to provide a faster and better interface for the users (or completely abandon the package :/).

As discussed privately, search_tweets is not at risk of being deprecated by the API. But I expect it will be easy to support, and it might be available soon via the v2 in rtweet. I only mentioned those that were not available with API v1.1 (although I'm missing some, see https://github.com/ropensci/rtweet/labels/API%20v2) as I will focus on at least maintaining current functionality via the v2.

llrs commented 1 year ago

Now (since version 1.1 for the streaming endpoints) it is possible to use the API v2. You can retrieve data from all the endpoints, but rtweet currently only allows managing tweets (POST and DELETE), not likes, lists, or similar actions.