taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License
2.39k stars 579 forks

Scraping twitter threads? #265

Open shorouq-z opened 4 years ago

shorouq-z commented 4 years ago

Is there a way to scrape entire tweet threads instead of individual tweets? Twitter is now also rolling out new features that let people link their tweets into threads even when they are not tweeted directly in sequence. Is there any information recorded in the output file that could indicate that tweets belong to the same thread, or even better, a way to directly scrape an entire Twitter thread if a certain keyword occurs anywhere within the tweets that make up that thread?

nukopy commented 4 years ago

I'm having the same problem.

In the current implementation, we can't get a thread directly. For now, we have no choice but to do the following (a TERRIBLE way):

  1. Get tweets as a list of twitterscraper.tweet.Tweet objects (hereinafter referred to as Tweet objects).
  2. Take one Tweet object from the list.
  3. Read the parent's tweet ID from the Tweet object's attribute Tweet.parent_tweet_id.
  4. Search for the parent tweet by that tweet ID and combine it with the child tweet.
  5. Repeat steps 1-4 until the whole Twitter thread is assembled.

Of course, with the above procedure, threads that contain tweets from protected accounts can NOT be extracted. However, this limitation is due to Twitter's specs, so there is nothing we can do about it.
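
A minimal sketch of the parent-walking procedure above, for illustration only. It does not call twitterscraper itself: `SimpleTweet` is a hypothetical stand-in for the Tweet object (only `parent_tweet_id` is named in this thread; the `tweet_id` and `text` field names are assumptions), and `lookup` is a hypothetical placeholder for whatever "search the parent tweet by tweet ID" turns out to be in practice.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SimpleTweet:
    # Stand-in for twitterscraper.tweet.Tweet. Only parent_tweet_id is
    # named in the discussion; the other field names are assumptions.
    tweet_id: str
    parent_tweet_id: Optional[str]
    text: str


def build_thread(
    leaf: SimpleTweet,
    lookup: Callable[[str], Optional[SimpleTweet]],
) -> List[SimpleTweet]:
    """Follow parent_tweet_id upward from leaf and return the chain root-first."""
    chain = [leaf]
    seen = {leaf.tweet_id}
    current = leaf
    while current.parent_tweet_id:
        parent = lookup(current.parent_tweet_id)
        # Stop when the parent cannot be fetched (e.g. a protected account)
        # or when following it would revisit a tweet already in the chain.
        if parent is None or parent.tweet_id in seen:
            break
        chain.append(parent)
        seen.add(parent.tweet_id)
        current = parent
    chain.reverse()
    return chain


# An in-memory dict stands in for the "search by tweet ID" step.
scraped = {
    "1": SimpleTweet("1", None, "first"),
    "2": SimpleTweet("2", "1", "second"),
    "3": SimpleTweet("3", "2", "third"),
}
thread = build_thread(scraped["3"], scraped.get)
print([t.text for t in thread])
```

When a parent cannot be found, the walk simply stops, so the result is the partial chain from the earliest reachable tweet down to the starting one.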

shorouq-z commented 4 years ago

Hey, thanks for the reply!

I'm toying around with the code at the moment to see if that is also applicable for extracting consecutive tweets in a thread that are written by the same author (sounds ultra-specific, but I'm interested in doing analysis on documents that are comprised of longer blocks of text written by the same person). So if a person writes a tweet thread rant spanning tweets 1 to n, then someone replies and the original poster replies back (and so on), I'm only looking for those first n tweets. The rest of the thread (1) gets too complicated to extract blocks of comments from, and (2) probably does not interest me within the scope of this research.
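
As a rough sketch of that filter, assuming a thread has already been assembled in chronological order: keep tweets from the start of the thread for as long as the author stays the same, and stop at the first tweet by anyone else. `ThreadTweet` and its `screen_name` and `text` fields are hypothetical names chosen for this example, not confirmed twitterscraper attributes.

```python
from itertools import takewhile
from typing import List, NamedTuple


class ThreadTweet(NamedTuple):
    # Hypothetical minimal record; field names are assumptions.
    screen_name: str
    text: str


def opening_run(thread: List[ThreadTweet]) -> List[ThreadTweet]:
    """Return the leading tweets written by the thread's first author."""
    if not thread:
        return []
    author = thread[0].screen_name
    # takewhile stops at the first tweet by someone else, so later
    # replies from the original poster are deliberately left out.
    return list(takewhile(lambda t: t.screen_name == author, thread))


example = [
    ThreadTweet("alice", "rant 1"),
    ThreadTweet("alice", "rant 2"),
    ThreadTweet("alice", "rant 3"),
    ThreadTweet("bob", "a reply"),
    ThreadTweet("alice", "reply to the reply"),
]
run = opening_run(example)
document = " ".join(t.text for t in run)
print(document)
```

Joining the run into one string gives the single-author block of text described above, and everything after the first outside reply is dropped.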