twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License
15.76k stars 2.72k forks

[QUESTION] Storing search results in a python object (twint module) #180

Closed aaeissa closed 6 years ago

aaeissa commented 6 years ago

Using the twint module, is there a way to store data in dicts or lists?

I know the data can be written to csv/json files with twint.run.Search(c), as well as printing to terminal.

I'm interested though, in creating a dict for example with certain fields, similar to how the documentation shows you how to format csv/json files with certain fields (e.g., id, username).

Alternatively I guess I could create the csv/json file and then immediately open/read the file, but wondering if there's a way to cut out this middle step.

pielco11 commented 6 years ago

Tweet Object

import twint

config = twint.Config()
config.Username = "pielco11"
config.Limit = 10 # to limit the number of results
config.Store_object = True # this is what you need

twint.run.Search(config)

Followers/Following object and User object

This feature is not complete yet; I'm working on it. There is something in the dev branch, but it may change rapidly and is quite unstable. Feel free to give it a try and provide feedback if you want!

The non-Pythonic way

Nothing stops you from exporting to .json and then reading the file, as you pointed out (tip: if the file is quite big, use an iterator and read it incrementally, e.g. with next() on the file object, instead of loading it all at once).
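A minimal sketch of the incremental-read tip, assuming the exported .json file holds one JSON object per line (an assumption about the export format, worth checking against your own file). The helper name iter_json_lines and the sample records are made up for illustration; no Twint call is involved.

```python
import json
import os
import tempfile


def iter_json_lines(path):
    """Yield one decoded record per non-empty line, without loading the whole file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Build a tiny sample file so the sketch runs on its own (made-up records).
sample_records = [
    {"id": 1, "username": "example_user", "tweet": "first"},
    {"id": 2, "username": "example_user", "tweet": "second"},
]
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    for record in sample_records:
        tmp.write(json.dumps(record) + "\n")
    sample_path = tmp.name

# The generator reads lazily, so only one line is held in memory at a time.
loaded_ids = [record["id"] for record in iter_json_lines(sample_path)]
os.remove(sample_path)
print(loaded_ids)
```

Because the function is a generator, it can also be consumed with a plain for loop and stopped early on a large file.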

aaeissa commented 6 years ago

Thanks. How can I access the data/tweets after twint.run.Search(config)? I consulted the module wiki again but didn't find any documentation for config.Store_object. Additionally, I attempted to save the Search results (data = twint.run.Search(config)) but that did not work. Apologies for the confusion.

pielco11 commented 6 years ago

Sorry, my bad; the docs may be incomplete. Here is a full example:

import twint

config = twint.Config()
config.Username = "pielco11"
config.Limit = 10
config.Store_object = True
twint.run.Search(config)

# now you will have some tweets
tweets_as_objects = twint.output.tweets_object 

twint.output.tweets_object is a list, so tweets_as_objects[0] gives you the first tweet that you scraped. Fields of the "object" are accessed as attributes, so if tweet = tweets_as_objects[0] is the first tweet you scraped, then tweet.id is the corresponding id. For a full list of attributes, take a look at #custom-formatting-options.
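To get from attribute-style objects to the dict with selected fields asked about at the top of the thread, something like the following could work. It is only a sketch: the SimpleNamespace items stand in for scraped tweet objects so the snippet runs without Twint, and the field names id and username are taken from the examples mentioned in this thread. The helper to_dict is hypothetical.

```python
from types import SimpleNamespace

# Stand-ins for the scraped tweet objects (assumption: real objects expose
# the same fields as plain attributes, as described above).
tweets_as_objects = [
    SimpleNamespace(id=101, username="example_user", tweet="hello"),
    SimpleNamespace(id=102, username="example_user", tweet="world"),
]

# Fields to keep in each dict; adjust to the attributes you need.
FIELDS = ("id", "username")


def to_dict(obj, fields=FIELDS):
    """Copy the chosen attributes into a dict; missing ones become None."""
    return {name: getattr(obj, name, None) for name in fields}


tweets_as_dicts = [to_dict(tweet) for tweet in tweets_as_objects]
print(tweets_as_dicts)
```

With a real run, tweets_as_objects would be the list filled by the search rather than the stand-in list above.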

aaeissa commented 6 years ago

Exactly what I was looking for, thank you!

starcyber commented 5 years ago

I'm trying to do the same thing; however, I'm getting the following result: [<twint.tweet.tweet object at 0x7f22445b6c88>, <twint.tweet.tweet object at 0x7f22445b6cf8>, ...]. Please, can you help me?

pielco11 commented 5 years ago

@starcyber that's correct.

To access a single tweet stored in the list, just do tweets_as_objects[0] (or any other index between 0 and size-1).

You can see the properties of the tweet object here: https://github.com/twintproject/twint/blob/b85d18a1676756e16a461282a413a8340c90ddb5/twint/tweet.py#L64-L96

starcyber commented 5 years ago

@pielco11 Thank you so much for your answer. But I'd like to return a list with the scraped followers. For example: twitter = ["@skinco11", "@twitter", "@NASA", ...]

Can you give an example? Working in module mode. Linux system!

pielco11 commented 5 years ago

@starcyber https://github.com/twintproject/twint/wiki/Module#save-data-tweets-users--in-lists-ram

starcyber commented 5 years ago

Where am I wrong?

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "import twint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "c = twint.Config()\n",
    "c.Username = 'noneprivacy'\n",
    "c.Limit = 10\n",
    "c.Store_object = True\n",
    "c.User_full = True"
   ]
  }
  {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3307205656 | SocialLinks | @_SocialLinks_ | Private: 0 | Verified: 0 | Bio: Maltego transforms, integration and data providing. | Location:  | Url: http://mtg-bi.com | Joined: 3 Jun 2015 8:35 AM | Tweets: 132 | Following: 120 | Followers: 294 | Likes: 18 | Media: 38 | Avatar: https://pbs.twimg.com/profile_images/691253601558237184/Log0GkWS_400x400.png\n",
      "262641807 | Ratan Jyoti | @reach2ratan | Private: 0 | Verified: 0 | Bio: #CyberSecurity #Researcher, #Leader, #influencer & #Author, #CISO @UjjivanSFB, #DataScience, #Blockchain Tweets my own  ▶️http://linkedin.com/in/ratanjyoti  | Location: Bangalore | Url: https://www.linkedin.com/in/ratanjyoti?trk=hp-identity-name | Joined: 8 Mar 2011 5:13 AM | Tweets: 33944 | Following: 7933 | Followers: 13929 | Likes: 26819 | Media: 12600 | Avatar: https://pbs.twimg.com/profile_images/860330414824542208/1XtEv7K0_400x400.jpg\n",
      "1082571861999960064 | FakeyBears | @BearzFakey | Private: 0 | Verified: 0 | Bio: gatherer of OSINT | Location:  | Url: None | Joined: 8 Jan 2019 1:36 AM | Tweets: 226 | Following: 76 | Followers: 22 | Likes: 35 | Media: 8 | Avatar: https://pbs.twimg.com/profile_images/1082592554070224896/4cSSW7th_400x400.jpg\n",
      "1353476761 | Maître Akua Kanzakaï | @KanzakaiSan | Private: 0 | Verified: 0 | Bio: 本当のスカムの偽の名前 | Location:  | Url: None | Joined: 14 Apr 2013 9:40 PM | Tweets: 521 | Following: 351 | Followers: 14 | Likes: 12 | Media: 0 | Avatar: https://pbs.twimg.com/profile_images/1118263438571184128/tMTYVjDs_400x400.jpg\n",
      "1527273618 | Wemanity | @Wemanity | Private: 0 | Verified: 0 | Bio: Community of passionate individuals whose purpose is to change the working world through #Agile, #innovation & #cooperation. We are the Agile Driving Force. 🚀✨ | Location: Paris - Brussels - The Hague - Luxembourg | Url: http://wemanity.com/ | Joined: 18 Jun 2013 2:43 AM | Tweets: 3147 | Following: 2571 | Followers: 6038 | Likes: 2435 | Media: 1693 | Avatar: https://pbs.twimg.com/profile_images/961968326548205568/e4aF03g0_400x400.jpg\n",
      "1117899398007095297 | osint.muffin | @MuffinOsint | Private: 0 | Verified: 0 | Bio: #OSINT #security #blockchain #BTC #AML | Location:  | Url: None | Joined: 15 Apr 2019 2:16 PM | Tweets: 1 | Following: 35 | Followers: 3 | Likes: 2 | Media: 0 | Avatar: https://pbs.twimg.com/profile_images/1117900083821973506/x9rpfD-u_400x400.jpg\n",
      "1059185567437598720 | GT | @Ginger__T | Private: 0 | Verified: 0 | Bio: OSINT & Privacy enthusiast. Liverpool fan and follower of F1. Fav movie Terminator 2 Judgement Day. Trying to help people stay safe on-line and learn as I go | Location: United Kingdom | Url: None | Joined: 4 Nov 2018 12:48 PM | Tweets: 221 | Following: 330 | Followers: 62 | Likes: 453 | Media: 4 | Avatar: https://pbs.twimg.com/profile_images/1114429035281625090/sWHCZ-3k_400x400.jpg\n",
      "15180137 | Chris Parker | @chrispcritters | Private: 0 | Verified: 0 | Bio: Founder https://WhatIsMyIPAddress.com  @wimia | Entrepreneur | Online Privacy, Safety & CyberSecurity | Podcast Guest | Website Monetization | InfoSec | Location: Tustin, CA | Url: https://www.cgparker.com | Joined: 20 Jun 2008 7:19 AM | Tweets: 1434 | Following: 17318 | Followers: 19807 | Likes: 37 | Media: 52 | Avatar: https://pbs.twimg.com/profile_images/972537147046506496/knlkgZC0_400x400.jpg\n",
      "3007871853 | NZ4R | @raznlalaj | Private: 1 | Verified: 0 | Bio: Learner. | Location:  | Url: None | Joined: 31 Jan 2015 8:01 AM | Tweets: 6 | Following: 833 | Followers: 55 | Likes: 1061 | Media: 0 | Avatar: https://pbs.twimg.com/profile_images/1067748856903991297/q2Z57OtS_400x400.jpg\n",
      "719821363 | b0bth3r00t | @b0bth3r00t | Private: 0 | Verified: 0 | Bio:  | Location:  | Url: None | Joined: 27 Jul 2012 2:24 AM | Tweets: 111 | Following: 583 | Followers: 30 | Likes: 2187 | Media: 0 | Avatar: https://abs.twimg.com/sticky/default_profile_images/default_profile_400x400.png\n",
      "1109487940827336705 | Ghos(in)t | @ghos_in | Private: 0 | Verified: 0 | Bio:  | Location:  | Url: None | Joined: 23 Mar 2019 9:11 AM | Tweets: 4 | Following: 76 | Followers: 0 | Likes: 15 | Media: 0 | Avatar: https://pbs.twimg.com/profile_images/1110938496104128512/1o1b31PU_400x400.png\n",
      "269009879 | ZenoIzbak | @ZenoIzbak | Private: 0 | Verified: 0 | Bio: ドラゴンヘッド 🇰🇲🇸🇩 | Location:  | Url: None | Joined: 19 Mar 2011 3:56 PM | Tweets: 4005 | Following: 365 | Followers: 7107 | Likes: 2431 | Media: 93 | Avatar: https://pbs.twimg.com/profile_images/1074850967055753217/0YJ-CPmy_400x400.jpg\n",
      "23233540 | Ali Tehrani | @Tehranix | Private: 0 | Verified: 0 | Bio: Detecting social media manipulation @astroscreenhq. @Techstars alum. | Location: London  | Url: None | Joined: 7 Mar 2009 12:57 PM | Tweets: 3787 | Following: 2349 | Followers: 3293 | Likes: 8969 | Media: 323 | Avatar: https://pbs.twimg.com/profile_images/968629211203174400/jAf3b1CL_400x400.jpg\n",
      "19638154 | Peter Allwright | @pallwright | Private: 1 | Verified: 0 | Bio: Forensic investigator | Location: Cape Town and London | Url: http://www.horizonforensics.co.za | Joined: 27 Jan 2009 9:36 PM | Tweets: 4992 | Following: 1353 | Followers: 1055 | Likes: 1212 | Media: 0 | Avatar: https://pbs.twimg.com/profile_images/741340186810974209/50549OlW_400x400.jpg\n",
      "994354984060891136 | Michael | @MickeyAbuYemen | Private: 0 | Verified: 0 | Bio:  | Location:  | Url: None | Joined: 9 May 2018 4:14 PM | Tweets: 11 | Following: 527 | Followers: 8 | Likes: 26 | Media: 0 | Avatar: https://pbs.twimg.com/profile_images/996467346351931392/tZaeef97_400x400.jpg\n",
      "856492987 | Elyse Samuels | @ElyseSamuels | Private: 0 | Verified: 0 | Bio: Video Editor at the Washington Post | Location:  | Url: https://www.washingtonpost.com/people/elyse-samuels/?utm_term=.06f849f35ae7 | Joined: 1 Oct 2012 7:46 AM | Tweets: 69 | Following: 613 | Followers: 173 | Likes: 9 | Media: 0 | Avatar: https://pbs.twimg.com/profile_images/484051380710027264/svxAqUmY_400x400.jpeg\n",
      "2162697578 | David Clarke | @1DavidClarke | Private: 0 | Verified: 0 | Bio: Consultant #DataProtection #Cybersecurity, Top 50 Global Expert  #KingstonCognate #Dataprivacy #GDPR Linkedin Group 17502 Members http://getgdpr.at/JoinNow  | Location: London, England | Url: http://getgdpr.at/dpia | Joined: 29 Oct 2013 4:48 AM | Tweets: 38311 | Following: 83626 | Followers: 86343 | Likes: 3553 | Media: 4456 | Avatar: https://pbs.twimg.com/profile_images/1066683980420972545/YI0b2EYm_400x400.jpg\n",
      "201935415 | Giovanni | @GiovMii | Private: 0 | Verified: 0 | Bio: Junior cyber security specialist // Tweeting about infoSec, privacy, photography and other geeky stuff // Oh yeah, tweets also may contain Gal Gadot ♥️ | Location: Montenegro, PG/UL | Url: None | Joined: 12 Oct 2010 4:29 PM | Tweets: 3061 | Following: 564 | Followers: 241 | Likes: 4412 | Media: 498 | Avatar: https://pbs.twimg.com/profile_images/1098126687710130176/RMsy5Ith_400x400.jpg\n",
      "17453126 | cyber osint grey geek | @cyber_osintgeek | Private: 1 | Verified: 0 | Bio: First generation OSINT. Enjoying sunny retirement in the Southern Kingdoms. First Generation info sec (floppy disk pentester). Lotus 123 coder | Location:  | Url: None | Joined: 17 Nov 2008 3:08 PM | Tweets: 1 | Following: 390 | Followers: 18 | Likes: 2 | Media: 0 | Avatar: https://pbs.twimg.com/profile_images/1116723419461095424/xg85p5_4_400x400.png\n"
  ],
   "source": [
    "twint.run.Followers(c)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "users = twint.output.user_object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<twint.user.user at 0x7fa6046b4c50>,\n",
       " <twint.user.user at 0x7fa6442604a8>,\n",
       " <twint.user.user at 0x7fa64c0def98>,\n",
       " <twint.user.user at 0x7fa6442602e8>,\n",
       " <twint.user.user at 0x7fa64c12acf8>,\n",
       " <twint.user.user at 0x7fa64d3f3e48>,\n",
       " <twint.user.user at 0x7fa6242daba8>,\n",
       " <twint.user.user at 0x7fa6046ba748>,\n",
       " <twint.user.user at 0x7fa64c2b2fd0>,\n",
       " <twint.user.user at 0x7fa644536438>,\n",
       " <twint.user.user at 0x7fa64c05c908>,\n",
       " <twint.user.user at 0x7fa6046a6358>,\n",
       " <twint.user.user at 0x7fa6046a6588>,\n",
       " <twint.user.user at 0x7fa604703e10>,\n",
       " <twint.user.user at 0x7fa6443d1748>,\n",
       " <twint.user.user at 0x7fa645f93080>,\n",
       " <twint.user.user at 0x7fa6443d0550>,\n",
       " <twint.user.user at 0x7fa64c1aa748>,\n",
       " <twint.user.user at 0x7fa644daf0f0>,\n",
       " <twint.user.user at 0x7fa604686470>]"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
pielco11 commented 5 years ago

I guess you are using some sort of API, and from what I can see the problem is there and not in Twint. Or rather, you are handling the output the wrong way.

<twint.user.user at 0x7fa6442604a8> is an object with these properties
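For the list of handles requested earlier, the objects only need to be unpacked. A rough sketch, assuming each user object exposes a username attribute (an assumption; check the properties of your installed version). The SimpleNamespace items are stand-ins so the snippet runs without Twint.

```python
from types import SimpleNamespace

# Stand-ins for the objects in the users list (assumed attribute: username).
users = [
    SimpleNamespace(username="NASA"),
    SimpleNamespace(username="twitter"),
]

# Printing the list shows object reprs; reading an attribute gives the data.
handles = ["@" + user.username for user in users]
print(handles)
```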

Nefarian-h commented 4 years ago

I am also using this code to save the scraped content in a list, but when I run the code given above, it reports an error. Here is the log:

CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
Traceback (most recent call last):
  File "C:/new.py", line 10, in <module>
    tweets_as_objects = twint.output.tweets_object
AttributeError: module 'twint.output' has no attribute 'tweets_object'

Nefarian-h commented 4 years ago

import twint

config = twint.Config()
config.Username = "pielco11"
config.Limit = 10
config.Store_object = True
twint.run.Search(config)

# now you will have some tweets
tweets_as_objects = twint.output.tweets_object
print(tweets_as_objects)

here is my code

toby-adedipe commented 4 years ago

In case anyone gets to the end of this thread like I did and is still getting the

AttributeError: module 'twint.output' has no attribute 'tweets_object'

twint.output.tweets_object was changed to twint.output.tweets_list and it works perfectly
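If a script has to run against both older and newer versions, one option is to look up whichever name exists. This is only a sketch: stored_tweets is a hypothetical helper, and the two SimpleNamespace objects stand in for twint.output so the snippet runs without Twint installed.

```python
from types import SimpleNamespace


def stored_tweets(output_module):
    """Return the in-memory tweet list under whichever name this version uses."""
    # Try the newer name first, then fall back to the older one.
    for name in ("tweets_list", "tweets_object"):
        tweets = getattr(output_module, name, None)
        if tweets is not None:
            return tweets
    raise AttributeError("no known tweet list attribute found")


# Stand-ins for twint.output in a newer and an older release.
newer_output = SimpleNamespace(tweets_list=["t1"])
older_output = SimpleNamespace(tweets_object=["t0"])

print(stored_tweets(newer_output))
print(stored_tweets(older_output))
```

In a real script the call would presumably be stored_tweets(twint.output) after the search has finished.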

See reference here https://github.com/twintproject/twint/issues/633#issuecomment-571162858

hearmeout2k commented 2 years ago

Can someone help me? I've been searching for an actual example of how to output the tweet search results into a Python object. I know that you're supposed to use twint.output.tweets_list, but the results point to an address in memory. You need to iterate over it, but I don't really know how.

hearmeout2k commented 2 years ago

Basically, I'm trying to get the tweet result into a string: a string of the actual tweet. If I do tweetz = twint.output.tweets_list and try to print tweetz[0], I get a printout of the memory address.
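The memory address is just the default representation of an object; the text lives in one of its attributes. A small sketch of the idea, assuming the text is stored in an attribute named tweet (an assumption; check the tweet object's properties linked earlier in the thread). FakeTweet is a made-up stand-in so the snippet runs without Twint.

```python
class FakeTweet:
    """Made-up stand-in for a scraped tweet object (no custom __repr__)."""

    def __init__(self, tweet_id, text):
        self.id = tweet_id
        self.tweet = text  # assumed name of the attribute holding the text


tweetz = [FakeTweet(1, "first tweet text"), FakeTweet(2, "second tweet text")]

# Printing the object itself shows the class name and a memory address.
print(tweetz[0])

# Reading the attribute gives the actual string.
print(tweetz[0].tweet)

# Iterating over the list collects every tweet's text as plain strings.
texts = [t.tweet for t in tweetz]
print(texts)
```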