twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License

[QUESTION] How to catch internet connection off error and retry/continue when internet is back on? #620

Closed youssefavx closed 4 years ago

youssefavx commented 4 years ago

    CRITICAL:root:twint.get:User:[Errno 54] Connection reset by peer
    Traceback (most recent call last):
        twint.run.Search(z)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twint/run.py", line 292, in Search
        run(config, callback)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twint/run.py", line 213, in run
        get_event_loop().run_until_complete(Twint(config).main(callback))
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
        return future.result()
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twint/run.py", line 154, in main
        await task
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twint/run.py", line 198, in run
        await self.tweets()
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twint/run.py", line 137, in tweets
        await self.Feed()
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twint/run.py", line 57, in Feed
        response = await get.RequestUrl(self.config, self.init, headers=[("User-Agent", self.user_agent)])
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twint/get.py", line 107, in RequestUrl
        response = await Request(_url, params=params, connector=_connector, headers=headers)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twint/get.py", line 157, in Request
        return await Response(session, url, params)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twint/get.py", line 162, in Response
        async with session.get(url, ssl=False, params=params, proxy=httpproxy) as response:
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/client.py", line 1005, in __aenter__
        self._resp = await self._coro
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/client.py", line 497, in _request
        await resp.start(conn)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/client_reqrep.py", line 844, in start
        message, payload = await self._protocol.read()  # type: ignore  # noqa
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/streams.py", line 588, in read
        await self._waiter
    aiohttp.client_exceptions.ServerDisconnectedError: None

Description of Issue

I'm running a script that downloads a lot of data (tweets, followers, following, favorites, mentions, and so on) in a few nested for loops. Every once in a while my internet connection falters briefly and then comes back. While the connection is down, or whenever the script can't reach Twitter, it throws the error above. I'd like to catch this error and retry the same download, or, more ideally, pause and continue downloading where it left off. Is this possible? How would I go about doing that?

Environment Details

macOS Mojave 10.14, running the script in Terminal.

pielco11 commented 4 years ago

You can just specify a config.Resume file, so that if Twint crashes it can restart from where it stopped.

Otherwise you have to handle the exception with a try/except. In the except clause, catch the specific error that was raised and then, for example, add an input() call to wait until the user presses Enter.
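A minimal sketch of that retry pattern, with a generic `task` callable standing in for a `twint.run.*` call. The exception tuple here is illustrative; catch whatever your own traceback shows (in this thread it was `aiohttp.client_exceptions.ServerDisconnectedError`):

```python
import time

def run_with_retry(task, max_attempts=5, delay=2.0):
    """Call task(); on network-style errors, wait and try again.

    `task` stands in for a twint.run.* call. The exceptions caught
    below are placeholders -- match them to your actual traceback.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except (ConnectionError, OSError) as exc:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            print(f"Attempt {attempt} failed ({exc!r}); retrying in {delay}s...")
            time.sleep(delay)

# Example: a flaky task that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network down")
    return "done"

print(run_with_retry(flaky, delay=0.01))  # → done
```

Instead of sleeping you could, as suggested above, block on `input()` so the scrape only resumes when you confirm the connection is back.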

youssefavx commented 4 years ago

Thank you! I tried config.Resume on a followers CSV file, though, and it kept overwriting what seems to be 20 users every time it downloads.

This was the code I used for the function:

def downloadfollowers(usource):
    print("Downloading followers for " + str(usource))
    x = twint.Config()
    x.Username = str(usource.lower())
    x.Store_object = True
    x.Store_csv = True
    x.Resume = "test/" + str(usource) + " followers.csv"
    #x.Output = str(usource) + " followers.csv"
    x.Output = "test/" + str(usource) + " followers.csv"
    twint.run.Followers(x)

I guess the Resume file has to be separate?

Edit: I just made a separate resume file and it seems to be working great! Thanks again! Hopefully this takes care of most situations.
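For reference, the fix amounts to pointing config.Resume at its own checkpoint file rather than at the CSV output. A small sketch of one possible path scheme (the directory and file names here just follow the ones used in this thread):

```python
import os

def follower_paths(username, outdir="test"):
    """Return (resume_path, output_path) for a follower scrape.

    Keeping the resume checkpoint separate from the CSV output avoids
    the overwriting seen above, since twint rewrites the Resume file
    as it runs. The naming scheme is only an example.
    """
    base = os.path.join(outdir, username)
    return base + " followers.resume", base + " followers.csv"

# Hypothetical usage with twint:
#   x = twint.Config()
#   x.Resume, x.Output = follower_paths(usource)
resume, output = follower_paths("jack")
assert resume != output  # the two files must never collide
```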

pielco11 commented 4 years ago

Yes, config.Resume gets overwritten while Twint runs, so it has to be a separate file from the output.

youssefavx commented 4 years ago

I see, thanks!

I noticed that when testing this by deliberately disconnecting the internet, it sometimes downloads duplicate tweets or duplicate users. I don't mind the duplicate users, but the duplicate tweets affect some areas of my script.

Is there a way to prevent duplicate tweets?

This is my code:

    import time

    import aiohttp
    import twint

    def downloadtweets(usertweets):
        print("Downloading tweets for: " + str(usertweets))
        dt = twint.Config()
        dt.Username = str(usertweets.lower())
        dt.Store_csv = True
        dt.Resume = "test/" + str(usertweets) + " resume tweets.csv"
        dt.Output = "test/" + str(usertweets) + " tweets.csv"

        while True:
            try:
                twint.run.Search(dt)
                break
            except aiohttp.ClientConnectorError:
                time.sleep(1)
                print('Client error. Restarting...')

pielco11 commented 4 years ago

Now you should not get duplicate tweets while resuming; please retry. (You'll have to either clone the repo or install via pip+git; I might push the update soon.)

youssefavx commented 4 years ago

I'll give it a shot and report back!

youssefavx commented 4 years ago

I edited my run.py file to match the edits you made.

Okay, I tried it and it seems to work great for Followers, Following, and Search. However (and I don't know if this is related to that change in run.py), when I try to download favorites, via the terminal or via the script, I now get this error: CRITICAL:root:twint.feed:Mobile:list index out of range

This is while my internet is connected.

Edit: I tried undoing the changes to run.py, and now favorites run fine (but of course the duplicates still occur in the others), so I assume it's related.

youssefavx commented 4 years ago

I made that edit to url.py along with run.py. Works great now!