twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License
15.81k stars 2.73k forks

Collecting tweets by user since a certain date #865

Closed pgurazada closed 4 years ago

pgurazada commented 4 years ago

I am collecting tweets made by a set of users since 2020-07-01. I run the script every day to update the metrics on tweets already collected and to capture any new tweets made in the past day.

Initial Check

Command Ran

I am running this as a Python script.

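import twint
from tqdm import tqdm

# Only collect tweets posted on or after this timestamp (twint takes it as a string)
SINCE = "2020-07-01 00:00:00"

c = twint.Config()

# input_df and dir_to_write_to are defined earlier in the script;
# dir_to_write_to is assumed to be a pathlib.Path
for twitter_id in tqdm(input_df.twitter_id):

    # One CSV file per user
    output_file = str(dir_to_write_to / (twitter_id + ".csv"))

    c.Username = twitter_id
    c.Since = SINCE
    c.Store_object = True
    c.Hide_output = True
    c.Store_csv = True
    c.Output = output_file

    twint.run.Search(c)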

input_df is a simple data frame with a twitter_id column holding the list of user IDs that I loop through.

Description of Issue

The script ran fine until yesterday. Today it crashes after collecting tweets for 3-4 user IDs, with the following traceback:

concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "makeCSV__Base.py", line 22, in <module>
    twint.run.Search(c)
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 201, in Search
    run(config)
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 153, in run
    get_event_loop().run_until_complete(Twint(config).main())
  File "/usr/lib/python3.6/asyncio/base_events.py", line 473, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 140, in main
    await self.tweets()
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 97, in tweets
    await self.Feed()
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 40, in Feed
    response = await get.RequestUrl(self.config, self.init, headers=[("User-Agent", self.user_agent)])
  File "/usr/local/lib/python3.6/dist-packages/twint/get.py", line 58, in RequestUrl
    response = await Request(_url, params=params, connector=_connector, headers=headers)
  File "/usr/local/lib/python3.6/dist-packages/twint/get.py", line 89, in Request
    return await Response(session, url, params)
  File "/usr/local/lib/python3.6/dist-packages/twint/get.py", line 95, in Response
    return await response.text()
  File "/usr/local/lib/python3.6/dist-packages/async_timeout/__init__.py", line 45, in __exit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.6/dist-packages/async_timeout/__init__.py", line 92, in _do_exit
    raise asyncio.TimeoutError
concurrent.futures._base.TimeoutError

Environment Details

Ubuntu 18.04, Python 3.6, Anaconda

pgurazada commented 4 years ago

The issue has mysteriously disappeared. I am not sure what happened.
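
For anyone who hits the same traceback: the final error is a timeout raised while reading the HTTP response, and it went away without any code change, so it may well have been transient. One possible way to make the daily run more tolerant is to retry the call a few times with a growing delay. Below is a minimal sketch of that idea. The helper name run_with_retries and its parameters are hypothetical and not part of twint, and whether twint can safely be called again after a timeout is an assumption I have not verified.

```python
import asyncio
import concurrent.futures
import time

# Timeout types treated as transient. Depending on the Python version these
# are the same class or two different ones, so both are listed.
TRANSIENT_ERRORS = (asyncio.TimeoutError, concurrent.futures.TimeoutError)


def run_with_retries(task, max_attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call task() and retry with exponential backoff if it times out.

    task is any zero-argument callable. This helper and its parameters are
    hypothetical; they are not part of twint.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TRANSIENT_ERRORS:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Inside the loop, the call twint.run.Search(c) would then become run_with_retries(lambda: twint.run.Search(c)). Errors other than timeouts are not caught, so genuine bugs still stop the script.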