An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License
Collecting tweets by user since a certain date #865
I am collecting tweets posted by a set of users since 2020-07-01. I run the script every day to update the metrics on tweets already collected and to capture any new tweets from the past day.
Initial Check
[Yes] Python version is 3.6;
[Yes] Updated Twint with pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint;
[Yes] I have searched the issues and there are no duplicates of this issue/question/request.
Command Ran
I am running this as a python script.
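import twint
from tqdm import tqdm

# input_df and dir_to_write_to (presumably a pathlib.Path) are defined earlier in the script.
SINCE = "2020-07-01 00:00:00"  # quoted so that it is a valid string

c = twint.Config()
for twitter_id in tqdm(input_df.twitter_id):
    # One CSV file per user, named after the Twitter ID.
    output_file = str(dir_to_write_to / (twitter_id + ".csv"))
    c.Username = twitter_id
    c.Since = SINCE
    c.Store_object = True
    c.Hide_output = True
    c.Store_csv = True
    c.Output = output_file
    twint.run.Search(c)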
input_df is a simple data frame with a column of twitter_id values that I loop over.
Description of Issue
The script ran fine until yesterday; today it crashes after collecting tweets from 3-4 user IDs.
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "makeCSV__Base.py", line 22, in <module>
    twint.run.Search(c)
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 201, in Search
    run(config)
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 153, in run
    get_event_loop().run_until_complete(Twint(config).main())
  File "/usr/lib/python3.6/asyncio/base_events.py", line 473, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 140, in main
    await self.tweets()
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 97, in tweets
    await self.Feed()
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 40, in Feed
    response = await get.RequestUrl(self.config, self.init, headers=[("User-Agent", self.user_agent)])
  File "/usr/local/lib/python3.6/dist-packages/twint/get.py", line 58, in RequestUrl
    response = await Request(_url, params=params, connector=_connector, headers=headers)
  File "/usr/local/lib/python3.6/dist-packages/twint/get.py", line 89, in Request
    return await Response(session, url, params)
  File "/usr/local/lib/python3.6/dist-packages/twint/get.py", line 95, in Response
    return await response.text()
  File "/usr/local/lib/python3.6/dist-packages/async_timeout/__init__.py", line 45, in __exit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.6/dist-packages/async_timeout/__init__.py", line 92, in _do_exit
    raise asyncio.TimeoutError
concurrent.futures._base.TimeoutError
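Since the traceback above ends in a timeout raised while reading the response inside twint.run.Search, one way to keep the daily loop going might be to retry the failing call a few times before skipping that user. The sketch below is only an illustration: run_with_retries, its retries and delay parameters, and the linear backoff are hypothetical and not part of Twint, and whether retrying addresses the underlying cause is not known.

```python
import asyncio
import concurrent.futures
import time

# The traceback shows asyncio.TimeoutError surfacing as
# concurrent.futures TimeoutError, so both names are caught here.
TIMEOUT_ERRORS = (asyncio.TimeoutError, concurrent.futures.TimeoutError)


def run_with_retries(func, *args, retries=3, delay=5.0):
    """Call func(*args); on a timeout, wait and try again.

    Hypothetical helper, not part of Twint. Makes at most `retries`
    attempts. Returns True if a call completed, False if every
    attempt timed out.
    """
    for attempt in range(1, retries + 1):
        try:
            func(*args)
            return True
        except TIMEOUT_ERRORS:
            if attempt < retries:
                # Simple linear backoff: wait longer after each failure.
                time.sleep(delay * attempt)
    return False
```

In the loop above, twint.run.Search(c) would then become run_with_retries(twint.run.Search, c), and any twitter_id for which it returns False could be logged and picked up again on the next daily run.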
Environment Details
Ubuntu 18.04, Python 3.6, Anaconda