twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License
15.82k stars 2.73k forks

Append to end of a csv already in progress? #367

Closed: NewCTwo closed this issue 5 years ago

NewCTwo commented 5 years ago

Description of Issue

After about 10 minutes of scraping, twint crashes. How can I set it to pick up where it left off when I restart the script?

Error

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/twint/get.py", line 94, in Response
    async with session.get(url, ssl=False, params=params) as response:
  File "/usr/local/lib/python3.6/dist-packages/aiohttp/client.py", line 1005, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.6/dist-packages/aiohttp/client.py", line 497, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.6/dist-packages/aiohttp/client_reqrep.py", line 844, in start
    message, payload = await self._protocol.read()  # type: ignore  # noqa
  File "/usr/local/lib/python3.6/dist-packages/aiohttp/streams.py", line 588, in read
    await self._waiter
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "makeCSV__Base.py", line 22, in <module>
    twint.run.Search(c)
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 201, in Search
    run(config)
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 153, in run
    get_event_loop().run_until_complete(Twint(config).main())
  File "/usr/lib/python3.6/asyncio/base_events.py", line 473, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 140, in main
    await self.tweets()
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 97, in tweets
    await self.Feed()
  File "/usr/local/lib/python3.6/dist-packages/twint/run.py", line 40, in Feed
    response = await get.RequestUrl(self.config, self.init, headers=[("User-Agent", self.user_agent)])
  File "/usr/local/lib/python3.6/dist-packages/twint/get.py", line 58, in RequestUrl
    response = await Request(_url, params=params, connector=_connector, headers=headers)
  File "/usr/local/lib/python3.6/dist-packages/twint/get.py", line 89, in Request
    return await Response(session, url, params)
  File "/usr/local/lib/python3.6/dist-packages/twint/get.py", line 95, in Response
    return await response.text()
  File "/usr/local/lib/python3.6/dist-packages/async_timeout/__init__.py", line 45, in __exit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.6/dist-packages/async_timeout/__init__.py", line 92, in _do_exit
    raise asyncio.TimeoutError
concurrent.futures._base.TimeoutError

Environment Details

Python 3.6 on Ubuntu 18.04

pielco11 commented 5 years ago

You have to set the --resume option (config.Resume in a script), passing the id of the last tweet.

It can happen that Twint runs into "hidden tweets" and breaks. In that case, specify a date range (with since and until) and move from window to window.

You could also just search with Twitter Advanced Search and pick the tweet ids from there.

Twint automatically detects whether the output file already exists; if it does, the new data is appended and nothing is overwritten.
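
For anyone landing here later, the advice above can be sketched roughly as follows. This is only a sketch under assumptions, not an official recipe: the helper names (date_windows, last_tweet_id, scrape), the 7-day window size, and the idea that the CSV has a column named "id" are all hypothetical choices made for illustration. Whether config.Resume accepts a tweet id (as described in this thread) may depend on the twint version you have installed, so check that before relying on it.

```python
import csv
from datetime import date, timedelta


def date_windows(start, end, days=7):
    """Yield (since, until) ISO date strings covering start..end in steps of `days`."""
    step = timedelta(days=days)
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        yield cursor.isoformat(), upper.isoformat()
        cursor = upper


def last_tweet_id(csv_path, id_column="id"):
    """Return the id stored in the last row of an existing CSV, or None.

    Assumes the file has a header row with a column called `id_column`.
    """
    try:
        with open(csv_path, newline="", encoding="utf-8") as handle:
            last = None
            for row in csv.DictReader(handle):
                last = row.get(id_column)
            return last
    except FileNotFoundError:
        return None


def scrape(query, csv_path, start, end, resume_id=None):
    """Run one twint search per date window, writing to the same CSV each time."""
    import twint  # imported here so the helpers above work without twint installed

    for since, until in date_windows(start, end):
        c = twint.Config()
        c.Search = query
        c.Since = since
        c.Until = until
        c.Store_csv = True
        c.Output = csv_path  # per the thread, an existing file is appended to
        if resume_id is not None:
            # Assumption: Resume takes the last tweet id, as described above.
            c.Resume = resume_id
            resume_id = None  # only meaningful for the first window after a crash
        twint.run.Search(c)
```

After a crash you could call last_tweet_id("tweets.csv") to recover the id to resume from, then restart scrape(...) with start set to the window that failed. Keeping the windows small also means a timeout costs you at most one window of work.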

NewCTwo commented 5 years ago

Yeah, config.Resume = worked.

pielco11 commented 5 years ago

Great to see it working!