openzim / ted

Provide the best of TED.com for offline usage!
https://download.kiwix.org/zim/ted/
GNU General Public License v3.0

TED scraper is broken #12

Closed: kelson42 closed this 4 years ago

kelson42 commented 5 years ago

I have tried to run it twice, and it failed both times. I don't think it failed at the same place, but the error looks similar.

https://pe.tedcdn.com/images/ted/354459a70ea454421a5cb3ce7e2e931959b0fcfb_254x191.jpg
Downloading video thumbnail... "my mama" / "BLACK BANANA"
Downloading video... Why must artists be poor?
Downloading speaker image... Why must artists be poor?
https://pe.tedcdn.com/images/ted/ba9f87b9d08d0d7e7b82dc261e40e205901e7378_254x191.jpg
Downloading video thumbnail... Why must artists be poor?
Downloading video... The Great Migration and the power of a single decision
Downloading speaker image... The Great Migration and the power of a single decision
https://pe.tedcdn.com/images/ted/8a86ba4fd5f5eccf39d2b4f83382fffdf0e76532_254x191.jpg
Downloading video thumbnail... The Great Migration and the power of a single decision
Downloading video... How the hyperlink changed everything
Downloading speaker image... How the hyperlink changed everything
https://pe.tedcdn.com/images/ted/3bf88078727d4a9cc66282b0c0795f04acb537b9_254x191.jpg
Downloading video thumbnail... How the hyperlink changed everything
Downloading video... The hidden ways stairs shape your life
Downloading speaker image... The hidden ways stairs shape your life
https://pe.tedcdn.com/images/ted/581324210212221474b1646bed6dda79a3086fde_254x191.jpg
Downloading video thumbnail... The hidden ways stairs shape your life
Downloading video... How the button changed fashion
Downloading speaker image... How the button changed fashion
https://pe.tedcdn.com/images/ted/de5087457c53135fa6d0beb4e84b4b29a6971e71_254x191.jpg
Downloading video thumbnail... How the button changed fashion
Downloading video... The 3,000-year history of the hoodie
Downloading speaker image... The 3,000-year history of the hoodie
https://pe.tedcdn.com/images/ted/3e7a31f9ffbd78e89be7e1f967c6a5e1a60c3358_254x191.jpg
Downloading video thumbnail... The 3,000-year history of the hoodie
Downloading video... How the jump rope got its rhythm
Speaker has not image
Downloading video thumbnail... How the jump rope got its rhythm
Downloading video... How the progress bar keeps you sane
Downloading speaker image... How the progress bar keeps you sane
https://pe.tedcdn.com/images/ted/ed27c427566bc94e645aa3360c25786eee66473d_254x191.jpg
Downloading video thumbnail... How the progress bar keeps you sane
Traceback (most recent call last):
  File "/usr/local/bin/ted2zim", line 91, in <module>
    App()
  File "/usr/local/bin/ted2zim", line 18, in __init__
    self.run()
  File "/usr/local/bin/ted2zim", line 55, in run
    scraper.download_video_data()
  File "/usr/local/lib/python2.7/dist-packages/scraper/webscraper.py", line 508, in download_video_data
    r = utils.download_from_site(video_thumbnail)
  File "/usr/local/lib/python2.7/dist-packages/scraper/utils.py", line 52, in download_from_site
    r = requests.get(url, headers = headers)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 520, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 630, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='pi.tedcdn.com', port=443): Max retries exceeded with url: /r/talkstar-photos.s3.amazonaws.com/uploads/6bd8639d-1e68-4f73-8d63-f8d418b4121f/DanielEngber_2018V-embed.jpg?quality=89&w=600 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f325ae3f450>: Failed to establish a new connection: [Errno -2] Name or service not known',))
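The traceback shows that `utils.download_from_site` calls `requests.get(url, headers = headers)` exactly once. A single DNS failure on `pi.tedcdn.com` ("Name or service not known") therefore aborts the whole scrape. Below is a minimal sketch of a retry-with-backoff wrapper that would make such transient failures survivable. It assumes Python 3, and `download_with_retries` is a hypothetical helper, not part of the scraper. The fetch function and the sleep function are injected so the retry logic can be exercised without the network.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def download_with_retries(
    fetch: Callable[[str], T],
    url: str,
    attempts: int = 5,
    backoff: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(url), retrying on the given exception types.

    Hypothetical helper: waits backoff, 2*backoff, 4*backoff, ... seconds
    between attempts, and re-raises the last error once attempts run out.
    Exceptions not listed in retry_on propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fetch(url)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff * (2 ** attempt))  # exponential backoff
    raise AssertionError("unreachable")
```

With `requests`, pass `retry_on=(requests.exceptions.ConnectionError,)`. That class derives from `RequestException` (an `IOError`), not from the built-in `ConnectionError`, so the default tuple would not catch it. Inside `download_from_site`, the existing call could then be wrapped as `download_with_retries(lambda u: requests.get(u, headers=headers), url, retry_on=(requests.exceptions.ConnectionError,))`.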
satyamtg commented 4 years ago

@kelson42 This looks like a connection error: the DNS lookup for pi.tedcdn.com failed ("Name or service not known"). However, we now use save_large_file from zimscraperlib, which in turn uses wget, so in theory this should no longer happen. I'll still check once with a large number of downloads and then do whatever is required.