openzim / gutenberg

Scraper for downloading the entire ebook repository of Project Gutenberg
https://download.kiwix.org/zim/gutenberg
GNU General Public License v3.0
127 stars, 37 forks

Creation process dies too often because of connectivity problems #48

Closed: kelson42 closed this issue 7 years ago

kelson42 commented 7 years ago

For example, a single connection timeout on one book aborts the whole run:

[html] Requesting URLs for #6511# The suppressed Gospels and Epistles of the original New Testament of Jesus the Christ, Volume 5, St. Paul
Starting new HTTP connection (1): gutenberg.readingroo.ms
http://gutenberg.readingroo.ms:80 "GET /etext03/9962-h.zip HTTP/1.1" 404 216
Starting new HTTP connection (1): gutenberg.readingroo.ms
        Downloading content files for Book #12415
Traceback (most recent call last):
  File "/usr/local/bin/gutenberg2zim", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/src/gutenberg2zim/gutenberg2zim", line 192, in <module>
    main(docopt(help, version=0.1))
  File "/src/gutenberg2zim/gutenberg2zim", line 160, in main
    force=FORCE)
  File "/src/gutenberg2zim/gutenbergtozim/download.py", line 225, in download_all_books
    Pool(concurrency).map(dlb, available_books)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
requests.exceptions.ConnectionError: HTTPConnectionPool(host='gutenberg.readingroo.ms', port=80): Max retries exceeded with url: /etext05/741.html.images (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fcdd9e5cb90>: Failed to establish a new connection: [Errno 110] Connection timed out',))
[epub] Requesting URLs for #12415# Byways Around San Francisco Bay
http://gutenberg.readingroo.ms:80 "GET /etext05/5805.html.noimages HTTP/1.1" 404 224
http://gutenberg.readingroo.ms:80 "GET /etext94/4343-h.htm HTTP/1.1" 404 216
Starting new HTTP connection (1): gutenberg.readingroo.ms
Starting new HTTP connection (1): gutenberg.readingroo.ms
Starting new HTTP connection (1): gutenberg.readingroo.ms
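
The traceback shows one `ConnectionError` propagating out of `Pool(concurrency).map(dlb, available_books)` and ending the process. One possible mitigation, sketched below under assumptions, is to retry each download with exponential backoff and to report a failure as a value instead of raising it. The names `call_with_retries` and `download_safely` are hypothetical and are not part of gutenberg2zim. The sketch assumes Python 3 (the log above is from Python 2.7), where `requests.exceptions.ConnectionError` is a subclass of `OSError`.

```python
import time


def call_with_retries(func, *args, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(*args), retrying on OSError with exponential backoff.

    Hypothetical helper. `sleep` is injectable so the delay can be
    replaced in tests.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            sleep(base_delay * 2 ** (attempt - 1))


def download_safely(download_one, book_id, **retry_kwargs):
    """Return (book_id, None) on success or (book_id, error) on failure.

    Hypothetical wrapper: network errors are returned, not raised, so a
    single unreachable book cannot abort a Pool.map over all books.
    """
    try:
        call_with_retries(download_one, book_id, **retry_kwargs)
        return book_id, None
    except OSError as exc:
        return book_id, exc
```

With a wrapper of this kind, the caller could collect the `(book_id, error)` pairs after the map finishes and log or re-queue the failed books, rather than losing the whole run to one timeout.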