openzim / gutenberg

Scraper for downloading the entire ebooks repository of project Gutenberg
https://download.kiwix.org/zim/gutenberg
GNU General Public License v3.0

crash scenario #60

kelson42 closed 6 years ago

kelson42 commented 6 years ago
[epub] Requesting URLs for #27578# Two Years with the Natives in the Western Pacific
http://aleph.gutenberg.org:80 "GET /cache/epub/17534/pg17534.epub HTTP/1.1" 200 35844
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/cache/epub/17534/pg17534.epub --output dl-cache/17534.epub
Starting new HTTP connection (1): aleph.gutenberg.org
   Downloading content files for Book #18415
[epub] Requesting URLs for #18415# Histoires incroyables, Tome I
[pdf] not avail. for #21229# Thistle and Rose: A Story for Girls
[pdf] not avail. for #26705# The Caravan Route between Egypt and Syria
[html] Requesting URLs for #26705# The Caravan Route between Egypt and Syria
[html] Requesting URLs for #21229# Thistle and Rose: A Story for Girls
Starting new HTTP connection (1): aleph.gutenberg.org
Starting new HTTP connection (1): aleph.gutenberg.org
Starting new HTTP connection (1): aleph.gutenberg.org
http://aleph.gutenberg.org:80 "GET /2/4/8/7/24873/24873-h.zip HTTP/1.1" 200 148084
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/2/4/8/7/24873/24873-h.zip --output dl-cache/24873.html.zip
http://aleph.gutenberg.org:80 "GET /cache/epub/27578/pg27578.epub HTTP/1.1" 200 208899
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/cache/epub/27578/pg27578.epub --output dl-cache/27578.epub
   Downloading content files for Book #23098
[pdf] not avail. for #28454# Heart of the Blue Ridge
[html] Requesting URLs for #28454# Heart of the Blue Ridge
   Downloading content files for Book #25749
[epub] Requesting URLs for #23098# La Femme Abbé
   Downloading content files for Book #16662
[epub] Requesting URLs for #25749# A Tall Ship
[epub] Requesting URLs for #16662# Bad Hugh
[pdf] not avail. for #15786# Himlauret eller det profetiska ordet
[pdf] not avail. for #19261# Bronchoscopy and Esophagoscopy
http://aleph.gutenberg.org:80 "GET /cache/epub/18415/pg18415.epub HTTP/1.1" 200 130411
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/cache/epub/18415/pg18415.epub --output dl-cache/18415.epub
   Downloading content files for Book #14928
[pdf] not avail. for #17534# Os Simples
http://aleph.gutenberg.org:80 "GET /2/1/2/2/21229/21229-h.zip HTTP/1.1" 200 253154
http://aleph.gutenberg.org:80 "GET /2/6/7/0/26705/26705-h.zip HTTP/1.1" 200 842398
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/2/1/2/2/21229/21229-h.zip --output dl-cache/21229.html.zip
[pdf] not avail. for #22113# Peggy Stewart at School
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/2/6/7/0/26705/26705-h.zip --output dl-cache/26705.html.zip
[epub] Requesting URLs for #14928# Punch, or the London Charivari, Volume 1, September 18, 1841
[html] Requesting URLs for #19261# Bronchoscopy and Esophagoscopy
Starting new HTTP connection (1): aleph.gutenberg.org
[html] Requesting URLs for #15786# Himlauret eller det profetiska ordet
[html] Requesting URLs for #22113# Peggy Stewart at School
Starting new HTTP connection (1): aleph.gutenberg.org
[html] Requesting URLs for #17534# Os Simples
Starting new HTTP connection (1): aleph.gutenberg.org
Starting new HTTP connection (1): aleph.gutenberg.org
Starting new HTTP connection (1): aleph.gutenberg.org
Starting new HTTP connection (1): aleph.gutenberg.org
Starting new HTTP connection (1): aleph.gutenberg.org
Starting new HTTP connection (1): aleph.gutenberg.org
http://aleph.gutenberg.org:80 "GET /2/8/4/5/28454/28454-h.zip HTTP/1.1" 200 1019693
Starting new HTTP connection (1): aleph.gutenberg.org
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/2/8/4/5/28454/28454-h.zip --output dl-cache/28454.html.zip
http://aleph.gutenberg.org:80 "GET /cache/epub/23098/pg23098.epub HTTP/1.1" 200 64323
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/cache/epub/23098/pg23098.epub --output dl-cache/23098.epub
http://aleph.gutenberg.org:80 "GET /cache/epub/25749/pg25749.epub HTTP/1.1" 200 113790
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/cache/epub/25749/pg25749.epub --output dl-cache/25749.epub
http://aleph.gutenberg.org:80 "GET /cache/epub/16662/pg16662.epub HTTP/1.1" 200 315336
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/cache/epub/16662/pg16662.epub --output dl-cache/16662.epub
   Downloading content files for Book #29381
http://aleph.gutenberg.org:80 "GET /2/2/1/1/22113/22113-h.zip HTTP/1.1" 200 389843
http://aleph.gutenberg.org:80 "GET /cache/epub/15786/pg15786.html.utf8 HTTP/1.1" 200 472386
http://aleph.gutenberg.org:80 "GET /cache/epub/19261/pg19261.html.utf8 HTTP/1.1" 200 546638
http://aleph.gutenberg.org:80 "GET /cache/epub/17534/pg17534.html.utf8 HTTP/1.1" 200 92424
http://aleph.gutenberg.org:80 "GET /cache/epub/14928/pg14928.epub HTTP/1.1" 200 61074
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/cache/epub/15786/pg15786.html.utf8 --output dl-cache/15786.html
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/cache/epub/19261/pg19261.html.utf8 --output dl-cache/19261.html
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/cache/epub/17534/pg17534.html.utf8 --output dl-cache/17534.html
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/cache/epub/14928/pg14928.epub --output dl-cache/14928.epub
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/2/2/1/1/22113/22113-h.zip --output dl-cache/22113.html.zip
[pdf] not avail. for #27578# Two Years with the Natives in the Western Pacific
[html] Requesting URLs for #27578# Two Years with the Natives in the Western Pacific
Starting new HTTP connection (1): aleph.gutenberg.org
   Downloading content files for Book #20246
http://aleph.gutenberg.org:80 "GET /2/7/5/7/27578/27578-h.zip HTTP/1.1" 200 3295950
Traceback (most recent call last):
  File "./gutenberg2zim", line 214, in <module>
    main(docopt(help, version=0.1))
  File "./gutenberg2zim", line 167, in main
    force=FORCE)
  File "/home/kelson/tmp/gutenberg/gutenbergtozim/download.py", line 226, in download_all_books
    Pool(concurrency).map(dlb, available_books)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 3: ordinal not in range(128)
[pdf] not avail. for #23098# La Femme Abbé
curl --fail --insecure --location --silent --show-error -C - --url http://aleph.gutenberg.org/2/7/5/7/27578/27578-h.zip --output dl-cache/27578.html.zip
[pdf] not avail. for #18415# Histoires incroyables, Tome I
[pdf] not avail. for #16662# Bad Hugh
   Downloading content files for Book #17535
[pdf] not avail. for #25749# A Tall Ship
[epub] Requesting URLs for #20246# Œuvres complètes de Alfred de Musset - Tome 3
[pdf] not avail. for #14928# Punch, or the London Charivari, Volume 1, September 18, 1841
[epub] Requesting URLs for #29381# The Works Of Charles James Lever
[epub] Requesting URLs for #17535# The Jester of St. Timothy's
[html] Requesting URLs for #23098# La Femme Abbé
[html] Requesting URLs for #18415# Histoires incroyables, Tome I
[html] Requesting URLs for #16662# Bad Hugh
   Downloading content files for Book #15787
[html] Requesting URLs for #25749# A Tall Ship
[html] Requesting URLs for #14928# Punch, or the London Charivari, Volume 1, September 18, 1841
[epub] Requesting URLs for #15787# Sieben Jahre in Süd-Afrika. Erster Band.
   Downloading content files for Book #19262
Exception in thread Thread-22 (most likely raised during interpreter shutdown):
Segmentation fault
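
For reference, a minimal Python 3 sketch of the error class shown at the end of the traceback. The log does not show which value failed to decode, so the byte string below is a made-up example (the Latin-1 bytes of "Abbé", picked only because 0xe9 sits at index 3). `decode_ascii` and `decode_tolerant` are hypothetical helpers for illustration, not functions from gutenbergtozim, and the UTF-8-then-Latin-1 fallback is just one possible way to avoid the crash, not necessarily the fix applied in the project.

```python
# Assumption: a Latin-1 encoded byte string, similar in shape to what the
# traceback reports (byte 0xe9 at position 3). The real value is unknown.
title_bytes = b"Abb\xe9"


def decode_ascii(raw):
    """Strict ASCII decode, the codec named in the UnicodeDecodeError."""
    return raw.decode("ascii")


def decode_tolerant(raw):
    """Try UTF-8 first, then fall back to Latin-1, which accepts any byte."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


# Reproduce the same kind of failure as in the log.
try:
    decode_ascii(title_bytes)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# The tolerant decoder returns text instead of raising.
recovered = decode_tolerant(title_bytes)
```

Decoding explicitly at the point where bytes enter the program (file names, metadata, HTTP responses) keeps an implicit ASCII decode from surfacing later inside a worker, where `Pool.map` re-raises it in the parent as seen above.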