openzim / gutenberg

Scraper for downloading the entire ebooks repository of project Gutenberg
https://download.kiwix.org/zim/gutenberg
GNU General Public License v3.0

Crash during download #19

Closed: kelson42 closed this issue 7 years ago

kelson42 commented 9 years ago

"GET /etext93/24006-h.zip HTTP/1.1" 404 217 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /etext05/24006-h.zip HTTP/1.1" 404 217 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /etext01/24006-h.zip HTTP/1.1" 404 217 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /etext00/24006-h.zip HTTP/1.1" 404 217 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /etext02/24006-h.zip HTTP/1.1" 404 217 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /etext04/24006-h.zip HTTP/1.1" 404 217 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /etext95/24006-h.zip HTTP/1.1" 404 217 Downloading content files for Book #24010 [epub] Requesting URLs for #24010# The Gods are Athirst Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /cache/generated/24010/pg24010.epub HTTP/1.1" 200 207009 [pdf] not avail. for #24010# The Gods are Athirst [html] Requesting URLs for #24010# The Gods are Athirst Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /etext92/24010-h.zip HTTP/1.1" 404 217 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /etext90/24010-h.zip HTTP/1.1" 404 217 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /etext96/24010-h.zip HTTP/1.1" 404 217 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /etext94/24010-h.zip HTTP/1.1" 404 217 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /2/4/0/1/24010/24010-h.html HTTP/1.1" 404 224 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /etext98/24010-h.zip HTTP/1.1" 404 217 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /2/4/0/1/24010/24010-h.htm HTTP/1.1" 404 223 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /cache/generated/24010/pg24010.html.utf8 HTTP/1.1" 404 237 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /etext00/24010-h.zip HTTP/1.1" 404 217 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET 
/etext93/24010-h.zip HTTP/1.1" 404 217 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /etext91/24010-h.zip HTTP/1.1" 404 217 Starting new HTTP connection (1): gutenberg.readingroo.ms "GET /2/4/0/1/24010/24010-h.zip HTTP/1.1" 200 506925 Traceback (most recent call last): File "./dump-gutenberg.py", line 150, in main(docopt(help, version=0.1)) File "./dump-gutenberg.py", line 129, in main only_books=BOOKS) File "/media/data/gutenberg/gutenberg/download.py", line 200, in download_all_books download_cache=download_cache) File "/media/data/gutenberg/gutenberg/download.py", line 46, in handle_zipped_epub if not is_safe(n)]): File "/media/data/gutenberg/gutenberg/download.py", line 34, in is_safe if path(fname).basename() == fname: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 3: ordinal not in range(128)
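
The traceback above ends in is_safe, where an archive member name containing the byte 0xe9 is compared against a path object and the implicit ASCII decode fails. A minimal sketch of one possible workaround follows, written for Python 3 and using only the standard library rather than the path module the project uses. The helper to_text and the fallback order (UTF-8 first, then cp437) are assumptions for illustration, not the project's actual fix.

```python
import posixpath


def to_text(name):
    """Return an archive member name as str (hypothetical helper).

    Bytes are decoded explicitly so that no implicit ASCII decode happens.
    Assumption: names are UTF-8, or else cp437 as in legacy zip archives.
    """
    if isinstance(name, str):
        return name
    try:
        return name.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" guarantees this fallback never raises
        return name.decode("cp437", errors="replace")


def is_safe(fname):
    """True when the member name carries no directory component."""
    text = to_text(fname)
    # Zip member names use forward slashes, hence posixpath
    return posixpath.basename(text) == text
```

With this sketch, a name such as b"caf\xe9.htm" (byte 0xe9 in position 3, as in the error message) is decoded explicitly and checked without raising.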

kelson42 commented 9 years ago

I cannot reproduce this bug, so I am closing the ticket.

kelson42 commented 9 years ago

Reopening the bug. To reproduce, simply remove the cached files: rm dl-cache/24010.*

before starting the download: ./dump-gutenberg.py --keep-db --download --books=24010

kelson42 commented 9 years ago

One consequence of this seems to be that there is no HTML version at all for this book.