Closed kelson42 closed 7 years ago
I cannot reproduce this bug, so I am closing the ticket.
Reopening the bug. To reproduce, simply remove the cached files:
rm dl-cache/24010.*
before starting the download:
./dump-gutenberg.py --keep-db --download --books=24010
One consequence of this seems to be that there is no HTML version at all for this book.
"GET /etext93/24006-h.zip HTTP/1.1" 404 217
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /etext05/24006-h.zip HTTP/1.1" 404 217
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /etext01/24006-h.zip HTTP/1.1" 404 217
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /etext00/24006-h.zip HTTP/1.1" 404 217
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /etext02/24006-h.zip HTTP/1.1" 404 217
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /etext04/24006-h.zip HTTP/1.1" 404 217
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /etext95/24006-h.zip HTTP/1.1" 404 217
Downloading content files for Book #24010
[epub] Requesting URLs for #24010# The Gods are Athirst
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /cache/generated/24010/pg24010.epub HTTP/1.1" 200 207009
[pdf] not avail. for #24010# The Gods are Athirst
[html] Requesting URLs for #24010# The Gods are Athirst
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /etext92/24010-h.zip HTTP/1.1" 404 217
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /etext90/24010-h.zip HTTP/1.1" 404 217
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /etext96/24010-h.zip HTTP/1.1" 404 217
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /etext94/24010-h.zip HTTP/1.1" 404 217
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /2/4/0/1/24010/24010-h.html HTTP/1.1" 404 224
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /etext98/24010-h.zip HTTP/1.1" 404 217
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /2/4/0/1/24010/24010-h.htm HTTP/1.1" 404 223
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /cache/generated/24010/pg24010.html.utf8 HTTP/1.1" 404 237
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /etext00/24010-h.zip HTTP/1.1" 404 217
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /etext93/24010-h.zip HTTP/1.1" 404 217
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /etext91/24010-h.zip HTTP/1.1" 404 217
Starting new HTTP connection (1): gutenberg.readingroo.ms
"GET /2/4/0/1/24010/24010-h.zip HTTP/1.1" 200 506925
Traceback (most recent call last):
  File "./dump-gutenberg.py", line 150, in
    main(docopt(help, version=0.1))
  File "./dump-gutenberg.py", line 129, in main
    only_books=BOOKS)
  File "/media/data/gutenberg/gutenberg/download.py", line 200, in download_all_books
    download_cache=download_cache)
  File "/media/data/gutenberg/gutenberg/download.py", line 46, in handle_zipped_epub
    if not is_safe(n)]):
  File "/media/data/gutenberg/gutenberg/download.py", line 34, in is_safe
    if path(fname).basename() == fname:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 3: ordinal not in range(128)
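The traceback points at is_safe: somewhere in that expression a byte-string member name from the zip seems to be decoded implicitly as ASCII, which fails on byte 0xe9. Below is a minimal sketch of one possible fix, not the project's actual code. It assumes the member names are UTF-8 or, failing that, Latin-1 (the real encoding is a guess; the log does not confirm it). to_text is a hypothetical helper, and os.path.basename stands in for the path(...).basename() call in download.py.

```python
import os


def to_text(name):
    """Return an archive member name as text (hypothetical helper).

    Assumption: names are UTF-8, with Latin-1 as a fallback. Latin-1 maps
    every byte to a character, so the fallback itself never raises.
    """
    if isinstance(name, str):
        return name
    try:
        return name.decode("utf-8")
    except UnicodeDecodeError:
        return name.decode("latin-1")


def is_safe(fname):
    """True if fname is a bare file name with no directory component."""
    text = to_text(fname)
    # Compare text with text, so no implicit ASCII decode can happen.
    return text not in ("", ".", "..") and os.path.basename(text) == text


if __name__ == "__main__":
    member = b"caf\xe9.jpg"  # made-up name with byte 0xe9 at position 3
    try:
        member.decode("ascii")
    except UnicodeDecodeError as exc:
        print("ascii decode fails:", exc)
    print(to_text(member), is_safe(member))
    print(is_safe(b"../evil.txt"))
```

Decoding explicitly before the comparison keeps both sides of the == as text, so a non-ASCII name such as the one in 24010-h.zip would be checked rather than crash the download.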