openzim / gutenberg

Scraper for downloading the entire ebook repository of Project Gutenberg
https://download.kiwix.org/zim/gutenberg
GNU General Public License v3.0

Crash with invalid epub #30

Closed: kelson42 closed this issue 9 years ago

kelson42 commented 9 years ago
            Copying companion file to 28969_wilhelm_tell_schiller_friedrich.epub
            Creating ePUB at /media/data/gutenberg/tmp/tmpFYw00B.epub
            Copying companion file to 28969_pg6780-images.mobi
            shitty ext: /media/data/gutenberg/static/28969_pg6780-images.mobi
            Copying /media/data/gutenberg/static/28969_pg6780-images.mobi
            Exporting HTML file to /media/data/gutenberg/static/28969_6787-h.htm
            Copying companion file to 28969_love_and_intrigue_schiller_friedrich.epub
            Creating ePUB at /media/data/gutenberg/tmp/tmpZq0rSI.epub
            Copying companion file to 28969_the_thirty_years_war_complete_schiller_friedrich.epub
            Creating ePUB at /media/data/gutenberg/tmp/tmp9p2Zzv.epub
            Copying companion file to 28969_3pb216.jpg
            Copying /media/data/gutenberg/static/28969_3pb216.jpg
            Copying companion file to 28969_the_piccolomini_schiller_friedrich.epub
            Creating ePUB at /media/data/gutenberg/tmp/tmphjKR5d.epub
            Copying companion file to 28969_pg6790-images.epub
            Creating ePUB at /media/data/gutenberg/tmp/tmpCyYXya.epub

Traceback (most recent call last):
  File "./dump-gutenberg.py", line 154, in <module>
    main(docopt(help, version=0.1))
  File "./dump-gutenberg.py", line 141, in main
    only_books=BOOKS)
  File "/media/data/gutenberg/gutenberg/export.py", line 155, in export_all_books
    books=books)
  File "/media/data/gutenberg/gutenberg/export.py", line 548, in export_book_to
    handle_companion_file(fname)
  File "/media/data/gutenberg/gutenberg/export.py", line 524, in handle_companion_file
    optimize_epub(src, tmp_epub.name)
  File "/media/data/gutenberg/gutenberg/export.py", line 440, in optimize_epub
    with zipfile.ZipFile(src, 'r') as zf:
  File "/usr/lib/python2.7/zipfile.py", line 714, in __init__
    self._GetContents()
  File "/usr/lib/python2.7/zipfile.py", line 748, in _GetContents
    self._RealGetContents()
  File "/usr/lib/python2.7/zipfile.py", line 763, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
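The traceback shows `optimize_epub` handing the companion file straight to `zipfile.ZipFile`, which raises `BadZipfile` when the file is not a valid archive and aborts the whole export. A minimal Python 3 sketch of a guard (in Python 3 the exception is spelled `zipfile.BadZipFile`), assuming it is acceptable to skip a broken companion file instead of crashing; `safe_optimize_epub` and its `optimize` parameter are hypothetical names, not the scraper's actual code:

```python
import logging
import zipfile


def safe_optimize_epub(src, dst, optimize):
    """Run optimize(src, dst) only when src is a readable ZIP archive.

    Hypothetical wrapper: `optimize` stands in for the scraper's own
    ePUB optimisation step (optimize_epub in the traceback). Returns
    True if it ran, False if src was skipped as invalid.
    """
    # is_zipfile() looks for the ZIP end-of-central-directory record,
    # so it returns False for empty (0-byte) or truncated downloads.
    if not zipfile.is_zipfile(src):
        logging.warning("Skipping invalid ePUB (not a ZIP archive): %s", src)
        return False
    try:
        optimize(src, dst)
    except zipfile.BadZipFile as exc:
        # The trailer looked valid but the archive itself could not be read.
        logging.warning("Skipping unreadable ePUB %s: %s", src, exc)
        return False
    return True
```

Whether to skip such a file or copy it unchanged is a policy choice; skipping avoids publishing a broken download while letting the rest of the export finish.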

rgaudin commented 9 years ago

Can you copy the rest of the command output (from the beginning of book number 28969)? I can't reproduce it. I also noticed that this zip file is particularly big (33 MB); it might be corrupted as well.
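To tell a corrupted archive apart from one that is merely large, `zipfile.ZipFile.testzip()` reads every member and verifies its CRC-32, returning the name of the first bad member or `None`. A short Python 3 sketch for triaging a suspicious download; `check_epub` is a hypothetical helper, not part of the scraper:

```python
import zipfile


def check_epub(path):
    """Return a short diagnosis of whether `path` is a usable ZIP/ePUB.

    Hypothetical helper for triaging suspicious downloads.
    """
    # Cheap check first: empty or truncated files have no ZIP trailer.
    if not zipfile.is_zipfile(path):
        return "not a zip file"
    try:
        with zipfile.ZipFile(path) as zf:
            # Reads every member; returns the first name whose CRC fails.
            bad_member = zf.testzip()
    except zipfile.BadZipFile as exc:
        return "unreadable archive: {}".format(exc)
    if bad_member is not None:
        return "corrupt member: {}".format(bad_member)
    return "ok"
```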

(gut2)reg@homelet ~/src/gutenberg (master) $ ./dump-gutenberg.py -b 28969 --export
EXPORTING ebooks to static folder (and JSON)
[28969]
    Filtered book collection size: 1
    Filtered book collection, PDF: 0
    Filtered book collection, ePUB: 1
    Filtered book collection, HTML: 1
        Dumping full_by_popularity.js
        Dumping full_by_title.js
        Dumping lang_en_by_popularity.js
        Dumping lang_en_by_title.js
        Dumping authors_lang_en.js
        Dumping auth_289_by_popularity.js
        Dumping auth_289_by_title.js
        Dumping authors.js
        Dumping languages.js
        Dumping main_languages.js
    Exporting Book #28969.
Missing HTML content for #28969 at dl-cache/28969.html
        Copying format file to The Illustrated Works Of Friedrich Schiller.28969.epub
        Creating ePUB at /Users/reg/src/gutenberg/tmp/tmpg26IN2.epub
        Exporting to static/The Illustrated Works Of Friedrich Schiller_cover.28969.html

kelson42 commented 9 years ago

The command is exactly the same... I guess the ePUB is corrupted for some reason?

rgaudin commented 9 years ago

Then copy the full output of the command.

On Thu, Oct 16, 2014 at 9:46 AM, Kelson notifications@github.com wrote:

The command is exactly the same... I guess the ePUB is corrupted for some reason?

kelson42 commented 9 years ago

Here is the log: http://zimfarm.kiwix.org/log

kelson42 commented 9 years ago

$ ls -la dl-cache/28969_pg6790-images.epub
-rw-rw-r-- 1 kelson kelson 0 Sep 20 07:15 dl-cache/28969_pg6790-images.epub

... 0 bytes... something is definitely wrong with this file.
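The `ls` output pins the crash on an empty cached download, which `zipfile` can never open. A Python 3 sketch for finding other such files before an export run, assuming cached ePUBs sit directly in `dl-cache/` as in the path above; `find_broken_epubs` is a hypothetical helper:

```python
import zipfile
from pathlib import Path


def find_broken_epubs(cache_dir):
    """List cached .epub files that are empty or not valid ZIP archives.

    Hypothetical helper: assumes downloads live directly in `cache_dir`
    (e.g. dl-cache/).
    """
    broken = []
    for path in sorted(Path(cache_dir).glob("*.epub")):
        # A 0-byte file can never be a ZIP (even an empty archive is 22 bytes).
        if path.stat().st_size == 0 or not zipfile.is_zipfile(path):
            broken.append(path)
    return broken
```

For example, `for p in find_broken_epubs("dl-cache"): p.unlink()` would remove them; this assumes the scraper's download step fetches files missing from the cache again, which this thread does not confirm.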