openzim / gutenberg

Scraper for downloading the entire ebooks repository of project Gutenberg
https://download.kiwix.org/zim/gutenberg
GNU General Public License v3.0

Fails if there's no title tag in HTML #130

Closed: rgaudin closed this issue 4 years ago

rgaudin commented 4 years ago

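The latest run in the zimfarm failed on a book whose HTML has no <title> tag. The exporter tries to replace the title text, but there is no tag to replace, so soup.title is None and the assignment raises an AttributeError.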

Traceback (most recent call last):
  File "/usr/local/bin/gutenberg2zim", line 4, in <module>
    __import__('pkg_resources').run_script('gutenberg2zim==1.1.4', 'gutenberg2zim')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1438, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/gutenberg2zim-1.1.4-py3.6.egg/EGG-INFO/scripts/gutenberg2zim", line 274, in <module>
    main(docopt(help, version=VERSION))
  File "/usr/local/lib/python3.6/dist-packages/gutenberg2zim-1.1.4-py3.6.egg/EGG-INFO/scripts/gutenberg2zim", line 235, in main
    optimizer_version=OPTIMIZER_VERSION,
  File "/usr/local/lib/python3.6/dist-packages/gutenberg2zim-1.1.4-py3.6.egg/gutenbergtozim/export.py", line 281, in export_all_books
    Pool(concurrency).map(dlb, books)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/usr/local/lib/python3.6/dist-packages/gutenberg2zim-1.1.4-py3.6.egg/gutenbergtozim/export.py", line 278, in dlb
    optimizer_version=optimizer_version,
  File "/usr/local/lib/python3.6/dist-packages/gutenberg2zim-1.1.4-py3.6.egg/gutenbergtozim/export.py", line 573, in export_book
    optimizer_version=optimizer_version,
  File "/usr/local/lib/python3.6/dist-packages/gutenberg2zim-1.1.4-py3.6.egg/gutenbergtozim/export.py", line 631, in handle_unoptimized_files
    new_html = update_html_for_static(book=book, html_content=html)
  File "/usr/local/lib/python3.6/dist-packages/gutenberg2zim-1.1.4-py3.6.egg/gutenbergtozim/export.py", line 363, in update_html_for_static
    soup.title.string = book.title
AttributeError: 'NoneType' object has no attribute 'string'
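
One way to guard against this is sketched below. It assumes `soup` is a BeautifulSoup object, which the `soup.title.string` access in the traceback suggests but the report does not state. The helper name `set_title` is hypothetical and is not part of gutenbergtozim; it only illustrates creating the missing tag before assigning to it.

```python
from bs4 import BeautifulSoup


def set_title(soup, title):
    """Set the document title, creating <title> (and <head>) if missing.

    Hypothetical helper for illustration; not the project's actual code.
    """
    if soup.title is None:
        # No <title> tag: build one instead of dereferencing None.
        title_tag = soup.new_tag("title")
        head = soup.head
        if head is None:
            # No <head> either: create it as the first child of <html>,
            # or of the document itself if <html> is missing too.
            head = soup.new_tag("head")
            parent = soup.html if soup.html is not None else soup
            parent.insert(0, head)
        head.append(title_tag)
    soup.title.string = title
    return soup


# A document without <head> or <title>, similar to the failing case.
broken = BeautifulSoup("<html><body><p>text</p></body></html>", "html.parser")
set_title(broken, "Some Book")

# A document that already has a title is updated in place.
normal = BeautifulSoup(
    "<html><head><title>old</title></head><body></body></html>", "html.parser"
)
set_title(normal, "Some Book")
```

Creating the tag keeps the exported page titled consistently. A simpler alternative would be to skip the assignment when `soup.title` is None, at the cost of leaving that page without a title.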