openzim / devdocs

devdocs.io to ZIM scraper
GNU General Public License v3.0

Wrong exception is raised at keyboard interrupt and application never finishes #14

Open benoit74 opened 2 months ago

benoit74 commented 2 months ago

When a keyboard interrupt is sent after the libzim creator has been created and while it is processing entries, the wrong exception is raised and the scraper never finishes: it stays stuck.

^C[devdocs2zim::2024-09-10 07:12:10,459] ERROR:Traceback (most recent call last):
  File "libzim/libzim.pyx", line 98, in libzim.string_cy_call_fct
  File "libzim/libzim.pyx", line 84, in libzim.call_method
AttributeError: 'StaticItem' object has no attribute 'get_mimetype'
Traceback (most recent call last):
  File "/home/benoit/Repos/openzim/devdocs/src/devdocs2zim/entrypoint.py", line 90, in main
    ).run()
      ^^^^^
  File "/home/benoit/Repos/openzim/devdocs/src/devdocs2zim/generator.py", line 366, in run
    self.generate_zim(
  File "/home/benoit/Repos/openzim/devdocs/src/devdocs2zim/generator.py", line 423, in generate_zim
    self.add_zim_contents(
  File "/home/benoit/Repos/openzim/devdocs/src/devdocs2zim/generator.py", line 499, in add_zim_contents
    creator.add_item_for(  # type: ignore
  File "/home/benoit/Repos/openzim/devdocs/.hatch/devdocs2zim/lib/python3.12/site-packages/zimscraperlib/zim/creator.py", line 367, in add_item_for
    self.add_item(
  File "/home/benoit/Repos/openzim/devdocs/.hatch/devdocs2zim/lib/python3.12/site-packages/zimscraperlib/zim/creator.py", line 404, in add_item
    raise exc
  File "/home/benoit/Repos/openzim/devdocs/.hatch/devdocs2zim/lib/python3.12/site-packages/zimscraperlib/zim/creator.py", line 401, in add_item
    super().add_item(item)
  File "libzim/libzim.pyx", line 358, in libzim._Creator.add_item
RuntimeError: Traceback (most recent call last):
  File "libzim/libzim.pyx", line 98, in libzim.string_cy_call_fct
  File "libzim/libzim.pyx", line 84, in libzim.call_method
AttributeError: 'StaticItem' object has no attribute 'get_mimetype'

I strongly suspect this is linked to wrong exception handling in pylibzim, a problem freeing C++ resources, or both.

I'll open an issue in pylibzim, since this looks like an upstream problem.

benoit74 commented 2 months ago

This is now confirmed as most likely a python-libzim issue. It is not easy to analyze, so a fix will probably not make it into 0.1.0 and will slip to the next release.
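Until this is fixed upstream, one possible scraper-side mitigation is to keep Ctrl-C from being delivered while an item is being added. This is a minimal sketch using only the standard `signal` module, not something tried in this repo: `DeferredInterrupt`, `add_all` and `add_one` are hypothetical names, and `add_one` stands in for whatever call adds a single entry to the creator. Whether this actually avoids the hang described above is untested.

```python
import signal
import threading


class DeferredInterrupt:
    """Record SIGINT in a flag instead of raising KeyboardInterrupt immediately.

    Hypothetical helper; must be used from the main thread, because
    signal.signal() only works there.
    """

    def __init__(self):
        self.requested = threading.Event()
        self._previous = None

    def _handler(self, signum, frame):
        # Only remember that Ctrl-C was pressed; do not raise here.
        self.requested.set()

    def __enter__(self):
        self._previous = signal.signal(signal.SIGINT, self._handler)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Restore the handler that was installed before entering the block.
        if self._previous is not None:
            signal.signal(signal.SIGINT, self._previous)
        return False


def add_all(items, add_one):
    """Call add_one(item) for each item, stopping between items on Ctrl-C.

    add_one is a placeholder for the call that adds one entry to the ZIM.
    """
    added = 0
    with DeferredInterrupt() as interrupt:
        for item in items:
            if interrupt.requested.is_set():
                break  # stop before starting another add
            add_one(item)
            added += 1
    if interrupt.requested.is_set():
        # Re-raise only once no add is in progress anymore.
        raise KeyboardInterrupt
    return added
```

With this shape, the `KeyboardInterrupt` only surfaces between two adds, so the caller's normal `except KeyboardInterrupt` handling sees it rather than an exception raised from inside the native call. The trade-off is that a single slow add cannot be interrupted midway.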