wustho / epy

CLI Ebook (epub2, epub3, fb2, mobi) Reader
GNU General Public License v3.0
981 stars 53 forks source link

Strange crash in zipfile #87

Closed mcepl closed 1 year ago

mcepl commented 1 year ago

When trying to open the attached file with the latest checkout of epy, I get this crash when scrolling from the front page. I can easily open it with foliate and read it, so I don't think it is a problem with the file itself.

Using otherwise rather standard openSUSE/Factory with Python 3.10.10. Do you have any idea what’s going on, please?

stitny~/b/E/epy ()$ epy ~/Knihy/fanfiction/marauders/Life\ as\ Petunia\ Evans-ao3_41805360
.epub
Traceback (most recent call last):
  File "/usr/bin/epy", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3.10/site-packages/epy_reader/__main__.py", line 18, in main
    filepath = curses.wrapper(reader.start_reading, filepath)
  File "/usr/lib64/python3.10/curses/__init__.py", line 94, in wrapper
    return func(stdscr, *args, **kwds)
  File "/usr/lib/python3.10/site-packages/epy_reader/reader.py", line 1599, in start_reading
    reading_state_or_ebook = reader.read(reading_state)
  File "/usr/lib/python3.10/site-packages/epy_reader/reader.py", line 847, in read
    text_structure, toc_entries, contents = self.get_current_book_content(reading_state)
  File "/usr/lib/python3.10/site-packages/epy_reader/reader.py", line 797, in get_current_book_content
    content = self.ebook.get_raw_text(content_path)
  File "/usr/lib/python3.10/site-packages/epy_reader/ebooks/epub.py", line 188, in get_raw_text
    content = self.file.open(content_path).read()
  File "/usr/lib64/python3.10/zipfile.py", line 911, in read
    buf += self._read1(self.MAX_N)
  File "/usr/lib64/python3.10/zipfile.py", line 993, in _read1
    data += self._read2(n - len(data))
  File "/usr/lib64/python3.10/zipfile.py", line 1028, in _read2
    raise EOFError
EOFError
stitny~/b/E/epy ()$ 

Life as Petunia Evans-ao3_41805360.zip (renamed from .epub to .zip because of GitHub.

wustho commented 1 year ago

That's pretty strange indeed, tho seems like file issue (like correpted or something) as you guess, instead of epy issue, but will have a look. Also thanks for the help answering other issues @mcepl

mcepl commented 1 year ago

I think the background is https://github.com/python/cpython/issues/101911 … do you have any idea what could be the problem? It is weird, because code where it crashes is rather old. What has changed recently?

mcepl commented 1 year ago

https://github.com/python/cpython/issues/101911#issuecomment-1442519220 suggests that the problem is with multiple threads (processes in our case) reading from one zipfile at once. And truly when I modify this line

https://github.com/wustho/epy/blob/c7a87f3901c4c8e2c19f009aaf4e71e7e8e78b3e/src/epy_reader/reader.py#L138

just to set self._multiprocess_support to False, all problems go away.

mcepl commented 1 year ago

Hmm, bisecting leads me to 3d6cc576607dd523195e5e8444612079645bb274, which seems strange. How could reverting lead to something like this?

daroot commented 1 year ago

I can confirm this sort of crash and that it's inconsistent; some days I can view any particular epub, and some days I get the EOFError. That definitely smells like a multiprocessing/locking issue to me. It's probably highly sensitive to the timings in the environment as to if it tickles the python/cpython#101911 bug.

The fix mcepl used works 100% for me. Force disable multiprocessing and I never have any trouble.

The referenced bug seems to indicate zipfile is not threadsafe, and epy doesn't seem to be using any locks, and the code around epub.py:188 has some exception handling and retries around other zlib related errors with multiprocessing. Not sure if adding EOFError to the exceptions caught and retried is sufficient, but it might be worth a try.

wustho commented 1 year ago

Hmm, what do you think of making a copy of the object for multiprocessing guys?

In fact, I'll try it right now.

wustho commented 1 year ago

Hey guys, try fixing this with https://github.com/wustho/epy/commit/ee3d6932be21b8bb69881d625b201758ee125eb3 Let me know if the issue still persist... Thanks!

mcepl commented 1 year ago

Yes, it seems to be working fine with 6b0e9fe. However, not 100% certain as I was not able to 100% reproduce before. Thanks anyway for the effort!

wustho commented 1 year ago

No rush at all, thanks mate

mcepl commented 1 year ago

Yes, I haven’t seen the crash any more. And yes, this is probably the worst commit message I have seen in a long time ;).