wustho / epy

CLI Ebook (epub2, epub3, fb2, mobi) Reader
GNU General Public License v3.0

Hang on viewing epub #57

Open meganleewebb opened 2 years ago

meganleewebb commented 2 years ago

Version: v2022.2.14

epy loads the epub file, but hangs after the first page.

The problem is in lines 734-754:

    def get_raw_text(self, content_path: Union[str, ET.Element]) -> str:
        assert isinstance(self.file, zipfile.ZipFile)
        assert isinstance(content_path, str)

        max_tries: Optional[int] = None  # 1 if DEBUG else None

        # use try-except block to catch
        # zlib.error: Error -3 while decompressing data: invalid distance too far back
        # seems like caused by multiprocessing
        tries = 0
        while True:
            try:
                content = self.file.open(content_path).read()
                break
            except Exception as e:
                tries += 1
                if max_tries is not None and tries >= max_tries:
                    raise e

        return content.decode("utf-8")

when it is passed a content_path with the value "OEBPS/../jacket.xhtml".

self.file.open(content_path) fails with "There is no item named 'OEBPS/../jacket.xhtml' in the archive". The try/except catches the error, and because max_tries is None, the while loop never exits.
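For context, zipfile's ZipFile.open looks the member name up literally, so an unnormalized path is never found even though "jacket.xhtml" is in the archive. A minimal standalone illustration (book.epub is just a placeholder for an epub that contains jacket.xhtml):

    import posixpath
    import zipfile

    with zipfile.ZipFile("book.epub") as zf:
        try:
            zf.open("OEBPS/../jacket.xhtml")
        except KeyError as e:
            # "There is no item named 'OEBPS/../jacket.xhtml' in the archive"
            print(e)
        # normalizing the path first makes the lookup succeed
        with zf.open(posixpath.normpath("OEBPS/../jacket.xhtml")) as member:
            member.read()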

My quick fix has been to set max_tries = 2 and add:

content_path = os.path.relpath(content_path) after the tries += 1 line.
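Applied to the loop above, that quick fix would look roughly like this (a sketch only, assuming os is already imported in the module):

        max_tries: Optional[int] = 2

        tries = 0
        while True:
            try:
                content = self.file.open(content_path).read()
                break
            except Exception as e:
                tries += 1
                if max_tries is not None and tries >= max_tries:
                    raise e
                # retry with a normalized path: "OEBPS/../jacket.xhtml" -> "jacket.xhtml"
                content_path = os.path.relpath(content_path)

        return content.decode("utf-8")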

wustho commented 2 years ago

Wow, thanks for the detailed traceback and suggested solution, mate... Will look into that.

I just realized it should've been an explicit zlib.error exception to catch...

meganleewebb commented 2 years ago

unzip -t:

    testing: mimetype                 OK
    testing: jacket.xhtml             OK
    testing: META-INF/container.xml   OK
    testing: OEBPS/9780062430052_toc.ncx   OK
    testing: OEBPS/9780062430052_content.opf   OK
    testing: OEBPS/images/cover.jpg   OK
    testing: OEBPS/images/auth.jpg    OK
    testing: OEBPS/images/copy.jpg    OK
    testing: OEBPS/images/copy1.jpg   OK
    testing: OEBPS/images/title.jpg   OK
    testing: OEBPS/images/heard.jpg   OK
    testing: OEBPS/imgbackad/imgbackad-1.jpg   OK
    testing: OEBPS/imgbackad/imgbackad-2.jpg   OK
    testing: OEBPS/styles/9780062430052_template.css   OK
    testing: OEBPS/text/titlepage.xhtml   OK
    testing: OEBPS/text/nav.xhtml     OK
    testing: OEBPS/text/About_the_Author.xhtml   OK
    testing: OEBPS/text/About_the_Publisher.xhtml   OK
    testing: OEBPS/text/9780062430052_Also_by.xhtml   OK
    testing: OEBPS/text/9780062430052_Credits.xhtml   OK
...

jacket.xhtml does exist in the epub. The first file accessed was titlepage.xhtml; jacket.xhtml was the second.

I didn't look into whether the file path "OEBPS/../jacket.xhtml" should have been resolved before it reaches this function.

Thanks for the great app. 👍

wustho commented 2 years ago

Hey there, just fixed this with: https://github.com/wustho/epy/commit/50dd4faf4ca095b8f10a1883ca1168a2628e877e

Can you try upgrading epy (pip install --upgrade epy-reader) and let me know if the issue still persists? Thanks.

meganleewebb commented 2 years ago

That works.

I think you're leaving a bug in place by not putting an upper limit on max_tries to exit the while True loop.

        max_tries: Optional[int] = None  # 1 if DEBUG else None

        # use try-except block to catch
        # zlib.error: Error -3 while decompressing data: invalid distance too far back
        # seems like caused by multiprocessing
        tries = 0
        while True:
            try:
                content = self.file.open(content_path).read()
                break
            except zlib.error as e:
                tries += 1
                if max_tries is not None and tries >= max_tries:
                    raise e

That will still loop indefinitely on any zlib error.

The comment:

    # use try-except block to catch
    # zlib.error: Error -3 while decompressing data: invalid distance too far back
    # seems like caused by multiprocessing

suggests the loop is there because more than one attempt may be needed for that error, but some upper limit would still seem wise.
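A minimal sketch of what such a cap could look like on the loop above (the limit of 3 is an arbitrary assumption, not the project's actual code):

        max_tries = 3  # arbitrary upper bound on retries
        tries = 0
        while True:
            try:
                content = self.file.open(content_path).read()
                break
            except zlib.error:
                tries += 1
                if tries >= max_tries:
                    raise

        return content.decode("utf-8")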

I've not seen that error myself, but I have seen zip files where one member is corrupted and unreadable while the rest are fine.

wustho commented 2 years ago

Oh, that's indeed a wise idea! Putting it in the TODO, thanks!