pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

set_toc method error #3488

Closed: YYTB closed this issue 4 months ago

YYTB commented 4 months ago

Description of the bug

In 1.24.3, calling somedoc.set_toc([(1, "sometitle", 1), ]) raises many KeyError exceptions. The same code has no problem in 1.24.2.

How to reproduce the bug

I have no idea.

PyMuPDF version

1.24.3

Operating system

Windows

Python version

3.8

JorjMcKie commented 4 months ago

You must provide an example where this happens. Otherwise we cannot accept this bug report.

YYTB commented 4 months ago

How about this? Here is the method of mine that goes wrong:

    def merge_pdfs(self, file_list: List[Union[Path, str]]):
        if len(file_list) == 1:
            self.merge_doc = fitz.open(file_list[0])
        else:
            self.merge_doc = fitz.open()
            merged_toc = []
            for file in tqdm(file_list, desc="merge", unit="file"):
                pdf = fitz.open(file)
                merged_toc.append((1, Path(file).stem.replace("\u3000", ""), len(self.merge_doc) + 1))
                self.merge_doc.insert_file(pdf)
                pdf.close()
            self.merge_doc.set_toc(merged_toc, collapse=0)
            self.merge_doc.save(self.out_pdf_file)
        self.add_metadata()
        self.set_file_view()
        if self.pagenum:
            self.add_pagnums()
        if self.watermark:
            self.add_watermark(self.watermark_text)
        self.merge_doc.save(self.out_pdf_file)

The error messages are below. They still occur in 1.24.4, but 1.24.2 does not produce them.

..\venv\lib\site-packages\pymupdf\__init__.py:87:exception_info: exception_info:
Traceback (most recent call last):
  File "D:\JDBDocuments\Pycharm\table_name_cards\venv\lib\site-packages\pymupdf\utils.py", line 1444, in set_toc
    txt += ol["dest"]
KeyError: 'dest'
..\venv\lib\site-packages\pymupdf\__init__.py:87:exception_info: exception_info:
Traceback (most recent call last):
  File "D:\JDBDocuments\Pycharm\table_name_cards\venv\lib\site-packages\pymupdf\utils.py", line 1462, in set_toc
    if ol["next"] > -1:
KeyError: 'next'
..\venv\lib\site-packages\pymupdf\__init__.py:87:exception_info: exception_info:
Traceback (most recent call last):
  File "D:\JDBDocuments\Pycharm\table_name_cards\venv\lib\site-packages\pymupdf\utils.py", line 1469, in set_toc
    if ol["parent"] > -1:
KeyError: 'parent'
..\venv\lib\site-packages\pymupdf\__init__.py:87:exception_info: exception_info:
Traceback (most recent call last):
  File "D:\JDBDocuments\Pycharm\table_name_cards\venv\lib\site-packages\pymupdf\utils.py", line 1476, in set_toc
    if ol["prev"] > -1:
KeyError: 'prev'
..\venv\lib\site-packages\pymupdf\__init__.py:87:exception_info: exception_info:
Traceback (most recent call last):
  File "D:\JDBDocuments\Pycharm\table_name_cards\venv\lib\site-packages\pymupdf\utils.py", line 1483, in set_toc
    txt += "/Title" + ol["title"]
KeyError: 'title'

Thank you guys for your hard work

JorjMcKie commented 4 months ago

Apologies - I should have looked into this myself!

YYTB commented 4 months ago

> Apologies - I should have looked into this myself!

I tried to fix this in your source code but failed. If it is not too much trouble to ask, can you explain why this happened?

NamelessUzer commented 4 months ago

I also encountered the same problem. It is easy to reproduce: just try to add bookmarks to a PDF file. Here is code that reproduces the bug:

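    import fitz
    from pathlib import Path

    # Any existing PDF with at least one page.
    pdf = Path(r'test.pdf')
    # TOC entries are (level, title, page number); both point at page 1.
    toc = [(1, 'level1', 1), (2, 'level2', 1)]
    doc = fitz.open(pdf)
    doc.set_toc(toc, collapse=2)

julian-smith-artifex-com commented 4 months ago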

I think this may be the same as #3479.

Internal exception diagnostics in utils.py were increased in 1.24.2.

The fix is simple enough. But I'm also looking at writing a test that checks that we don't generate such diagnostics in future.
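
A regression check of that kind might look like the sketch below. It assumes the diagnostics reach `sys.stdout` or `sys.stderr` through Python at call time, which may not hold for PyMuPDF. The names `run_and_capture`, `has_diagnostics`, `noisy_operation` and `quiet_operation` are hypothetical stand-ins, and the marker string `exception_info` is taken from the log lines quoted earlier in this thread.

```python
import contextlib
import io
import traceback


def run_and_capture(func, *args, **kwargs):
    """Call func and return (result, everything it wrote to stdout/stderr)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def has_diagnostics(output: str) -> bool:
    """Return True if the captured text looks like an exception diagnostic."""
    return "exception_info" in output or "Traceback" in output


def noisy_operation():
    """Hypothetical stand-in: swallows a KeyError but logs a diagnostic."""
    try:
        {}["dest"]
    except KeyError:
        print("exception_info:")   # goes to sys.stdout
        traceback.print_exc()      # goes to sys.stderr
    return "done"


def quiet_operation():
    """Hypothetical stand-in for the same call after a fix: no output."""
    return "done"


_, noisy_output = run_and_capture(noisy_operation)
_, quiet_output = run_and_capture(quiet_operation)
print(has_diagnostics(noisy_output))
print(has_diagnostics(quiet_output))
```

In a real test the stand-ins would be replaced by the call under test, such as `doc.set_toc(toc)`. If the library holds its own reference to the output stream, this redirection would not see the messages and a different capture mechanism would be needed.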

ChuanPoLee commented 4 months ago

I got the same error message when calling doc.set_toc(toc_list).

PyMuPDF version: 1.24.4
Python version: 3.11.7
OS: Windows

I added a key check in utils.py before each of the following five lookups, for example `if 'dest' in ol: ...` for the first one (I'm not sure whether this is a good approach): ol["dest"], ol["next"], ol["parent"], ol["prev"], ol["title"].
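
The key-check workaround described above can be sketched roughly as follows. This is a minimal illustration, not PyMuPDF's actual code and not necessarily the fix shipped later. The function name `build_outline_entry`, the PDF keys written for each field, and the sample dictionary are assumptions made for the example. Only the five key names and the `> -1` comparisons come from the tracebacks in this thread.

```python
def build_outline_entry(ol: dict) -> str:
    """Assemble a PDF-style outline dictionary string, skipping absent keys.

    Hypothetical helper: it only mirrors the shape of the lookups seen in
    the tracebacks (ol["dest"], ol["next"], ...), not PyMuPDF's real code.
    """
    parts = ["<<"]

    # dict.get() returns None for a missing key instead of raising KeyError.
    dest = ol.get("dest")
    if dest is not None:
        parts.append(dest)

    # Link fields are written only when present and greater than -1.
    # The "/Next", "/Parent", "/Prev" output format is assumed for illustration.
    for key, pdf_name in (("next", "/Next"), ("parent", "/Parent"), ("prev", "/Prev")):
        index = ol.get(key, -1)
        if index > -1:
            parts.append(f"{pdf_name} {index} 0 R")

    title = ol.get("title")
    if title is not None:
        parts.append("/Title" + title)

    parts.append(">>")
    return "".join(parts)


# Made-up sample entry; the values are not taken from a real document.
complete = {"dest": "/Dest[1 0 R/Fit]", "next": 7, "parent": 5, "prev": -1, "title": "(level1)"}
print(build_outline_entry(complete))
print(build_outline_entry({}))  # an entry with no keys no longer raises KeyError
```

A guard like this only suppresses the symptom. Whether skipping a missing key is correct depends on why the key is absent in the first place, which this sketch does not address.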

julian-smith-artifex-com commented 4 months ago

Fixed in 1.24.5.