pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

Content of dict returned by doc.embfile_info() does not match the documentation #4050

Open dasmy opened 4 days ago

dasmy commented 4 days ago

Description of the bug

Using doc.embfile_add(), I added an attachment to a PDF file. Then, I retrieved its metadata using doc.embfile_info().

In contrast to the documentation at https://pymupdf.readthedocs.io/en/latest/document.html#Document.embfile_info, the resulting dict does not contain a desc field for the description. Instead, the description is found under the key descender:

print(doc.embfile_info(name))
{'name': 'pdf_scraper_markdown_content', 'collection': 0, 'filename': 'fulltext.md', 'ufilename': 'fulltext.md', 'descender': 'Markdown representation of the PDF file content.', 'size': 67, 'length': 67, 'creationDate': "D:20241114103806+02'00'", 'modDate': "D:20241114103806+02'00'"}

How to reproduce the bug

With doc being a pymupdf.Document, just call

doc.embfile_add('test', b'foobar', desc='some text')
print(doc.embfile_info('test'))

Result:

{'name': 'test', 'collection': 0, 'filename': 'test', 'ufilename': 'test', 'descender': 'some text', 'size': 6, 'length': 6, 'creationDate': "D:20241114105413+02'00'", 'modDate': "D:20241114105413+02'00'"}

PyMuPDF version

1.24.13

Operating system

macOS

Python version

3.12

JorjMcKie commented 4 days ago

The problem was a dictionary key value being overwritten. Easy fix. Of course, "description" is the correct key; both the returned dictionary and the documentation will reflect this.
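
Until a version containing the fix is installed, calling code may want to tolerate more than one key name. The following is only a minimal sketch, assuming the corrected key will be "description" as stated above. embfile_description is a hypothetical helper and not part of PyMuPDF. It works on the plain dict returned by doc.embfile_info() and tries the announced key, the documented key and the buggy key in that order:

```python
# Hypothetical helper (not a PyMuPDF function): look up the attachment
# description in a dict returned by doc.embfile_info(), whichever key is used.
# Assumed key order: announced fix, documented name, key seen in 1.24.13.
_DESCRIPTION_KEYS = ("description", "desc", "descender")


def embfile_description(info, default=""):
    """Return the description stored in an embfile_info() dict."""
    for key in _DESCRIPTION_KEYS:
        if key in info:
            return info[key]
    return default


# Dict as reported above for PyMuPDF 1.24.13 (date fields omitted for brevity).
info = {
    "name": "test",
    "collection": 0,
    "filename": "test",
    "ufilename": "test",
    "descender": "some text",
    "size": 6,
    "length": 6,
}
print(embfile_description(info))
```

In real code this would be called as embfile_description(doc.embfile_info('test')). The fallback on "descender" can be dropped once only fixed versions need to be supported.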

dasmy commented 4 days ago

Nice. That was quick 😀