messense / mupdf-rs

Rust binding to mupdf
GNU Affero General Public License v3.0

How can I create an indirect reference to an object? #40

Closed: Screwtapello closed this issue 2 years ago

Screwtapello commented 2 years ago

I'm trying to write a tool that can read and write PDF metadata.

Reading is easy: the mupdf::Document::metadata() method does exactly the right thing.

Writing is more difficult, since there's no matching set_metadata() method. But that's OK, because mupdf::pdf::PdfDocument lets us poke at the object graph directly!

The PDF 1.7 spec (in section 7.5.5) says that the metadata dictionary (I'm sorry, the "Document Information Dictionary") is stored under the "Info" key in the file trailer. That's easy enough: PdfDocument::trailer() gives us the trailer dictionary, and PdfObject::get_dict() lets us grab the "Info" key. However, the spec goes on to describe the value associated with that key as:

(Optional; shall be an indirect reference)

That is, the key may be missing, but if it's present it must not be an actual dict, but an indirect reference to a dict. If I want to set a key in the metadata dict, and the dict does not exist, I'll have to create it. But I can't just call PdfDocument::new_dict() and insert the result. Instead, I have to create a new entry in the xref table, get the number of that entry, call PdfDocument::new_indirect() to create a reference to it, and insert that reference into the trailer dict.
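To make those steps concrete, here is a minimal sketch using a toy object model rather than the real bindings. Everything in it (`Obj`, `Doc`, `add_object`, `ensure_info`, `set_metadata`) is a hypothetical name invented for illustration. None of it is mupdf-rs API, and it only models the bookkeeping: allocate an xref entry, remember its number, and store an indirect reference to it in the trailer.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified PDF object model (NOT the mupdf-rs types).
#[derive(Debug, Clone, PartialEq)]
enum Obj {
    Str(String),
    Dict(BTreeMap<String, Obj>),
    /// Indirect reference: object number plus generation number.
    Ref { num: u32, generation: u16 },
}

/// Toy document: an xref table indexed by object number, plus the trailer dict.
struct Doc {
    xref: Vec<Option<Obj>>, // entry 0 is reserved, as in a real xref table
    trailer: BTreeMap<String, Obj>,
}

impl Doc {
    fn new() -> Self {
        Doc { xref: vec![None], trailer: BTreeMap::new() }
    }

    /// Allocate a fresh xref entry holding `obj` and return its object number.
    fn add_object(&mut self, obj: Obj) -> u32 {
        self.xref.push(Some(obj));
        (self.xref.len() - 1) as u32
    }

    /// Return the object number of the Info dict, creating the dict if missing.
    fn ensure_info(&mut self) -> u32 {
        if let Some(Obj::Ref { num, .. }) = self.trailer.get("Info") {
            return *num;
        }
        // No Info entry yet: create an empty dict as a new indirect object,
        // then store a reference to it (not the dict itself) in the trailer.
        let num = self.add_object(Obj::Dict(BTreeMap::new()));
        self.trailer
            .insert("Info".to_string(), Obj::Ref { num, generation: 0 });
        num
    }

    /// Set one metadata key, going through the indirect reference.
    fn set_metadata(&mut self, key: &str, value: &str) {
        let num = self.ensure_info();
        if let Some(Some(Obj::Dict(dict))) = self.xref.get_mut(num as usize) {
            dict.insert(key.to_string(), Obj::Str(value.to_string()));
        }
    }
}
```

In this model, calling `set_metadata` on a fresh `Doc` allocates one new object and leaves `trailer["Info"]` holding a reference to it. Later calls reuse the same object number instead of allocating another. The open question for the real bindings is which call plays the role of `add_object` here.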

The tricky bit there is "create a new entry in the xref table and get the number of that entry". Looking at the PyMuPDF bindings, they do a little dance with APIs like get_new_xref() and update_object():

https://github.com/pymupdf/PyMuPDF/blob/b929ea717c2e7e1e31b7e8aa20c711c30078841d/fitz/utils.py#L1161-L1163

These functions or methods don't appear to be in the Rust bindings, and I'm not sure whether they're weird extra things PyMuPDF added, or if they're just not wrapped by the Rust bindings.

Is there something I'm missing? Is it just not (yet) possible to do this through mupdf-rs? If it's not yet possible, would it be difficult to add?

messense commented 2 years ago

Sorry for the late reply. It's missing because nobody has needed it before. Feel free to open a PR to add a new API for it.