openpaperwork / paperwork

Personal document manager (Linux/Windows) -- Moved to Gnome's Gitlab
https://gitlab.gnome.org/World/OpenPaperwork/paperwork

Allow pdf manipulation #495

Open Lucki opened 7 years ago

Lucki commented 7 years ago

Allow reordering, extracting and inserting PDF pages in documents, as with scanned pages. I can drag and drop scanned pages around, but this doesn't work with PDFs yet.

jflesch commented 7 years ago

The main problem here is that I designed Paperwork so that it never modifies PDF files (it just copies and renames them). That's not only because they are painful to modify, but also because metadata could be lost too easily. My main worry is digital signatures: a signed PDF can be modified, but it can't be re-signed, and therefore the signature will be lost.

So yes, in theory, it's perfectly doable. However, a compromise has to be found between keeping the metadata, allowing the user to do what they want, and keeping a good user interface and user experience.

jflesch commented 7 years ago

Note to myself: maybe just keep track of the manipulations requested by the user, and reapply them every time we open the document, but only for display (+ export)?

tYYGH commented 7 years ago

Good idea. It's like what video editors do: they do not alter the sources, but instead remember what must be applied to the sources, and when, to produce the result.
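
For illustration, the "remember the manipulations and reapply them" idea could look roughly like the sketch below. It is not Paperwork code: the `Move` and `Delete` classes and the `apply_operations` function are hypothetical names made up for this example, and page indexes are assumed to be 0-based. The original PDF is never touched; only a list of displayed page indexes is computed.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Move:
    """Hypothetical operation: move the page at position `src` to position `dst`."""
    src: int
    dst: int


@dataclass(frozen=True)
class Delete:
    """Hypothetical operation: hide the page at position `pos`."""
    pos: int


Operation = Union[Move, Delete]


def apply_operations(nb_pages: int, operations: List[Operation]) -> List[int]:
    """Replay the recorded operations on top of the untouched PDF.

    Returns the list of original (0-based) PDF page indexes, in the order
    they should be displayed or exported.
    """
    view = list(range(nb_pages))
    for op in operations:
        if isinstance(op, Move):
            page = view.pop(op.src)
            view.insert(op.dst, page)
        elif isinstance(op, Delete):
            del view[op.pos]
    return view
```

With a 4-page PDF, `apply_operations(4, [Move(0, 2), Delete(1)])` would give the display order `[1, 0, 3]`. Dropping the operation list restores the original document, since the source file itself is never rewritten.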

mjourdan commented 7 years ago

XMP could fit our needs here?

jflesch commented 7 years ago

Hm, not sure. Modifying XMP metadata in PDF files may still have side effects on them (signatures, etc).

There is another way, that would take care of caching as well.

When we import a PDF, we get the following structure:

papers/
|-- 20140804_2127_12/
    |-- doc.pdf
    |-- labels

When a page is modified, we could use a mix of the image document structure and the PDF structure:

papers/
|-- 20140804_2127_12/
    |-- doc.pdf
    |-- labels
    |-- paper.3.jpg  # the modified page

The PDF would remain available untouched.
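
A minimal sketch of how a page lookup could work with this layout, assuming the `paper.N.jpg` numbering matches the PDF page numbers and starts at 1 (as in the example above). The `resolve_page` function is a hypothetical helper, not an existing Paperwork API:

```python
from pathlib import Path
from typing import Tuple


def resolve_page(doc_dir: Path, page_number: int) -> Tuple[str, Path]:
    """Tell where the content of a page should be read from.

    `page_number` is assumed to be 1-based, matching `paper.3.jpg` in the
    layout above. If a modified image exists for the page, it takes
    precedence; otherwise the page comes from the untouched `doc.pdf`.
    """
    override = doc_dir / f"paper.{page_number}.jpg"
    if override.is_file():
        return ("image", override)
    return ("pdf", doc_dir / "doc.pdf")
```

For the directory shown above, `resolve_page(Path("papers/20140804_2127_12"), 3)` would point at `paper.3.jpg`, while any other page number would fall back to `doc.pdf`.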

For the whole document, there could then be two export options instead of one: