rffrasca / PDFKeeper

Open Source PDF Document Management
https://www.pdfkeeper.org/
GNU General Public License v3.0

[Feature] Option to Export Attachments / Nested PDFs #33

Open: pa-0 opened this issue 1 month ago

pa-0 commented 1 month ago

Is your feature request related to a problem? Please describe.
Adobe's Portfolio PDF format is incompatible with every viewer except Adobe Reader.

Describe the solution you'd like
An easy way to unpack the PDF attachments nested in some PDFs.

rffrasca commented 1 month ago

Are you looking to unpack Portfolio PDFs already uploaded into PDFKeeper, or do you need to unpack them so you can upload the contents of the Portfolio PDF into PDFKeeper?

pa-0 commented 1 month ago

Ideally, both: first (the priority) so that I can share the files externally without requiring the recipient to download Adobe Reader to view them, and second (a bonus) so that I can manage those nested files individually within PDFKeeper.

rffrasca commented 1 month ago

For the unpack, I'm thinking that when an item is selected in the grid, PDFKeeper would check whether the associated PDF is a portfolio. When it is, a menu item would be enabled that unpacks the portfolio to a selected disk folder. To make the feature more robust, I can also provide the capability to unpack to a ZIP file, if you see value in that.
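A minimal sketch of the output step (folder or ZIP), written in Java for illustration only and assuming the attachments have already been read out of the PDF into a map of file name to bytes. The class and method names (`AttachmentUnpacker`, `safeName`, `uniqueNames`, `toFolder`, `toZip`) are hypothetical and are not PDFKeeper's actual code. Because attachment names come from inside the PDF, the sketch strips any directory parts and renames collisions before writing.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: writes already-extracted attachments (name -> bytes) to a folder or a ZIP.
final class AttachmentUnpacker {

    // Attachment names come from inside the PDF, so keep only the last path segment
    // to stop names like "../x.pdf" from escaping the target folder.
    static String safeName(String name) {
        String normalized = name.replace('\\', '/');
        String base = normalized.substring(normalized.lastIndexOf('/') + 1);
        if (base.isEmpty() || base.equals(".") || base.equals("..")) {
            return "attachment";
        }
        return base;
    }

    // Sanitize every name and prefix a counter when two entries end up with the same name.
    static Map<String, byte[]> uniqueNames(Map<String, byte[]> files) {
        Map<String, byte[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            String base = safeName(e.getKey());
            String name = base;
            for (int n = 2; out.containsKey(name); n++) {
                name = n + "_" + base;
            }
            out.put(name, e.getValue());
        }
        return out;
    }

    // Unpack to a disk folder, creating it if needed; existing files with the same name are overwritten.
    static void toFolder(Map<String, byte[]> files, Path folder) throws IOException {
        Files.createDirectories(folder);
        for (Map.Entry<String, byte[]> e : uniqueNames(files).entrySet()) {
            Files.write(folder.resolve(e.getKey()), e.getValue());
        }
    }

    // Unpack to a ZIP archive written to `out`; the stream is closed when done.
    static void toZip(Map<String, byte[]> files, OutputStream out) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> e : uniqueNames(files).entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        }
    }
}
```

Sharing one name-sanitizing step between both targets keeps the folder and ZIP outputs identical, so a user gets the same file names whichever option they pick.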

Regarding the second part, it sounds like you may be looking for a dialog that lists the child files of the PDF portfolio for the selected grid item. In the dialog, you could select an item from the list and perform functions such as open with the default application (when the file is not a PDF), save to disk, and copy to clipboard?

pa-0 commented 1 month ago

I don't think I could have articulated it better myself! I'm still a beginner, but I'm currently researching the different methods of accomplishing the unpacking part in C#. If I manage to produce something worth sharing, I'll post updates here.

rffrasca commented 1 month ago

I should be able to start on the unpack feature later this month.

pa-0 commented 1 month ago

I found a couple of projects that are advertised as having this capability. Of the two, one uses iText (whose licensing terms are not ideal), and the other, PDFAttacher, leverages PDFBox, which might be a more appealing solution; it is described as capable of both attaching embedded files to existing PDFs and detaching them. There are a couple of caveats:

  1. The PDFBox dependency is actually a NuGet-packaged .NET wrapper around the original Java library, and it does not appear to have been updated in more than ten years.
  2. I was able to build the program without issue; however, it does not appear to work, at least not for PDF Portfolio files. This could be a problem with the dependency or, just as likely, an unsuccessful migration from .NET Framework 4.6.1 to 4.8. I have Visual Studio Pro 2022, and upon importing the project, a dialog asked whether I'd like to upgrade the project to the newer version of the framework. The upgrade appeared to complete without issue, as did the build. However, when I tested it by dragging several different portfolio PDFs onto the form, no attachments were displayed.

All in all, my research so far has not been very fruitful, but I hope this is helpful on some level.

rffrasca commented 1 month ago

I was planning on using iText for this, since it is already used in PDFKeeper for PDF text and annotation extraction, reading and writing PDF metadata, and PDF page splitting. It's not the easiest product to work with, but in my opinion it is the "best out there for working with PDF," and it is updated quarterly.

rffrasca commented 2 weeks ago

Hi Peter, I just wanted to give you an update. So far, I have been able to extract attachments from a couple of sample PDFs that I found on the internet. These samples have one thing in common: the attachments are defined in the PDF's /EmbeddedFiles dictionary.
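In the PDF specification, /EmbeddedFiles (under the catalog's /Names dictionary) is a name tree: the root and intermediate nodes hold a /Kids array, and leaf nodes hold a /Names array of alternating keys and values, where each value is a file specification dictionary whose bytes sit in an embedded file stream under /EF. A portfolio (a "portable collection" in the specification) is additionally marked by a /Collection entry in the catalog. The sketch below walks that structure, written in Java for illustration. The types `FileSpec`, `NameTreeNode`, and `EmbeddedFilesWalker` are hypothetical in-memory stand-ins; a real implementation would get these dictionaries from a PDF library such as iText. Note that files attached through page-level file attachment annotations do not live in this tree and would not be found by this walk.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a file specification dictionary; in a real PDF the
// bytes live in the embedded file stream under /EF -> /F.
final class FileSpec {
    final String fileName;
    final byte[] data;

    FileSpec(String fileName, byte[] data) {
        this.fileName = fileName;
        this.data = data;
    }
}

// Hypothetical in-memory model of one name tree node.
final class NameTreeNode {
    // Leaf nodes: the /Names array, alternating key (String) and value (FileSpec).
    final List<Object> names = new ArrayList<>();
    // Root or intermediate nodes: the /Kids array.
    final List<NameTreeNode> kids = new ArrayList<>();
}

final class EmbeddedFilesWalker {

    // Collect every key/value pair in the tree, depth first, in traversal order.
    static Map<String, FileSpec> flatten(NameTreeNode root) {
        Map<String, FileSpec> out = new LinkedHashMap<>();
        collect(root, out);
        return out;
    }

    private static void collect(NameTreeNode node, Map<String, FileSpec> out) {
        if (node == null) {
            return;
        }
        // Read /Names two elements at a time: key, then value.
        for (int i = 0; i + 1 < node.names.size(); i += 2) {
            Object key = node.names.get(i);
            Object value = node.names.get(i + 1);
            if (key instanceof String && value instanceof FileSpec) {
                out.put((String) key, (FileSpec) value);
            }
        }
        for (NameTreeNode kid : node.kids) {
            collect(kid, out);
        }
    }

    // A portable collection (Adobe's "Portfolio") has a /Collection entry in the catalog.
    static boolean isPortfolio(Map<String, Object> catalog) {
        return catalog.containsKey("Collection");
    }
}
```

The `isPortfolio` check is the kind of test that could drive enabling the unpack menu item for the selected grid row, while `flatten` produces the name-to-file map that the unpack step writes out.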

Here is a screen capture of a sample PDF opened in iText RUPS (available at https://github.com/itext/i7j-rups/releases/download/7.2.5/itext-rups-7.2.5-exe-archive.zip):

[Screen capture: sample PDF opened in iText RUPS]

Using RUPS, can you check the PDFs you're trying to extract from to see whether they are structured the same way as the sample?

rffrasca commented 1 week ago

The feature for extracting all attachments in a PDF to a ZIP file or folder has been committed.