podofo / podofo

A C++17 PDF manipulation library
https://podofo.github.io/podofo/documentation

Remove annotations from PDF files #2

Open muelli opened 2 years ago

muelli commented 2 years ago

Currently, the best way to get rid of annotations seems to be a series of commands that have the potential to fail for some PDF files: https://gist.github.com/stefanschmidt/5248592

It would be nice if podofo offered a tool to remove annotations from a PDF file.

(Turns out that pdfcpu can remove annotations)

ceztko commented 1 year ago

With PoDoFo-next (soon 0.10), don't expect new tools for now, because all tools are currently unsupported. An API is more interesting.

Here are some ideas about a possible API:

PdfAnnotation::Remove()

This would just remove the annotation's reference from the page /Annots.
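A minimal sketch of what that removal amounts to, using a toy model rather than PoDoFo's real classes. `Reference`, `Page`, and `removeAnnotation` are hypothetical names introduced only for illustration. The page's /Annots array is modeled as a vector of indirect references, and the removal drops the matching entry without touching the annotation object itself.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for an indirect object reference
// (object number, generation number). Not a PoDoFo type.
struct Reference
{
    unsigned objectNumber;
    unsigned generation;

    bool operator==(const Reference& rhs) const
    {
        return objectNumber == rhs.objectNumber && generation == rhs.generation;
    }
};

// Hypothetical stand-in for a page; only the /Annots array is modeled.
struct Page
{
    std::vector<Reference> annots;
};

// Drops every entry of the page's /Annots that points at `annot`.
// Returns true if at least one entry was removed.
bool removeAnnotation(Page& page, const Reference& annot)
{
    auto newEnd = std::remove(page.annots.begin(), page.annots.end(), annot);
    bool removed = newEnd != page.annots.end();
    page.annots.erase(newEnd, page.annots.end());
    return removed;
}
```

Returning a flag lets the caller tell whether the annotation was actually attached to that page; calling the function a second time with the same reference is a harmless no-op.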

Fields are more interesting. Here we could introduce the following:

PdfField::Remove(PdfFieldRemoveOptions options)

By default, if the Widget is not null, it should:

  1. Remove the field reference from the page /Annots

If the Widget is not null and the children list /Kids is empty, it should also:

  2. Remove the field reference from AcroForm /Fields
  3. Remove this field reference from the parent /Kids
  4. Purge parents with an empty children list from AcroForm /Fields
  5. Purge ancestors of the parent with an empty children list, including the whole hierarchy, from AcroForm /Fields

If the Widget is null, it should do (2), (3), (4), (5) above plus:

  6. Remove the reference from all pages /Annots
  7. Remove children recursively from AcroForm /Fields
  8. Remove children recursively from all pages /Annots
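The default behavior described above can be sketched on a toy field tree. Everything here is hypothetical and not part of PoDoFo: `Field`, `Document`, `eraseId`, `detachAndPurge`, `collectDescendants`, and `removeField` are illustration-only names. Fields are identified by plain integers, and the step numbers (1) to (8) in the comments follow the numbering used by the enum comments. The sketch assumes that the last case applies to fields without a widget (non-terminal fields), and it simplifies step (1) by scanning every page instead of only the widget's own page.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Hypothetical toy model of a form field.
struct Field
{
    int parent;             // id of the parent field, -1 for a top-level field
    std::vector<int> kids;  // ids of the child fields (/Kids)
    bool hasWidget;         // true if the field has a widget annotation
};

struct Document
{
    std::map<int, Field> fields;          // all field objects by id
    std::vector<int> acroFormFields;      // AcroForm /Fields entries
    std::vector<std::vector<int>> pages;  // each page's /Annots, as field ids
};

void eraseId(std::vector<int>& ids, int id)
{
    ids.erase(std::remove(ids.begin(), ids.end(), id), ids.end());
}

// Steps (2)-(5): detach `id` from AcroForm /Fields and from its parent,
// then walk upwards purging ancestors whose /Kids became empty.
void detachAndPurge(Document& doc, int id)
{
    eraseId(doc.acroFormFields, id);          // (2)
    int parent = doc.fields.at(id).parent;
    while (parent != -1)
    {
        Field& p = doc.fields.at(parent);
        eraseId(p.kids, id);                  // (3)
        if (!p.kids.empty())
            break;                            // this ancestor still has children
        eraseId(doc.acroFormFields, parent);  // (4), then (5) on later iterations
        id = parent;
        parent = p.parent;
    }
}

// Appends the ids of all descendants of `id` to `out`.
void collectDescendants(const Document& doc, int id, std::vector<int>& out)
{
    for (int kid : doc.fields.at(id).kids)
    {
        out.push_back(kid);
        collectDescendants(doc, kid, out);
    }
}

void removeField(Document& doc, int id)
{
    const Field& field = doc.fields.at(id);
    std::vector<int> targets{ id };
    if (field.hasWidget)
    {
        if (!field.kids.empty())
        {
            // Widget that still has children: only step (1) applies.
            for (auto& annots : doc.pages)
                eraseId(annots, id);
            return;
        }
    }
    else
    {
        collectDescendants(doc, id, targets);  // children go too: (7), (8)
    }

    for (int target : targets)
    {
        eraseId(doc.acroFormFields, target);   // (2) for the field, (7) for children
        for (auto& annots : doc.pages)
            eraseId(annots, target);           // (1)/(6) for the field, (8) for children
    }
    detachAndPurge(doc, id);                   // (2)-(5)
}
```

The field objects themselves stay in `Document::fields`; only the references to them are dropped, which mirrors the idea of removing references rather than deleting objects.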

PdfFieldRemoveOptions would be:

enum PdfFieldRemoveOptions
{
    None = 0,
    DontRemoveFromAcroForm,                 // Will not do (2) above
    DontRemoveFromParent,                   // Will not do (3) above, implies DontPurgeParentFromAcroForm
    DontPurgeParentFromAcroForm,            // Will not do (4) above, implies DontPurgeAncestorsFromAcroForm
    DontPurgeAncestorsFromAcroForm,         // Will not do (5) above
    DontPurgeNonTerminalFromPages,          // Will not do (6) above
    DontPurgeNonTerminalKidsFromAcroForm,   // Will not do (7) above
    DontPurgeNonTerminalKidsFromPages       // Will not do (8) above
};
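Since several of these options are meant to be combined and some imply others, one possible shape is a bit-flag enum with a small normalization step. This is only a sketch under assumptions: the bit values, the spelling `DontPurgeParentFromAcroForm`, the enumerator `DontPurgeNonTerminalKidsFromAcroForm` for step (7), and the helpers `operator|`, `hasFlag`, and `normalize` are all hypothetical and not an existing PoDoFo API.

```cpp
#include <cstdint>

// Hypothetical bit-flag variant of the proposed options; values are assumptions.
enum class PdfFieldRemoveOptions : std::uint32_t
{
    None                                 = 0,
    DontRemoveFromAcroForm               = 1u << 0, // skip (2)
    DontRemoveFromParent                 = 1u << 1, // skip (3)
    DontPurgeParentFromAcroForm          = 1u << 2, // skip (4)
    DontPurgeAncestorsFromAcroForm       = 1u << 3, // skip (5)
    DontPurgeNonTerminalFromPages        = 1u << 4, // skip (6)
    DontPurgeNonTerminalKidsFromAcroForm = 1u << 5, // skip (7)
    DontPurgeNonTerminalKidsFromPages    = 1u << 6  // skip (8)
};

// Combines two option sets.
constexpr PdfFieldRemoveOptions operator|(PdfFieldRemoveOptions a, PdfFieldRemoveOptions b)
{
    return static_cast<PdfFieldRemoveOptions>(
        static_cast<std::uint32_t>(a) | static_cast<std::uint32_t>(b));
}

// Returns true if `flag` is set in `value`.
constexpr bool hasFlag(PdfFieldRemoveOptions value, PdfFieldRemoveOptions flag)
{
    return (static_cast<std::uint32_t>(value) & static_cast<std::uint32_t>(flag)) != 0;
}

// Expands the "implies" relationships from the comments:
// DontRemoveFromParent implies DontPurgeParentFromAcroForm, which in turn
// implies DontPurgeAncestorsFromAcroForm.
inline PdfFieldRemoveOptions normalize(PdfFieldRemoveOptions options)
{
    if (hasFlag(options, PdfFieldRemoveOptions::DontRemoveFromParent))
        options = options | PdfFieldRemoveOptions::DontPurgeParentFromAcroForm;
    if (hasFlag(options, PdfFieldRemoveOptions::DontPurgeParentFromAcroForm))
        options = options | PdfFieldRemoveOptions::DontPurgeAncestorsFromAcroForm;
    return options;
}
```

Normalizing once at the start of a removal routine would let the rest of the code test each flag independently, without re-deriving the implications at every step.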

These are just ideas. At some point I could implement such an API, but I can't commit to any timeline.