speedata / publisher

speedata Publisher - a professional database Publishing system
https://www.speedata.de/
GNU Affero General Public License v3.0

read metadata from PDF #442

Status: Closed (pr-apes closed this issue 1 year ago)

pr-apes commented 1 year ago

@pgundlach,

Would it be possible for Publisher to read, from other PDF documents, the metadata fields that it can already set?

I mean the following functions:

  1. sd:pdf-title().
  2. sd:pdf-subject().
  3. sd:pdf-author().
  4. sd:pdf-creator().
  5. sd:pdf-keywords().

Many thanks for your help.

pr-apes commented 1 year ago

This would be useful not only to import metadata, but also to remove some of it in certain cases (names, user IDs...).

pgundlach commented 1 year ago

Reading should be possible; changing an external PDF file sounds like a non-trivial task.

Reading could be built in, but can also be achieved by running an external process such as pdfinfo from poppler and parsing the output.
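A minimal sketch of the external-process route, assuming `pdfinfo` from poppler is installed and on the `PATH`. The helper names `parse_pdfinfo` and `read_pdf_metadata` and the lowercase keys are hypothetical, chosen only to mirror the five functions listed in the request; nothing here is part of Publisher itself.

```python
import subprocess

# pdfinfo labels mapped to hypothetical keys that mirror the requested
# sd:pdf-title(), sd:pdf-subject(), ... functions.
FIELDS = {
    "Title": "title",
    "Subject": "subject",
    "Author": "author",
    "Creator": "creator",
    "Keywords": "keywords",
}


def parse_pdfinfo(text):
    """Turn the 'Label: value' lines printed by pdfinfo into a dict."""
    meta = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing ':' survive.
        label, sep, value = line.partition(":")
        label = label.strip()
        if sep and label in FIELDS:
            meta[FIELDS[label]] = value.strip()
    return meta


def read_pdf_metadata(path):
    """Run pdfinfo on `path` and return the metadata fields it reports."""
    result = subprocess.run(
        ["pdfinfo", path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdfinfo fails
    )
    return parse_pdfinfo(result.stdout)
```

Calling `read_pdf_metadata("input.pdf")` returns a dict containing only the fields that the PDF actually defines, so a missing author or subject simply does not appear as a key. The result could then be written into the data file that Publisher processes.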

pr-apes commented 1 year ago

Sorry for my poor explanation. I understand that it makes no sense to expect Publisher to edit PDF files.

Similar to the recipe from https://doc.speedata.de/publisher/en/cookbook/multipagepdf/, metadata could also be copied (but only if it can be read).

I thought of something similar to this function: https://github.com/speedata/publisher/blob/d40b376ecc1e20db72888230d219d6f591f87d54/src/lua/publisher/layout_functions.lua#L250-L254.

I hope it is clear now.

pgundlach commented 1 year ago

I will close this as I don't see a very easy way to include this. It would be possible with some programming, but for that I need a sponsor.