speedata / publisher

speedata Publisher - a professional database Publishing system
https://www.speedata.de/
GNU Affero General Public License v3.0

`<PDFOptions creator="" />` and `--suppressinfo` #420

pr-apes closed this issue 1 year ago

pr-apes commented 1 year ago

@pgundlach,

I would like to generate PDF documents with no date information (so that the same content always produces the same SHA hash), but which may still contain the PDF creator.

There is the --suppressinfo option that does this in part, since it suppresses the creator metadatum, but no other metadata (sorry, but I cannot understand the reason for that).

A workaround for this would be the following patch:

--- ../old-spp/sw/lua/publisher.lua 2022-08-25 14:39:43.000000000 +0200
+++ sw/lua/publisher.lua    2022-08-29 14:00:27.192635400 +0200
@@ -877,9 +877,7 @@
 end

 local function getcreator()
-    if sp_suppressinfo then
-        return "speedata Publisher, www.speedata.de"
-    elseif options.documentcreator and options.documentcreator ~= "" then
+    if options.documentcreator and options.documentcreator ~= "" then
         return options.documentcreator
     else
         return string.format("speedata Publisher %s, www.speedata.de",env_publisherversion)
@@ -1417,9 +1417,9 @@
     local creator = getcreator()
     local infos
     if sp_suppressinfo then
-        infos = { "/Creator (speedata Publisher) /Producer (LuaTeX)"}
+        infos = { string.format("/Creator %s /Producer (speedata Publisher + LuaTeX, www.speedata.de) ",utf8_to_utf16_string_pdf(creator)) }
     elseif options.documentcreator and options.documentcreator ~= "" then
-        infos = { string.format("/Creator %s /Producer (speedata Publisher %s using LuaTeX) ",utf8_to_utf16_string_pdf(creator),env_publisherversion) }
+        infos = { string.format("/Creator %s /Producer (speedata Publisher %s + LuaTeX, www.speedata.de) ",utf8_to_utf16_string_pdf(creator),env_publisherversion) }
     else
         infos = { string.format("/Creator (%s) /Producer (LuaTeX %s (build %s))",creator, luatex_version, status.development_id or "-") }
     end

Another approach would be to limit --suppressinfo to /CreationDate, /ModDate and the trailer IDs, and to add a --suppressmetadata option.

If you like the idea of a --suppressmetadata option, I could provide it and it would ignore all metadata from <PDFOptions>.

I think the option included in the code above is simpler.

BTW, I took the liberty of including the speedata URL.

Let me know what you think about this.

pgundlach commented 1 year ago

> There is the --suppressinfo option that does this in part, since it suppresses the creator metadatum, but no other metadata (sorry, but I cannot understand the reason for that),

~It is no problem for the user not to set the optional metadata (or keep it the same in all the runs), but it is/was not possible to suppress the generation of a random document id and timestamp.~

I see what you mean. Ignore my comment.

pgundlach commented 1 year ago

I prefer to keep

<<
  /Creator (speedata Publisher)
  /Producer (LuaTeX)
  /Trapped /False
>>

as the default for -s, so the existing files don't need to get regenerated.

pr-apes commented 1 year ago

Would it be fine for you to have a --suppressdates option that achieves the same as the -s option, except for the /Creator (and, I'd rather say, the /Producer) part?

I mean, --suppressdates would produce reproducible PDF documents with exactly the same hash, but which also keep the /Creator metadata (and, if you agree, they might contain /Producer (speedata Publisher + LuaTeX, www.speedata.de)).

I think I can provide the PR, but there is no point in doing so if the idea doesn't make sense to you.

Many thanks for your help.

pgundlach commented 1 year ago

What about

a) when --suppressinfo is used but creator is not set, the defaults from above are used
b) when --suppressinfo is used and creator is set, the creator overrides the (speedata Publisher)

That way we don't need yet another option. Or am I missing something?
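
The two rules can be sketched as a small standalone Lua function. This is only an illustration of the proposed precedence under --suppressinfo: `creator_with_suppressinfo` and `DEFAULT_CREATOR` are hypothetical names, not identifiers from publisher.lua, and the default string is the /Creator value from the -s output quoted above.

```lua
-- Hypothetical helper illustrating rules a) and b); not part of publisher.lua.
-- It only covers the case where --suppressinfo (-s) is in effect.
local DEFAULT_CREATOR = "speedata Publisher"

-- documentcreator: string or nil, the creator attribute of <PDFOptions>
local function creator_with_suppressinfo(documentcreator)
    if documentcreator ~= nil and documentcreator ~= "" then
        -- b) an explicitly set creator overrides the fixed default
        return documentcreator
    end
    -- a) no creator set: keep the existing default so old files stay identical
    return DEFAULT_CREATOR
end

print(creator_with_suppressinfo(nil))     -- creator attribute absent
print(creator_with_suppressinfo(""))      -- <PDFOptions creator="" />
print(creator_with_suppressinfo("ACME"))  -- placeholder creator value
```

An empty `creator=""` is treated the same as an unset one here, which matches the `~= ""` checks in the existing code.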

pr-apes commented 1 year ago

This is a way to implement your proposal (if I'm not missing your point myself):

--- ../old-spp/sw/lua/publisher.lua 2022-08-25 14:39:43.000000000 +0200
+++ sw/lua/publisher.lua    2022-09-05 09:02:21.119850400 +0200
@@ -877,10 +877,10 @@
 end

 local function getcreator()
-    if sp_suppressinfo then
-        return "speedata Publisher, www.speedata.de"
-    elseif options.documentcreator and options.documentcreator ~= "" then
+    if options.documentcreator and options.documentcreator ~= "" then
         return options.documentcreator
+    elseif sp_suppressinfo then
+        return "speedata Publisher, www.speedata.de"
     else
         return string.format("speedata Publisher %s, www.speedata.de",env_publisherversion)
     end
@@ -1416,10 +1416,10 @@
     -- Keywords  Keywords associated with the document.
     local creator = getcreator()
     local infos
-    if sp_suppressinfo then
-        infos = { "/Creator (speedata Publisher) /Producer (LuaTeX)"}
-    elseif options.documentcreator and options.documentcreator ~= "" then
+    if options.documentcreator and options.documentcreator ~= "" then
         infos = { string.format("/Creator %s /Producer (speedata Publisher %s using LuaTeX) ",utf8_to_utf16_string_pdf(creator),env_publisherversion) }
+    elseif sp_suppressinfo then
+        infos = { "/Creator (speedata Publisher) /Producer (LuaTeX)"}
     else
         infos = { string.format("/Creator (%s) /Producer (LuaTeX %s (build %s))",creator, luatex_version, status.development_id or "-") }
     end
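
To check the reordered branches outside of LuaTeX, the second hunk can be restated as a plain Lua function. This is a rough sketch, not the code from publisher.lua: `build_info` and its `opts` table are hypothetical, the globals from the patch are passed in as fields, and `encode` is a simplified placeholder for `utf8_to_utf16_string_pdf`, whose real output is not reproduced here. The version strings in the usage lines are placeholders as well.

```lua
-- Placeholder for utf8_to_utf16_string_pdf (assumption: the real function
-- returns an already delimited PDF string; here we just add parentheses).
local function encode(s)
    return "(" .. s .. ")"
end

-- Hypothetical restatement of the patched branch order.
-- opts fields: documentcreator, suppressinfo, publisherversion,
--              luatex_version, development_id
local function build_info(opts)
    local creator = opts.documentcreator
    if creator and creator ~= "" then
        -- An explicit creator wins, with or without --suppressinfo
        return string.format("/Creator %s /Producer (speedata Publisher %s using LuaTeX) ",
            encode(creator), opts.publisherversion)
    elseif opts.suppressinfo then
        -- Unchanged default for -s, so existing files keep their hash
        return "/Creator (speedata Publisher) /Producer (LuaTeX)"
    else
        local default = string.format("speedata Publisher %s, www.speedata.de",
            opts.publisherversion)
        return string.format("/Creator (%s) /Producer (LuaTeX %s (build %s))",
            default, opts.luatex_version, opts.development_id or "-")
    end
end

print(build_info{ suppressinfo = true, publisherversion = "1.0" })
print(build_info{ suppressinfo = true, documentcreator = "ACME", publisherversion = "1.0" })
```

One consequence visible in this ordering: when a creator is set together with --suppressinfo, the /Producer string still contains the Publisher version, so identical hashes would only be expected between runs of the same Publisher version.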