pr-apes closed 1 year ago
@pgundlach.
I'm going to submit another pull request that supersedes both #432 and #433. The `--mainlanguage` value is also processed in the new pull request.
I hope it helps.
There is an issue with language identifiers: they contain hyphens and not underscores.
According to https://www.rfc-editor.org/rfc/rfc3066#section-2.1:
The syntax of this tag in ABNF [RFC 2234] is:
Language-Tag = Primary-subtag *( "-" Subtag )
Primary-subtag = 1*8ALPHA
Subtag = 1*8(ALPHA / DIGIT)
The productions ALPHA and DIGIT are imported from RFC 2234; they denote respectively the characters A to Z in upper or lower case and the digits from 0 to 9. The character "-" is HYPHEN-MINUS (ABNF: %x2D).
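To make the grammar above concrete, here is a minimal Lua sketch that checks a string against it. The function name `is_rfc3066_tag` is hypothetical and not part of the publisher code base; it only illustrates why `en-GB` passes and `en_GB` does not.

```lua
-- Hypothetical helper (not from the publisher sources): returns true when
-- `tag` matches the RFC 3066 grammar quoted above.
local function is_rfc3066_tag(tag)
  if type(tag) ~= "string" or tag == "" then
    return false
  end
  local first = true
  -- Append "-" so that every subtag, including the last, ends with a hyphen.
  for subtag in (tag .. "-"):gmatch("(.-)%-") do
    -- Both kinds of subtag are 1 to 8 characters long.
    if #subtag < 1 or #subtag > 8 then
      return false
    end
    if first then
      -- Primary-subtag = 1*8ALPHA
      if not subtag:match("^[A-Za-z]+$") then
        return false
      end
      first = false
    else
      -- Subtag = 1*8(ALPHA / DIGIT)
      if not subtag:match("^[A-Za-z0-9]+$") then
        return false
      end
    end
  end
  return true
end

print(is_rfc3066_tag("en-GB")) -- true
print(is_rfc3066_tag("en_GB")) -- false: "_" is neither ALPHA nor "-"
```

The explicit `[A-Za-z]` classes are used instead of `%a` so that the result does not depend on the current C locale.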
Although the PDF gives `en-GB` and `es-MX` as examples of locales, they don't seem to be recognized as main languages for the document. `es-ES`, `en-US` (or `de-DE` [but not `de-AT` or `de-CH`]) are valid values.
That being said, the default language is `en_GB`.
It could be discussed whether the `/Lang` attribute in the PDF catalog should be set automatically from the default language of the document.
This could be a solution:
- `/Lang (en)` in the catalog
- `/Lang (de)` in the catalog.

Does this sound ok?
I think there are different questions here:
- `es_ES` or `en_UK` are not valid values for languages [as `/Lang` requires them]).
- `<PDFOptions lang="…" />` when `<Options mainlanguage="…">` and `--mainlanguage` are already available.

In my opinion, the easiest way to avoid both issues (Acrobat not recognizing values with hyphens and invalid values with underscores) would be to read only the part before the hyphen or underscore (as you propose in your first two items [if I'm not missing your point]).
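A minimal Lua sketch of that idea, assuming the locale is available as a plain string such as `en_GB` or `de-AT`. The helper names `primary_subtag` and `lang_entry` are hypothetical and only illustrate the approach; they are not functions from the publisher sources.

```lua
-- Hypothetical helper: returns the leading run of ASCII letters, i.e. the
-- part before the first hyphen or underscore, or nil if there is none.
local function primary_subtag(locale)
  if type(locale) ~= "string" then
    return nil
  end
  local primary = locale:match("^([A-Za-z]+)")
  -- RFC 3066 limits the primary subtag to 8 letters.
  if primary == nil or #primary > 8 then
    return nil
  end
  return primary
end

-- Hypothetical helper: builds the catalog entry, or nil when the locale
-- does not start with a usable primary subtag.
local function lang_entry(locale)
  local primary = primary_subtag(locale)
  if primary == nil then
    return nil
  end
  return string.format("/Lang (%s)", primary)
end

print(lang_entry("en_GB")) -- /Lang (en)
print(lang_entry("de-AT")) -- /Lang (de)
```

Because `string.match` returns only the capture, the result can be passed straight to `string.format`, and returning `nil` for unusable input lets the caller skip the `/Lang` entry altogether.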
BTW, I don't know whether your reply here was written after https://github.com/speedata/publisher/pull/435#issuecomment-1253359473.
After writing this reply, I'm going to reply to #435.
I think this should be enough, unless I've missed something:
diff --git a/src/lua/publisher.lua b/src/lua/publisher.lua
index 76dd131a..114ce00c 100644
--- a/src/lua/publisher.lua
+++ b/src/lua/publisher.lua
@@ -1388,6 +1388,11 @@ function initialize_luatex_and_generate_pdf()
if str then
pdfcatalog[#pdfcatalog + 1] = str
end
+ local langtbl = get_language(defaultlanguage)
+
+ if langtbl and langtbl.locale then
+ pdfcatalog[#pdfcatalog+1] = string.format(" /Lang (%s)",string.gsub(langtbl.locale,"^(%a+).*","%1"))
+ end
local vp = {}
if viewerpreferences.numcopies and viewerpreferences.numcopies > 1 and viewerpreferences.numcopies <= 5 then
@@ -1458,7 +1463,7 @@ function initialize_luatex_and_generate_pdf()
pdfcatalog[#pdfcatalog + 1] = string.format("/OutputIntents %d 0 R",outputintentsarrayobjnum )
end
if options.format == "PDF/UA" then
- pdfcatalog[#pdfcatalog + 1] = string.format("/Lang (de) /MarkInfo << /Marked true >> ")
+ pdfcatalog[#pdfcatalog + 1] = string.format(" /MarkInfo << /Marked true >> ")
metadataobjnum = pdf.obj({ type="stream", string = getuametadata(), immediate = true, attr = [[ /Subtype /XML /Type /Metadata ]],compresslevel = 0,})
vp[#vp + 1] = "/DisplayDocTitle true"
This is the way to go.
Many thanks for the implementation.
@pgundlach,
#432 and #433 (sorry, but I don't know how to submit a single pull request) handle `/Lang` more gracefully:
- `<PDFOptions format="PDF/UA"/>`.
- `<Options mainlanguage=""/>`.
- `--mainlanguage` argument in the command line invocation.

I hope it helps.