tectonic-typesetting / tectonic

A modernized, complete, self-contained TeX/LaTeX engine, powered by XeTeX and TeXLive.
https://tectonic-typesetting.github.io/

Add support for `pdfx` #838

Open Neved4 opened 2 years ago

Neved4 commented 2 years ago

Description

The pdfx package can produce PDF/X and PDF/A files, yet it requires special options to compile correctly in XeTeX-based engines. The issue is well known and documented in the package manual (see 3.1.1. Limitations using XeLaTeX), so documents can be compiled with:

$ xelatex -shell-escape -output-driver="xdvipdfmx -z 0" <filename>.tex

Currently, people who want to produce PDF/X and PDF/A with tectonic have to try their luck with another package like hyperxmp. People who want to use pdfx are limited to TeX Live's XeLaTeX with the extra flags and options shown above, or to LuaLaTeX, where no special flags are needed.

https://github.com/tectonic-typesetting/tectonic/pull/708 added support for -Z shell-escape, which put us halfway there, but tectonic only produces the extended XDV format (see https://github.com/tectonic-typesetting/tectonic/issues/824#issuecomment-941862219). Could tectonic's behaviour be modified, or a new unstable flag added, to support this?

Steps to reproduce

test-pdfx.tex:

\documentclass{report}
\usepackage{pdfx}

\begin{document}
  This is a test for pdfx.
\end{document}

Compile with:

$ tectonic -p test-pdfx.tex

Error message

error: pdfx.sty:1285: Package pdfx Error: CreationDate is not properly supported;

vlasakm commented 2 years ago

Please note that tectonic is no different from normal xetex: this TeX engine produces only .xdv (extended DVI) files by itself. Usually, though, the file is automatically processed behind the scenes by xdvipdfmx to produce a PDF file.

It seems that the pdfx package itself hasn't been updated for newer XeTeX versions: since TeX Live 2019, XeTeX has had the primitive \creationdate. That means we can easily get around the -shell-escape part, since it is only used to write and execute a Lua file (with texlua, which isn't provided with tectonic) just to get the current date...

\let\pdfcreationdate=\creationdate
\documentclass{report}
\usepackage{pdfx}

\begin{document}
  This is a test for pdfx.
\end{document}

Now, xdvipdfmx -z 0 disables compression for the whole PDF file, which is kind of overkill if only one object needs it (I don't know if this option also propagates to images, etc.). But! It seems that dvipdfmx already accommodates this case and doesn't compress XMP metadata, and has done so since TeX Live 2016. Which means that the above should work automagically even with tectonic.

Unfortunately I get this:

Running xdvipdfmx ...
[1
warning: File "sRGB.icc" not found.
warning: File "sRGB.icc" not found.
warning: Interpreting special command fstream (pdf:) failed.
warning: Interpreting special command fstream (pdf:) failed.
warning: >> at page="1" position="(133.768, 667.198)" (in PDF)
warning: >> at page="1" position="(133.768, 667.198)" (in PDF)
warning: >> xxx "pdf:fstream @colorprofile (sRGB.icc) <</N 3 /Alternate/Devic..."
warning: >> xxx "pdf:fstream @colorprofile (sRGB.icc) <</N 3 /Alternate/Devic..."
warning: >> Reading special command stopped around >> <</N 3 /Alternate/DeviceRGB>><<
warning: >> Reading special command stopped around >> <</N 3 /Alternate/DeviceRGB>><<
warning: Could not find any valid object.
warning: Could not find any valid object.
warning: File "pdfa.xmpi" not found.
warning: File "pdfa.xmpi" not found.
warning: Interpreting special command fstream (pdf:) failed.
warning: Interpreting special command fstream (pdf:) failed.
warning: >> at page="1" position="(133.768, 667.198)" (in PDF)
warning: >> at page="1" position="(133.768, 667.198)" (in PDF)
warning: >> xxx "pdf:fstream @pdfx@Metadata (pdfa.xmpi) << /Type /Metadata /S..."
warning: >> xxx "pdf:fstream @pdfx@Metadata (pdfa.xmpi) << /Type /Metadata /S..."
warning: >> Reading special command stopped around >> << /Type /Metadata /Subtype /XML >> <<
warning: >> Reading special command stopped around >> << /Type /Metadata /Subtype /XML >> <<
]
warning: Object @colorprofile used, but not defined. Replaced by null.
warning: Object @colorprofile used, but not defined. Replaced by null.
warning: Object @pdfx@Metadata used, but not defined. Replaced by null.
warning: Object @pdfx@Metadata used, but not defined. Replaced by null.

Writing `test-pdfx.pdf` (4.80 KiB)
Writing `pdfa.xmpi` (4.92 KiB)

sRGB.icc is already known and downloaded by tectonic, so I don't know why it can't find it. pdfa.xmpi also looks like a file-finding issue, although this one is generated by the TeX process and I don't know how intermediate files are kept between the xetex / xdvipdfmx passes in tectonic.

@mnrvwl You can at least try submitting PDFs generated by xelatex test-pdfx.tex from TeX Live to some validators, to see whether getting the above errors fixed in tectonic would make pdfx usable for you.

pkgw commented 2 years ago

Hm, this is interesting! In my day job I was just having a conversation about the difficulty of producing PDF/A files using freely-available tools, which is an issue for at least some researchers whose funders require them to deposit PDF/A versions of their articles in archival repositories.

If the issue is disabling compression in the xdvipdfmx stage, that is very fixable — this mode is extensively exercised in Tectonic's test framework because compression results are not typically portable between platforms. I don't think that we currently have a way to enable this mode in the V1 or V2 interfaces, but it would not be hard to add.

@vlasakm intermediate files are all kept between the xetex and xdvipdfmx passes, so that shouldn't be an issue. I note that the pdfa.xmpi file is being written at the end of the xdvipdfmx pass ... do we need to run it twice??

Finally, to provide a truly smooth user experience, it sounds like we would need to patch the pdfx package, but there are more and more cases arising where I think that we are going to need to patch up the TeXLive sources, so that isn't a huge issue in and of itself.

vlasakm commented 2 years ago

In this case, the pdfx package should be updated upstream: its suggestions for ordinary XeLaTeX users are outdated, as is its \pdfcreationdate primitive check. If we wanted to help during the transition, we could just do the \let\pdfcreationdate=\creationdate internally in the engine.

Uncompressed PDF output, while not necessary here, would be a nice option in general, although I am not sure what form it should take. It seems that all that's needed is figuring out how to pass the user's preference instead of the hard-coded false I have below.

--- a/src/driver.rs
+++ b/src/driver.rs
@@ -1820,6 +1820,8 @@ impl ProcessingSession {

             engine.build_date(self.build_date);

+            engine.enable_compression(false);
+
             if let Some(ref ps) = self.unstables.paper_size {
                 engine.paper_spec(ps.clone());
             }

pkgw commented 2 years ago

Yeah. I am guessing that if pdfx upstream is that out-of-date, it probably won't be very responsive to proposed updates, but that might not be a correct assumption.

It would be straightforward to add a compression flag to Tectonic.toml. But, depending on what other settings need to be modified in the processing, it might make more sense to add a pdf/a output format that not only disables compression, but also adjusts everything else at the same time.

vlasakm commented 2 years ago

Hm, I checked and found a report of two pdfx bugs on the pdftex mailing list. No reply from the maintainers. So the chances of getting docs / code fixes merged are probably not great.

To be honest I never needed PDF/A or PDF/X so I am not sure about the details, but I imagine that, as with the accessibility (tagging) features, this would really benefit from LuaTeX, i.e. a more thorough approach rather than the limited possibilities offered by TeX macros.

And I don't see a reason for efforts to improve Tectonic's specific abilities to output PDF format variations like PDF/A and PDF/X. If you mean pdf/a in the sense that tectonic tries to help with reproducibility, then yes, maybe, but that should be about other things, not compression, and having it as a separate output format doesn't seem right to me.

I just tried on a simple example, and the disabling of PDF compression seems to disable compression for everything. Due to how compression works in PDF, this also means uncompressed images. Yes this means raw uncompressed pixel data, e.g. my 162K PNG screenshot results in a 6.0 MiB PDF file (1920*1080*3 bytes for the image). Normally this would also apply to JPG files, but dvipdfmx seems to handle them differently (so they are kept compressed).
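As a quick check of that figure, assuming 3 bytes per pixel (8-bit RGB, no alpha channel) and counting the pixel data only:

```latex
\[
  1920 \times 1080 \times 3\ \mathrm{bytes}
  = 6\,220\,800\ \mathrm{bytes}
  = \frac{6\,220\,800}{1024^{2}}\ \mathrm{MiB}
  \approx 5.93\ \mathrm{MiB}
\]
```

So the raw image data alone accounts for nearly all of the reported 6.0 MiB.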

vlasakm commented 1 year ago

AFAICT #953 will solve the file loading issue. So the following compiles correctly:

% test-pdfx.tex
\let\pdfcreationdate=\creationdate
\documentclass{report}
\usepackage{pdfx}

\begin{document}
  This is a test for pdfx.
\end{document}

$ target/debug/tectonic test-pdfx.tex
Running TeX ...
Rerunning TeX because "test-pdfx.aux" changed ...
Running xdvipdfmx ...
warning: 1024-byte read failed
caused by: failed to fill whole buffer
warning: Could not find any valid object.
warning: 1024-byte read failed
caused by: failed to fill whole buffer
Writing `test-pdfx.pdf` (11.64 KiB)
Skipped writing 3 intermediate files (use --keep-intermediates to keep them)

The lack of support for XeTeX's \creationdate is a problem in upstream pdfx. Though we may patch either the engine or the bundle in some capacity, an upstream fix would be preferable IMO. The code concerned with \creationdate in pdfx is trickier than I am willing to get into now. The right person to contact seems to be Ross Moore.

Neved4 commented 1 year ago

Hey @vlasakm thanks a lot for researching this! You always provide invaluable insight 🖤 I 100% agree pdfx should get an update or two upstream.

Definitely \let\pdfcreationdate=\creationdate fixes the file loading issue. I made a wrapper function in my code using latex3's \sys_if_engine_xetex:T conditional that injects the \let when a XeTeX-based engine is detected, all good on that front.
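A minimal sketch of what such a wrapper could look like, not the actual code from my project. It assumes a LaTeX kernel recent enough to have expl3 preloaded (2020-02-02 or later), so that \ExplSyntaxOn works before \documentclass. The \cs_if_exist:NT guard is an extra precaution added for this sketch:

```latex
% Engine guard, placed before \documentclass (assumes expl3 is preloaded).
\ExplSyntaxOn
\sys_if_engine_xetex:T
  {
    % Only create the alias if the engine really provides \creationdate
    % (XeTeX from TeX Live 2019 on, as noted earlier in the thread).
    \cs_if_exist:NT \creationdate
      { \cs_set_eq:NN \pdfcreationdate \creationdate }
  }
\ExplSyntaxOff

\documentclass{report}
\usepackage[a-1b]{pdfx}

\begin{document}
  This is a test for pdfx.
\end{document}
```

On pdfTeX or LuaTeX the conditional is false and nothing is redefined, so the same source can be shared across engines.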

I got a bit of time to provide more info on how I use pdfx:

When loading the package, one generally passes options to select a particular PDF/A or PDF/X specification, version of the standard, variant, etc. These can be passed using \PassOptionsToPackage or the usual inlining: \usepackage[a-1b]{pdfx} to make a PDF/A-1b file or \usepackage[x-1a]{pdfx} to produce a PDF/X-1a file (see docs).
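For illustration, the two ways of selecting a standard look roughly like this. The a-1b option is the one mentioned above, and the two forms are alternatives, so use one or the other rather than both:

```latex
% Alternative 1: pass the option before the package is loaded,
% e.g. when a class or another package loads pdfx for you.
\PassOptionsToPackage{a-1b}{pdfx}
\documentclass{report}
\usepackage{pdfx}

% Alternative 2: give the option inline instead.
% \documentclass{report}
% \usepackage[a-1b]{pdfx}
```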

Then I use a validator like Adobe Acrobat's Preflight feature. Admittedly it may not be complete and it doesn't even have all PDF/A or PDF/X versions but let's hope —crossing fingers— that the same guys who made the format are also validating it correctly 🤞🏻

I tested your suggestion with xelatex test-pdfx.tex and while the validator passes on XeTeX, it fails on Tectonic using said Preflight feature (picture). It seems that currently xelatex and tectonic are producing slightly different outputs.

I believe it is possible to generate apparently legitimate PDF/A and PDF/X files using a combination of other means: hyperxmp to add the metadata, xcolor to use ICC, and adding the entry manually. Whether pdfx or any of the others could produce false positives that don't really comply I can't tell, because I haven't studied the standards in depth. pdfx seemed like a convenient package, but I'll check if I can come up with another way that involves multiple packages, manual settings, etc.

MWE

\let\pdfcreationdate=\creationdate

\documentclass{report}
  \usepackage[a-1b]{pdfx}

\begin{document}
  This is a test for pdfx.
\end{document}

How to reproduce

Attachments

tectonic-pdf:a-1b

pkgw commented 1 year ago

Very interesting discussion and great work on this topic! I'm afraid that I just don't have the hours in the day to help solve this problem, but if you have any questions for me I will do my best to answer them.