sile-typesetter / sile

The SILE Typesetter — Simon’s Improved Layout Engine
https://sile-typesetter.org
MIT License
1.63k stars, 96 forks

Support reproducible PDF builds #949

Open alerque opened 4 years ago

alerque commented 4 years ago

I've thought of this many times before but never got up the courage to report the issue and make it real ;-) pandoc/issues/6539 reminded me that reproducible PDFs are possible (if rare in the wild) and have a lot of uses.

Long story short, SILE should support this somehow. Maybe even by default if we can.

alerque commented 4 years ago

Minor detail, but with reproducible builds in mind as a long-term goal, I was wondering about the Producer metadata field. It is currently hard-coded to be just "SILE", but in conjunction with #1036 I wanted to add the version string there so I could review old PDFs and know exactly what version of SILE I used to generate them. I can usually infer it as the Git HEAD version dated at the same time that commit was in my sources, and my documents usually have a SHA of the commit they were built from, but this is more trouble than it needs to be.

Would limiting 100% reproducibility to exact version matches be a good thing or a bad thing?

ctrlcctrlv commented 4 years ago

Good thing

alerque commented 4 years ago

Thanks @ctrlcctrlv. After thinking about it over the weekend I think I agree. My logic is that there are cases where both behaviors would be desirable, but the ones where adding the version info will make life harder all involve potential ways of regression testing SILE itself. For that case it would be pretty simple to stuff an artificial value in there (since it's just reading the Lua string). Later, if need be, we can provide options for omitting metadata entirely or otherwise normalizing it (for example to get the date fields the same).

In almost all other cases requiring same versions of SILE as a pre-requisite to identical PDFs seems quite reasonable. I've added the version string to the producer field via the PR above.
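
For illustration, one common way to normalize date fields is the SOURCE_DATE_EPOCH convention from the Reproducible Builds project: if that environment variable is set, use it instead of the current time. The sketch below is in Python rather than SILE's Lua, and `pdf_date` is a hypothetical helper name, not an existing SILE or libtexpdf function. It only shows how a fixed timestamp could be turned into a PDF-style date string.

```python
import os
import time
from datetime import datetime, timezone


def pdf_date(environ=os.environ):
    """Return a PDF date string such as D:20200913122640Z.

    Hypothetical helper: uses SOURCE_DATE_EPOCH (seconds since the Unix
    epoch) when it is set, and falls back to the current time otherwise.
    """
    epoch = environ.get("SOURCE_DATE_EPOCH")
    seconds = int(epoch) if epoch else int(time.time())
    # Always format in UTC so the local time zone cannot change the output.
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamp.strftime("D:%Y%m%d%H%M%SZ")


if __name__ == "__main__":
    print(pdf_date({"SOURCE_DATE_EPOCH": "1600000000"}))
```

With the variable pinned, two builds run at different times would get the same creation and modification dates, which is one of the fields that otherwise differs between runs.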

hendursaga commented 1 year ago

This would be especially useful for when SILE is available on GNU Guix!

alerque commented 1 year ago

@hendursaga I'd be happy to see Guix packaging, and that can happen any time. As I'm not an active user I'm probably not the one to actually make it happen, but if you or somebody else is interested I'd be happy to help with the packaging process. Also note the Nix packaging has some similarities to Guix in being able to programmatically define a reproducible package environment. Both Nix and Arch Linux packaging are also provably reproducible on the packaging end (obviously not yet SILE's output per this issue, just noting for Guix's sake that reproducible packaging is already supported).

horo-fox commented 2 months ago

I poked at this and the big obstacle is that libtexpdf gives every font a ... randomly assigned unique tag. So the font names become PPKMZU+LinBiolinumO. I don't see a simple way around this because font names show up so much. I'm not clear on why libtexpdf assigns the unique tags either, so it's possible any changes I make might break things subtly.

alerque commented 2 weeks ago

@horo-fox Interesting piece of info. My guess is that this is because fonts get subsetted and embedded on a per-document basis and hence need unique identifiers. If we could find where those identifiers are being generated we could probably come up with something to make them deterministic, and presumably if we catch the generation point everything downstream will play along nicely.
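
As a rough sketch of what "deterministic" could mean here: PDF subset font names carry a tag of six uppercase letters followed by `+`, so one option is to derive that tag from a hash of the font name and the set of glyphs in the subset instead of from a random source. This is Python for illustration only, and `subset_tag` and `subset_font_name` are hypothetical names. It makes no claim about how or where libtexpdf actually generates its tags.

```python
import hashlib


def subset_tag(font_name, glyph_ids):
    """Derive a six-letter uppercase tag from the font name and glyph set.

    Hypothetical helper: the same font and the same set of glyph IDs
    always give the same tag, regardless of the order they were used in.
    """
    digest = hashlib.sha256()
    digest.update(font_name.encode("utf-8"))
    digest.update(b"\0")  # separator between the name and the glyph list
    for gid in sorted(set(glyph_ids)):
        digest.update(gid.to_bytes(4, "big"))
    # Map the first 8 bytes of the hash onto six letters A-Z.
    value = int.from_bytes(digest.digest()[:8], "big")
    letters = []
    for _ in range(6):
        value, remainder = divmod(value, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(letters)


def subset_font_name(font_name, glyph_ids):
    """Build a tagged name in the same shape as PPKMZU+LinBiolinumO."""
    return subset_tag(font_name, glyph_ids) + "+" + font_name


if __name__ == "__main__":
    print(subset_font_name("LinBiolinumO", [36, 37, 38, 72]))
```

Hashing the glyph set as well as the name keeps two different subsets of the same font distinguishable, which is presumably what the unique tags are for, while identical inputs still produce identical output.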

As a side note, the PDF writer backend for Typst does support reproducible output, although it is still missing a few other features. Writing an alternative backend using that instead of libtexpdf is going to be on the table before too long. Any contributions towards making libtexpdf build and install properly as a standalone library instead of a submodule of this project, so that it could be made an optional dependency, would be welcome, as would any work on a new optional PDF outputter backend in Rust.