tectonic-typesetting / tectonic

A modernized, complete, self-contained TeX/LaTeX engine, powered by XeTeX and TeXLive.
https://tectonic-typesetting.github.io/

nix derivation for all build dependencies #898

Open cmoog opened 2 years ago

cmoog commented 2 years ago

The existing Nix package only includes the CLI itself. For hermetic Nix builds of LaTeX documents, we'd also need a derivation that includes the necessary LaTeX packages. That way, we could have reproducible builds of documents without network dependencies.

Is this possible with the current architecture? I imagine we'd just need a derivation containing every package; the user would then override TECTONIC_CACHE_DIR during the build step to point at that cache derivation.
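As a rough illustration of this proposal, here is a Python sketch that prepares such a build. It assumes `tectonic` is on the PATH and that `cache_dir` is the store path of an already-populated cache derivation (the path in the test is made up). It also passes the V1 CLI's `--only-cached` flag, which stops Tectonic from downloading files that are missing from the cache, so a gap in the cache shows up as a failed build rather than a silent network fetch.

```python
import os
import subprocess


def hermetic_build_command(cache_dir, document):
    """Return (argv, env) for a Tectonic build that reads only from cache_dir.

    cache_dir is assumed to be a pre-populated cache, e.g. the store path of a
    hypothetical Nix "tectonic cache" derivation.
    """
    env = dict(os.environ)
    # Redirect Tectonic from its default per-user cache to the provided one.
    env["TECTONIC_CACHE_DIR"] = str(cache_dir)
    # V1 CLI: --only-cached prevents downloads of files not already cached.
    argv = ["tectonic", "--only-cached", str(document)]
    return argv, env


def hermetic_build(cache_dir, document):
    """Run the build, raising CalledProcessError if Tectonic fails."""
    argv, env = hermetic_build_command(cache_dir, document)
    subprocess.run(argv, env=env, check=True)
```

Separating command construction from execution keeps the environment setup easy to inspect; inside a Nix builder the same two settings would simply be exported before calling `tectonic`.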

Thoughts?

pkgw commented 2 years ago

Hmm. I'm not really familiar with how Nix does things, so I may be missing a few items here, but I'll do my best to answer.

The most direct way to get a fully network-free document build would be to provide a local copy of the bundle file (several gigabytes) and point your document builds at it using the -b argument of the V1 CLI. One wrinkle is that -b expects a bundle stored in a Zip file, while the online bundle uses a different "indexed tar" format; one can be generated from the other, or I could just upload the Zip version to our cloud storage. The other wrinkle is that I believe the V2 CLI doesn't support local bundle files, though that would be pretty easy to add.
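A minimal Python sketch of the local-bundle approach, assuming a Zip-format bundle has already been downloaded to a local path (the file names are placeholders). Because `-b` expects a Zip file, the sketch checks the format with the standard library's `zipfile.is_zipfile` before building the V1 CLI command line, which would catch an indexed-tar bundle passed by mistake.

```python
import zipfile
from pathlib import Path


def local_bundle_command(bundle_path, document):
    """Build a V1 CLI command line that reads resources from a local Zip bundle.

    -b expects a Zip-format bundle, so anything else (such as the online
    "indexed tar" bundle) is rejected before Tectonic is invoked.
    """
    bundle = Path(bundle_path)
    # is_zipfile looks for Zip structure; a missing file also yields False.
    if not zipfile.is_zipfile(bundle):
        raise ValueError(f"{bundle} is not a Zip-format bundle")
    return ["tectonic", "-b", str(bundle), str(document)]
```

The returned list can be passed directly to `subprocess.run`.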

To get a build that doesn't rely on all those gigabytes of data, you could bootstrap a build as above on a clean cache, export the populated cache directory in some fashion, and then "import" it using $TECTONIC_CACHE_DIR or a manual copy.
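The bootstrap/export/import cycle could look roughly like this Python sketch. It assumes the bootstrap build is run with `TECTONIC_CACHE_DIR` pointing at a fresh, empty directory so the populated cache is easy to locate, and it uses a Zip archive purely as a transport format. The function names are hypothetical, and nothing depends on the cache's internal layout: the directory is copied as-is.

```python
import os
import shutil
import subprocess
from pathlib import Path


def bootstrap_cache(document, cache_dir):
    """Run one networked build so Tectonic populates an initially empty cache_dir."""
    env = dict(os.environ, TECTONIC_CACHE_DIR=str(cache_dir))
    subprocess.run(["tectonic", str(document)], env=env, check=True)


def export_cache(cache_dir, archive_base):
    """Pack the populated cache directory as-is; returns the .zip path."""
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(cache_dir))


def import_cache(archive, target_dir):
    """Unpack an exported cache so TECTONIC_CACHE_DIR can point at target_dir."""
    shutil.unpack_archive(str(archive), str(target_dir))
    return Path(target_dir)
```

On the consuming side, the offline build would then set `TECTONIC_CACHE_DIR` to the directory returned by `import_cache`.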

Finally, I'll mention that the tectonic-texlive-bundles repo contains the infrastructure used to generate those bundle files from upstream TeXLive. It does a bunch of processing in Docker containers that point to a checkout of Norbert Preining's Git mirror of the TeXLive SVN repo (which is something silly like 60 GB, since they commit oodles of binaries to SVN).

Does that help?

Neved4 commented 2 years ago

ping @cmoog

cmoog commented 2 years ago

Thanks for the detailed response, @pkgw; great info here. The trouble is that, to my knowledge, Tectonic doesn't provide an easily parsable lockfile from which a Nix expression could determine and download the minimal set of required dependencies. The next best solution would be a way to generate a Nix expression, similar to node2nix, but even that would require hooking into Tectonic's dependency parsing/resolution logic.

Finally, you're right that downloading the entire archive of all dependencies would work. I agree that the quickest solution would be a Nix derivation containing a populated cache dir, which could be used at build time by setting $TECTONIC_CACHE_DIR to the derivation's path in the Nix store.

pkgw commented 2 years ago

Ah, yes. Right now Tectonic doesn't have anything like a lockfile because it doesn't manage dependencies and packages in a fine-grained manner: during document builds there's no dependency resolution, and all Tectonic does is pull files from the bundle upon request. The bundle is built from TeXLive packages, but the information about specific packages is (intentionally) erased once the bundle is assembled. (Sorry, I feel like I'm not explaining clearly here.)

Tectonic could definitely emit a very simpleminded "lockfile" listing the files needed to build a given document. That list could be used to pull down just the subset of bundle files the document needs, so that it could then be built without the network.
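No such lockfile exists today, so the following Python sketch shows one hypothetical format: a sorted list of `<sha256> <name>` lines. It assumes the names and contents of the files Tectonic pulled from the bundle during a build have already been collected into a dict; how to capture them is left open. Recording a hash next to each name would let a consumer, such as a generated Nix expression, verify every file it fetches.

```python
import hashlib


def write_lockfile(files):
    """Render a hypothetical lockfile: one "<sha256> <name>" line per file.

    files maps bundle file names to the bytes Tectonic read during a build.
    Sorting by name keeps the output stable across runs.
    """
    lines = []
    for name in sorted(files):
        digest = hashlib.sha256(files[name]).hexdigest()
        lines.append(f"{digest} {name}")
    return "\n".join(lines) + "\n"


def read_lockfile(text):
    """Parse the hypothetical format back into {name: sha256 hex digest}."""
    entries = {}
    for line in text.splitlines():
        if line.strip():
            # The digest never contains spaces, so split only once.
            digest, name = line.split(" ", 1)
            entries[name] = digest
    return entries
```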

From an extremely superficial look at what node2nix does, I think one question is whether the fetchurl fetcher supports HTTP byte-range requests. If it does, we could use that sort of lockfile to create a Nix expression that depends only on the pieces of the bundle required by the specific document.
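If byte-range requests are available, turning such a lockfile into fetches is mostly arithmetic: each file occupies `length` bytes starting at `offset` inside the bundle, and an HTTP `Range` header names an inclusive byte span. The Python sketch below assumes the bundle's index can be read as whitespace-separated `<name> <offset> <length>` lines; treat that format and the helper names as assumptions, not Tectonic's actual interface.

```python
def parse_index(text):
    """Parse an index assumed to hold "<name> <offset> <length>" lines."""
    index = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            name, offset, length = parts
            index[name] = (int(offset), int(length))
    return index


def range_headers(index, needed):
    """Map each needed file to an HTTP Range value; the end byte is inclusive."""
    headers = {}
    for name in needed:
        offset, length = index[name]
        if length <= 0:
            continue  # an empty file needs no request
        headers[name] = f"bytes={offset}-{offset + length - 1}"
    return headers
```

For example, an index entry `b.tfm 100 50` yields `bytes=100-149`; a server that honors the header answers with `206 Partial Content` containing just those 50 bytes.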

If it doesn't, one could get something to work by having a fetchurl expression that depends on the whole bundle. That could be combined with a lockfile, but as long as you're pulling down the whole bundle anyway, the details of the lockfile aren't saving you any work.

So I guess either way, the tectonic CLI program would need to provide whatever low-level operations are required to go from these sorts of fetchurl inputs to a populated cache that can support a build?

stephen-huan commented 2 months ago

Even without automation, I think the tectonic situation is quite a bit better than the existing state of the art for reproducible LaTeX builds with Nix. At the coarsest level, one just uses pkgs.texlive.combined.scheme-full; at the finest, one uses pkgs.texlive.combined.scheme-minimal and manually specifies the packages one needs. However, this is tedious, and it is often not clear which packages provide which required files.

With tectonic, at the coarsest level, one can fetchzip the bundle URL (of course, incurring a ~2.8 GB download). For fine-grained dependency management, one can semi-automatically use the recipe provided in https://github.com/tectonic-typesetting/tectonic/issues/977#issuecomment-1345192682, which is much less tedious than manual specification. If one wishes to avoid even this, tectonic itself can be used as the dependency resolver with a fixed-output derivation, as demonstrated here. The disadvantage is that the build happens twice.