kjarosh / latex-docker

A set of lightweight Docker images for building LaTeX documents
MIT License

Is TL_YEAR really needed? #1

Closed: paloha closed this issue 1 year ago

paloha commented 1 year ago

First, thanks for making this repo, it helped me a lot today.

I would like to ask one question about the TL_YEAR=2023 constant in the Dockerfile. In the README, you say the correct year is necessary for tlmgr to work. However, I tried removing the constant completely and tlmgr still works. To explain why I am asking: I have to adjust your Dockerfile to fit my needs, and my project will be deployed with CD, so I am trying to prevent any issues that could arise when the year changes.

Could you please explain why you think tlmgr won't work, or have I misunderstood something? To be more explicit, here is how I adjusted your code; the image builds successfully:

FROM python:3.8.16-alpine3.18
ARG TL_MIRROR="https://texlive.info/CTAN/systems/texlive/tlnet"

RUN apk add --no-cache perl curl fontconfig libgcc gnupg && \
     mkdir "/tmp/texlive" && cd "/tmp/texlive" && \
     wget "$TL_MIRROR/install-tl-unx.tar.gz" && \
     tar xzvf ./install-tl-unx.tar.gz && \
     ( \
         echo "selected_scheme scheme-minimal" && \
         echo "instopt_adjustpath 0" && \
         echo "tlpdbopt_install_docfiles 0" && \
         echo "tlpdbopt_install_srcfiles 0" && \
         echo "TEXDIR /opt/texlive" && \
         echo "TEXMFLOCAL /opt/texlive/texmf-local" && \
         echo "TEXMFSYSCONFIG /opt/texlive/texmf-config" && \
         echo "TEXMFSYSVAR /opt/texlive/texmf-var" && \
         echo "TEXMFHOME ~/.texmf" \
     ) > "/tmp/texlive.profile" && \
     "./install-tl-"*"/install-tl" --location "$TL_MIRROR" -profile "/tmp/texlive.profile" && \
     rm -vf "/opt/texlive/install-tl" && \
     rm -vf "/opt/texlive/install-tl.log" && \
     rm -vrf /tmp/*

ENV PATH="${PATH}:/opt/texlive/bin/x86_64-linuxmusl"

# Set any TL_SCHEME_XXX to "y" to install that scheme.
ARG TL_SCHEME_BASIC="y"
RUN if [ "$TL_SCHEME_BASIC" = "y" ]; then tlmgr install scheme-basic; fi

ARG TL_SCHEME_SMALL=""
RUN if [ "$TL_SCHEME_SMALL" = "y" ]; then tlmgr install scheme-small; fi

ARG TL_SCHEME_MEDIUM=""
RUN if [ "$TL_SCHEME_MEDIUM" = "y" ]; then tlmgr install scheme-medium; fi

ARG TL_SCHEME_FULL=""
RUN if [ "$TL_SCHEME_FULL" = "y" ]; then tlmgr install scheme-full; fi

RUN tlmgr install ragged2e

Thank you in advance for your answer.

kjarosh commented 1 year ago

I'm really glad you are using my work and it's helped you!

Regarding your question, I think you have misunderstood the statement. The statement "In order for tlmgr to work, the year must be current." talks about the year in the image version (i.e. kjarosh/latex:2021.2), not TL_YEAR. It is not a problem of building the image, but rather of using images built in the past. tlmgr connects to CTAN, but CTAN hosts only the current version of TeX Live. If the tlmgr used comes from an older version of TeX Live, it will not work with the current CTAN repository, because the repository does not host older versions of packages.

In order to illustrate this issue, try running the following commands. First, start a container using an older version of my image, e.g.

docker run -it --rm kjarosh/latex:2021.2-minimal sh

Within the container, try installing a package using tlmgr:

tlmgr install latexmk

You should get an error:

tlmgr: Local TeX Live (2021) is older than remote repository (2023).
Cross release updates are only supported with
  update-tlmgr-latest(.sh/.exe) --update
See https://tug.org/texlive/upgrade.html for details.

This is why, when using older releases of TeX Live, it is advised to use the full version: all packages are already present, so nothing needs to be installed. It would probably be possible to make tlmgr work for older releases, but you would need to specify a repository containing an archive of the given version (separately for each version).
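As a rough sketch of the per-version repository idea: TeX Live keeps frozen snapshots of past releases in its historic archive, and `tlmgr option repository` can point an older installation at one of them. The mirror base URL and the helper name `historic_repo_url` below are assumptions for illustration; check which mirror and which years are actually available before relying on this.

```shell
#!/bin/sh
# Assumed historic mirror; other mirrors of the TeX Live historic archive exist.
HISTORIC_BASE="https://ftp.math.utah.edu/pub/tex/historic/systems/texlive"

# Hypothetical helper: print the URL of the frozen repository for a given
# TeX Live release year (only past releases have a "tlnet-final" snapshot).
historic_repo_url() {
    year="$1"
    # Reject anything that is not a four-digit year.
    case "$year" in
        [0-9][0-9][0-9][0-9]) ;;
        *) echo "usage: historic_repo_url YEAR" >&2; return 1 ;;
    esac
    echo "$HISTORIC_BASE/$year/tlnet-final"
}

# Inside an old container (e.g. one with TeX Live 2021), one could then run:
#   tlmgr option repository "$(historic_repo_url 2021)"
#   tlmgr install latexmk
```

The repository has to be set once per release year, which matches the "for each version separately" caveat above.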

Regarding your modification of removing TL_YEAR, you are right: it can easily be removed. However, doing so does not solve the issue. Images built for version 2023 will stop working once version 2024 is released, just as images built for 2022 do not work now. The only difference is that you do not need to parameterize the image with the version.

Whether TL_YEAR should be kept at all is debatable, and both approaches have pros and cons. Without TL_YEAR, the build always works, but it is not reproducible: the same Dockerfile produces different images when built at different times. TL_YEAR does not guarantee full reproducibility either (the distribution may change within a single major release), but at least the same Dockerfile will not produce distributions for different years. Guaranteeing full reproducibility would probably require adding a hash of the whole package and verifying it when building the image. Reproducible builds matter for Docker images because users expect them; consider, for example, image caching in Docker. Unfortunately, TeX Live provides only the newest distribution, so reproducibility relies on failing the build when a different version is available, which is not ideal. I should probably rethink this issue a bit and decide on either full reproducibility, no reproducibility at all, or maybe using some archives for older releases.
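The hash idea could look roughly like this: pin a checksum for the downloaded installer archive and fail the build when the file does not match. This is only a sketch; `verify_sha256` is a hypothetical helper, `TL_INSTALLER_SHA256` is an assumed build argument name, and the code assumes `sha256sum` is available in the build image.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE has the expected SHA-256 digest.
verify_sha256() {
    file="$1"
    expected="$2"
    # sha256sum prints "<digest>  <file>"; keep only the digest.
    actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $file" >&2
        echo "  expected: $expected" >&2
        echo "  actual:   $actual" >&2
        return 1
    fi
}

# In a Dockerfile RUN step, after downloading the installer, one might write:
#   verify_sha256 install-tl-unx.tar.gz "$TL_INSTALLER_SHA256" || exit 1
# where TL_INSTALLER_SHA256 is a digest you pin yourself (assumed name).
```

With a pinned digest the build fails as soon as the mirror serves a different archive, which is exactly the "fail when a different version is available" trade-off described above.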

Nonetheless, I should probably reword the README statement about tlmgr not working (or even elaborate on it), so that it is easier to understand.

paloha commented 1 year ago

@kjarosh thanks for your comprehensive answer. All is clear now. The fact that TeX Live provides only the newest distribution does indeed make the decisions regarding reproducibility more difficult. As you said, maybe the sentence in the README that comments on tlmgr could be a bit more verbose and clear. However, as I said earlier, I had to adjust your container anyways. Therefore, in my own builds I will take your answer into consideration and deal with this issue on my own. Thank you.

kjarosh commented 4 weeks ago

@paloha I finally found a moment and resolved this issue. TL_YEAR is no longer needed, as the tlnet archive and historical mirrors are now used.

tlmgr should now work at all times for all the latest stable releases (2024.3, 2023.3, 2022.2, etc.) as well as daily releases (devel-2024-20240823, devel-2024, etc.). The updated README contains a short description of the new approach.

You can look at the following files for mirrors:

https://github.com/kjarosh/latex-docker/blob/af816b76dae291ce1cb027b77bd7ba4740be5528/MIRRORS#L1-L29

https://github.com/kjarosh/latex-docker/blob/af816b76dae291ce1cb027b77bd7ba4740be5528/.github/workflows/configure_from_date.py#L8