xu-cheng / latex-action

:octocat: GitHub Action to compile LaTeX documents
https://github.com/xu-cheng/latex-action
MIT License

feature request: Multithreading (parallel jobs) support #125

Closed (whisperity closed this issue 1 year ago)

whisperity commented 1 year ago

I use the action script to build my dissertation, which currently consists of 5 top-level .tex files. That number will grow as the defences approach; I think I will end up with 7 top-level files. However, when I simply specify the files to the action, the build runs on each root_file sequentially, resulting in 10-12 minutes for a full CI execution (including the "build Docker container" phase, which takes <2 minutes).

On my local machine, a -j4 build takes about 4 minutes because, by the time the 1st job building the main document finishes, the other cores have already finished compiling the supplementary material.

Support for parallel builds through the action's interface would help immensely (even though GitHub Actions runners only give you 2 (or 3?) cores, not 4...), especially for private repositories where the time taken for CI jobs translates to money that needs to be paid to GitHub.

xu-cheng commented 1 year ago

While I understand the desire to speed up the compilation process, I think adding parallel support would make the code unnecessarily complex and hard to maintain. Moreover, it would be hard to handle many edge cases, such as multiple processes trying to write to the same temporary files. The reason is that latex and the entire tex system are not built with parallel support in mind. Therefore, I will close this issue for now. If you really need it, I suggest using texlive-action instead, which allows you to customize the compilation process however you want.

whisperity commented 1 year ago

@xu-cheng I believe that from your side, adding support for parallel builds (directly by running some number of *tex compilations or by allowing the user to specify a Makefile directly and letting make deal with it...) would not add that much complexity compared to what the script does automatically right now. Ensuring that the parallel builds do not step on each other's toes would still be the user's responsibility (and thus the parallelism would be an opt-in feature of this Action script!), just like how supporting parallelism for your project is the user's responsibility if they want to do a parallel build locally.

Nevertheless, I've transitioned my project to using the other repository. Well, it was a non-trivial task (due to losing the automatic nice features such as the font install and also the fact that some of the tools used by my make invocations for post-processing are simply not available in Alpine Linux(!)...), but I managed. Oddly enough, I did not really obtain any meaningful speed-up from doing this: about 30 seconds improvement at best... This is strange, but it seems to empirically prove that there is one document whose compilation is so long it trumps everything else...
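The observation above can be illustrated with a small back-of-the-envelope sketch in Python. The per-document build times below are hypothetical, not measurements from this project, and the greedy "next document goes to the first free worker" schedule is an assumption rather than a description of what make -j actually does. It only shows why a single long document caps the achievable speed-up: the parallel wall time can never drop below the longest individual build.

```python
import heapq


def sequential_time(durations):
    """Total wall time when documents are built one after another."""
    return sum(durations)


def parallel_time(durations, jobs):
    """Wall time when each document goes to the first free worker.

    Documents are started in the given order. This is a simple greedy
    schedule, used here only as an approximation of a -j style build.
    """
    workers = [0.0] * jobs  # time at which each worker becomes free
    heapq.heapify(workers)
    for duration in durations:
        free_at = heapq.heappop(workers)
        heapq.heappush(workers, free_at + duration)
    return max(workers)


# Hypothetical per-document build times in minutes (not measured).
durations = [9.0, 0.25, 0.25, 0.25, 0.25]
print(sequential_time(durations))        # 10.0
print(parallel_time(durations, jobs=2))  # 9.0
```

With these made-up numbers, two workers save only one minute out of ten, because the 9-minute document dominates regardless of how the small ones are scheduled.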

I believe it would be worth documenting for both projects that whatever script is executed within the Docker container, its working directory is the same directory where the rest of the job's actions (outside the container) are also executed. It was not clear to me initially that if I do an actions/checkout to download some fonts, it would be downloaded in a way that, inside the container, it will be possible to copy them to the right place...

zydou commented 1 year ago

hi, @whisperity

I believe that supporting parallel builds is not advisable. Consider the typical compilation process for a tex file: "pdflatex -> bibtex -> pdflatex -> pdflatex". The sequence of these steps is crucial, and running any of them in parallel could lead to unexpected errors. As xu-cheng pointed out, latex and the entire tex system are not designed to be built in parallel.

> [...] some of the tools used by my make invocations for post-processing are simply not available in Alpine Linux [...]

You could try this fork, which uses Debian instead of Alpine as the base image. You can install the missing packages using apt-get.

> It was not clear to me initially that if I do an actions/checkout to download some fonts, it would be downloaded in a way that, inside the container, it will be possible to copy them to the right place...

This action has just been updated to v3 today. In v3, all useful paths and environment variables are now accessible within the container. Therefore, any files you have downloaded to your working directory will maintain their original structure inside the container.

whisperity commented 1 year ago

@zydou

> I believe that supporting parallel builds is not advisable. Consider the typical compilation process for a tex file: "pdflatex -> bibtex -> pdflatex -> pdflatex". The sequence of these steps is crucial, and running any of them in parallel could lead to unexpected errors.

I apologise if I was not clear enough when I initially proposed the idea. Having re-read the original post, I realise that my suggestion could be misunderstood if someone is not living day and night with compilers.

My suggestion was parallelism between the individual root_file entries of the workflow. I know that among the tool invocations that compile a single top-level .tex file, parallelism is not feasible and is also dangerous.

> [...] 5 top-level .tex files. [...] However, just specifying the files to the action, the build runs on each root_file sequentially [...]

I did not wish for @xu-cheng to parallelise pdflatex A.tex with (its own subsequent) bibtex A.tex, but rather parallelise pdflatex A.tex with pdflatex B.tex, given an environment where multiple cores are available (-j 2 works well for GHA) and if the user requests it.

For my dissertation right now, I have make -j spawn several latexmk A, latexmk B, etc. processes in parallel. Each of these processes then runs the smaller compilation steps for its own document (A, B, ...) sequentially.

latexmk calls the following sequence in my case: xelatex -> makeglossaries + makeindex + makeindex -> biber -> xelatex -> makeglossaries + makeindex + makeindex -> biber -> makeglossaries + makeindex + makeindex -> xelatex -> biber -> makeglossaries + makeindex + makeindex -> xelatex -> xelatex -> xdvipdfmx
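The scheme described above (parallel across root files, strictly sequential within each one) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Makefile from this thread and not anything either action provides: `build_all` and `latexmk_command` are hypothetical helper names, latexmk is assumed to be on PATH, and keeping concurrent builds from writing to the same auxiliary files remains the user's responsibility.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_all(root_files, make_command, jobs=2):
    """Run one build command per root file, at most `jobs` at a time.

    `make_command` maps a root file name to an argument list.
    Returns a dict mapping each root file to its process exit code.
    """
    def build_one(root_file):
        # Each root file gets its own process; whatever steps that
        # process runs internally (as latexmk does) stay sequential.
        result = subprocess.run(make_command(root_file))
        return root_file, result.returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(build_one, root_files))


def latexmk_command(root_file):
    # Assumption: latexmk is installed; -xelatex selects XeLaTeX and
    # -interaction=nonstopmode is passed through to the engine.
    return ["latexmk", "-xelatex", "-interaction=nonstopmode", root_file]
```

Calling `build_all(["A.tex", "B.tex"], latexmk_command, jobs=2)` would start one latexmk process per file, two at a time, and any nonzero exit code in the returned dict marks a failed document. Threads are enough here because the actual work happens in the child processes.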


> You could try this fork, which uses Debian instead of Alpine as the base image. You can install the missing packages using apt-get.

I've circumvented the problem by running the post-process actions (which are separate targets of my Makefile anyway, running qpdf and bookletimposer) outside the Docker container.


> [...] all useful paths and environment variables are now accessible within the container. Therefore, any files you have downloaded to your working directory will maintain their original structure inside the container.

This is still not documented in a user-friendly way, neither for latex-action nor for texlive-action. Maybe I am not used to how Docker stuff is run in the context of GHA, but when I started rewriting my workflow, I initially thought that using a custom font would bite me and prevent the move to texlive-action.