pytask-dev / pytask-latex

Compile LaTeX documents with pytask.
MIT License

ENH: adding manual compilation chain without latexmk #28

Closed axtimhaus closed 4 months ago

axtimhaus commented 2 years ago

Is your feature request related to a problem?

As I mentioned in pytask-dev/latex-dependency-scanner#12, I have problems with the compile chain if I want to use bib2gls additionally. latexmk does not support bib2gls, so I have to do it manually. bib2gls needs the main .aux file to run, so latex must be compiled once, then bib2gls must be run, followed by the two final latex compilations.

Describe the solution you'd like

The current system is really rigid: there is just one entry point for latex. In addition to that, I would suggest a more flexible way to create distinct tasks for the latex compilation, the biber run and the bib2gls run, as well as for other indexing engines. That way one could define different deps for the first and the final latex runs.

I would like to discuss what this API could look like. My first suggestion is something like:

@pytask.mark.latex_first
@pytask.mark.depends_on("doc.tex")
@pytask.mark.produces("doc.bcf")  # may be inferred
@pytask.mark.produces("doc.aux")  # may be inferred
@pytask.mark.produces("doc.pdf")
def task_latex_first_run():
    pass  # dummy

@pytask.mark.latex_biber
@pytask.mark.depends_on("doc.tex")
@pytask.mark.depends_on("doc.bcf")  # may be inferred
@pytask.mark.produces("doc.bbl")  # may be inferred
# *.bib inferred by dep scanner
def task_latex_biber():
    pass  # dummy

@pytask.mark.latex_bib2gls
@pytask.mark.depends_on("doc.tex")
@pytask.mark.depends_on("doc.aux")  # may be inferred
# *.bib and *.glstex inferred by dep scanner
def task_latex_bib2gls():
    pass  # dummy

@pytask.mark.latex_final
@pytask.mark.depends_on("doc.bbl")  # may be inferred
@pytask.mark.depends_on("doc.tex")
@pytask.mark.produces("doc.pdf")
# *.glstex inferred by dep scanner
def task_latex_final_run():
    pass  # dummy (compile two times)

tobiasraabe commented 2 years ago

Hi @axtimhaus,

thanks for raising this issue. When pytask-latex was created, I wanted a quick solution and was happy to find out that latexmk covers like 90% of all use-cases. In this case the package hits a wall.

What I like about your solution is that it separates the complete build process into composable chunks, or build steps, which can be reused and rearranged depending on the user's needs. I have two comments:

  1. Your current API is very verbose and I assume we can do a lot of stuff under the hood without the user even noticing. You already indicated that some files can be inferred and thus eliminated from the API. If we are lucky, then, only the build steps are left.
  2. A more severe limitation of pytask is that two tasks cannot have the same product. Otherwise, it cannot produce a valid DAG to run tasks in an appropriate order. Since compiling a document usually means that at least the PDF is changed in place a couple of times, I guess we need to pack everything into one task.

Continuing the idea of the first bullet point and boiling down your API, I could imagine some API like this:

@pytask.mark.latex(build_steps=["latex", "biber", "bib2gls", "latex"])
@pytask.mark.depends_on("doc.tex")
@pytask.mark.produces("doc.pdf")
def task_compile_doc():
    pass

Each string can be mapped to an internal build step or a user provides a custom one adhering to some specs. Some build steps might also need args or kwargs which we also need to figure out.

But, maybe as a starting point, would you like to create a task without the pytask.mark.latex marker which compiles your document? Because pytask-latex is just a slightly more intelligent wrapper around subprocess.run. For example, you could

I think if we see the task, we will have better picture of the requirements for the API and you can decide whether it is worth it to work on the implementation.

I do not have time to work on the implementation soon, but I could provide some sparring if you would like to take a stab at it.

axtimhaus commented 2 years ago

I have already encountered the problem with the DAG. But I would suggest defining a distinct task for each build step, at least as the most detailed option, because in general a user may want to execute not just biber and bib2gls but also some custom tasks. Most external LaTeX programs have in common that they need a prior LaTeX run to create .aux files and similar.

My current workaround looks like that:

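import re
import subprocess
from pathlib import Path

import latex_dependency_scanner
import pytask

# Note: _LATEX_OPTS (extra pdflatex options) is used below but its definition is not shown here.
ROOT_DIR = Path(__file__).parent.resolve()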
ROOT_NAME = "documentation"
ROOT_TEX = ROOT_DIR / f"{ROOT_NAME}.tex"
ROOT_PDF = ROOT_DIR / f"{ROOT_NAME}.pdf"

_root_deps = latex_dependency_scanner.scanner.scan(ROOT_TEX)

ROOT_DEPS = [d for d in _root_deps if d.suffix != ".bib"]
ROOT_BIBS = [d for d in _root_deps if d.suffix == ".bib"]

def collect_symbols_files():
    regex = re.compile(r"\\glsxtrresourcefile{([^{}]*)}", re.M)
    tex_text = ROOT_TEX.read_text()
    matches = regex.findall(tex_text)
    return [ROOT_DIR / f"{m}.bib" for m in matches], [ROOT_DIR / f"{m}.glstex" for m in matches]

SYMBOLS_BIBS, SYMBOLS_GLSTEXS = collect_symbols_files()

@pytask.mark.depends_on(ROOT_DEPS)
@pytask.mark.depends_on(ROOT_TEX)
@pytask.mark.produces(f"{ROOT_NAME}.bcf")
@pytask.mark.produces(f"{ROOT_NAME}.aux")
def task_compile_latex_first():
    subprocess.run(("pdflatex", *_LATEX_OPTS, ROOT_TEX))

@pytask.mark.depends_on(ROOT_TEX)
@pytask.mark.depends_on(f"{ROOT_NAME}.bcf")
@pytask.mark.produces(f"{ROOT_NAME}.bbl")
def task_run_biber():
    subprocess.run(("biber", ROOT_NAME))

@pytask.mark.depends_on(ROOT_TEX)
@pytask.mark.depends_on(f"{ROOT_NAME}.aux")
@pytask.mark.depends_on(SYMBOLS_BIBS)
@pytask.mark.produces(SYMBOLS_GLSTEXS)
def task_run_bib2gls():
    subprocess.run(("bib2gls", ROOT_NAME))

@pytask.mark.depends_on(ROOT_DEPS)
@pytask.mark.depends_on(SYMBOLS_GLSTEXS)
@pytask.mark.depends_on(f"{ROOT_NAME}.bbl")
@pytask.mark.depends_on(ROOT_TEX)
@pytask.mark.produces(ROOT_PDF)
def task_compile_latex_final():
    subprocess.run(("pdflatex", *_LATEX_OPTS, ROOT_TEX))

I see that this approach is kind of a lie, because the LaTeX runs produce much more than is denoted in the tasks.

In general I have no problem with coding this stuff myself.

tobiasraabe commented 2 years ago

Thanks for providing your workaround! I am less worried that some products are left out (latex just works with in-place changes) and more worried that it is hard for users to specify dependencies and products correctly, for example the PDF, and that they will then report issues.

I completely agree with you that the build steps should be composable. This is what I had in mind and I think it has the flexibility you want. We assume that we have predefined internal build steps for latex and biber. First, you would set up a function which compiles your sources with bib2gls.

def compile_bib2gls(path_to_tex, path_to_pdf, **kwargs):
    return subprocess.run(("bib2gls", "--dir", path_to_pdf.parent.as_posix(), path_to_pdf.with_suffix("").name), check=True)

The arguments of the function follow the specification of the API for build steps. A build step can receive arguments by specifying their name and collect unnecessary but provided arguments with kwargs.
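To illustrate the calling convention, here is a minimal sketch. It assumes the caller passes every available value as a keyword argument; `bib2gls_command`, the `available` dict and the `latex_options` key are made-up names for illustration only, not part of pytask-latex. The step only builds the command so the snippet runs without bib2gls installed.

```python
from pathlib import Path


def bib2gls_command(path_to_tex, path_to_pdf, **kwargs):
    """Build the bib2gls command line.

    The step names the arguments it needs; everything else the caller
    provides is swallowed by **kwargs and ignored.
    """
    return (
        "bib2gls",
        "--dir",
        path_to_pdf.parent.as_posix(),
        path_to_pdf.with_suffix("").name,
    )


# Hypothetical set of values a caller might offer to every build step.
available = {
    "path_to_tex": Path("build/doc.tex"),
    "path_to_pdf": Path("build/doc.pdf"),
    "latex_options": ("-interaction=nonstopmode",),  # unused by this step
}

command = bib2gls_command(**available)
print(command)
```

A step that needs fewer or more of the provided values only has to change its own signature; the caller stays the same.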

Then, your build step is integrated alongside the predefined internal ones.

@pytask.mark.latex(build_steps=["latex", "biber", compile_bib2gls, "latex"])
@pytask.mark.depends_on("doc.tex")
@pytask.mark.produces("doc.pdf")
def task_compile_doc():
    pass

What do you think about this approach? Do you see any limitations and do you think the API can be easily understood?

A little bit of thinking needs to go into the signature of the @pytask.mark.latex decorator since it needs to be backwards-compatible, but it should be manageable.

Internally, there is a function for compiling the document which loops over all build steps, provides arguments to the functions and executes them. This is super preliminary.

def compile_latex_document(build_steps, **kwargs):
    for step in build_steps:
        # Strings refer to predefined steps, anything else is a user-supplied callable.
        if isinstance(step, str):
            func = _PREDEFINED_BUILD_STEPS[step]
        else:
            func = step

        status = func(**kwargs)
        check_status(status)

axtimhaus commented 2 years ago

Sounds good. Your approach has a benefit over mine: with my workaround I noticed that the tasks are never skipped, since the .aux and .bcf files are changed every time by the final latex run. With your approach these files are not included in the dependencies, and therefore the task is skipped if no changes were made. Backwards compatibility should be easy: just fire the standard latex, biber, latex, latex chain if no args are provided.

Regarding the str items: It should be possible to pass additional command-line args to the predefined steps. I can imagine a set of functions constructing the step functions like so:

import pytask.latex.build_steps as bs

@pytask.mark.latex(build_steps=[
    bs.latex(("", ...)),
    bs.biber(("", ...)),
    bs.latex(("", ...)),
    bs.latex(("", ...)),
])
@pytask.mark.depends_on("doc.tex")
@pytask.mark.produces("doc.pdf")
def task_compile_doc():
    pass

So the internal compile function just needs to be:

def compile_latex_document(build_steps, **kwargs):
    for step in build_steps:
        status = step(**kwargs)
        check_status(status)

Which kwargs are you thinking of in addition to the .tex and .pdf paths?

tobiasraabe commented 2 years ago

I like the constructor functions for the build steps! They are as short as strings, accept args and kwargs, and don't require something like functools.partial to pass arguments to the actual build step function. Initially, I thought of the latter.
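A rough sketch of how such constructor functions could work, purely for illustration: each constructor captures its options in a closure and returns the actual step. The names `latex`, `biber` and `compile_latex_document` follow the discussion above, but the bodies, the `run` parameter and the choice to return command tuples are assumptions, not the pytask-latex implementation.

```python
import subprocess
from pathlib import Path


def latex(options=()):
    """Return a build step that produces a pdflatex command (hypothetical)."""

    def build_command(path_to_tex, **kwargs):
        return ("pdflatex", *options, path_to_tex.as_posix())

    return build_command


def biber(options=()):
    """Return a build step that produces a biber command (hypothetical)."""

    def build_command(path_to_tex, **kwargs):
        # biber is called with the document name without the .tex suffix.
        return ("biber", *options, path_to_tex.with_suffix("").as_posix())

    return build_command


def compile_latex_document(build_steps, run=subprocess.run, **kwargs):
    """Execute the steps in order; `run` is injectable so the loop can be tried without TeX."""
    for step in build_steps:
        command = step(**kwargs)
        run(command, check=True)


# Dry run: record the commands instead of executing them.
recorded = []
compile_latex_document(
    [latex(("-interaction=nonstopmode",)), biber(), latex(), latex()],
    run=lambda command, check: recorded.append(command),
    path_to_tex=Path("doc.tex"),
    path_to_pdf=Path("doc.pdf"),
)
for command in recorded:
    print(command)
```

Because the options live in the closure, the loop never needs to know which step takes which arguments.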

For backward compatibility we should provide the fallback to latexmk if it exists, then latex, biber, latex, latex.
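A small sketch of that fallback, assuming we detect latexmk on the PATH with `shutil.which`. The function name `default_build_steps` and the string step names are placeholders for whatever the final API uses.

```python
import shutil


def default_build_steps(which=shutil.which):
    """Pick the default chain: latexmk if it is installed, else the manual chain.

    `which` is injectable so the choice can be tested without touching PATH.
    """
    if which("latexmk") is not None:
        return ["latexmk"]
    return ["latex", "biber", "latex", "latex"]


print(default_build_steps())
```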

Nice, feels like we have an idea. Should we start with a plan for the implementation or do you see other obstacles?

tobiasraabe commented 2 years ago

Just a proposal. We could split the implementation into two PRs.

  1. Switch to the new interface with build steps, create a build step for latexmk, set it as the default and handle command line args as before. Emit a deprecation warning when command line args are provided the old way. They will be removed with the next minor version.

  2. Implement latex, biber and bib2gls build steps.
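For the first PR, the deprecation path could look roughly like this. It is only a sketch: `resolve_build_steps`, its parameters and the `(name, options)` tuples are invented for illustration and do not reflect how the marker is actually parsed.

```python
import warnings


def resolve_build_steps(options=None, build_steps=None):
    """Translate marker arguments into a list of build steps (hypothetical helper).

    Old-style `options` still work but trigger a DeprecationWarning and are
    forwarded to the default latexmk step.
    """
    if options is not None:
        warnings.warn(
            "Passing command line options directly is deprecated; "
            "pass them to a build step instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return [("latexmk", tuple(options))]
    if build_steps is None:
        # Nothing specified: latexmk without extra options is the default.
        return [("latexmk", ())]
    return list(build_steps)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    steps = resolve_build_steps(options=["--pdf"])

print(steps, [str(w.message) for w in caught])
```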

axtimhaus commented 2 years ago

We have a plan, I think.

tobiasraabe commented 4 months ago

Hi @axtimhaus, I am going to close this issue now because the main goal, customizable build steps, has been achieved. Your contribution would be highly welcomed if you would like to add more compile chains or compile steps. Thanks a lot for all your effort and time! 🙇‍♂️❤️