pappasam / latexbuild

Build output from Latex using Python and Jinja2 templating
MIT License

UnicodeDecodeError on funky pdflatex output #14

Open thegcat opened 2 months ago

thegcat commented 2 months ago

I stumbled upon a problem with latexbuild: its attempt to decode the output of pdflatex as UTF-8 raises a UnicodeDecodeError.

Minimal example to reproduce:

\documentclass{scrlttr2}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[sfdefault,scaled=.85]{FiraSans}

\begin{document}
Hiermit wird bestätigt, dass \textbf{Foooo Baaaar Baaaaaaaaaaz} an der 52,0. Konferenz der
Informatikfachschaften, die vom 8.5. - 12.5.2024 an der Rheinland-Pfälzischen
Technischen Universität Kaiserslautern-Landau in Kaiserslautern stattgefunden
hat.
\end{document}

from latexbuild import build_pdf
build_pdf('.', 'test.tex', 'foo.pdf')

This leads to:

Overfull \hbox (35.30524pt too wide) in paragraph at lines 7--11
\T1/FiraSans-OsF/regular/n/12 fach-schaften, die vom 8.5. - 12.5.2024 an der Rh
Failed during latex build
Traceback (most recent call last):
  File "/Users/thegcat/Code/kif/tebege/.direnv/python-3.11/lib/python3.11/site-packages/latexbuild/build.py", line 98, in run_latex
    stdout = check_output_cwd(cmd, path_template_dir)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/thegcat/Code/kif/tebege/.direnv/python-3.11/lib/python3.11/site-packages/latexbuild/subprocess_extension.py", line 32, in check_output_cwd
    line_str = line.decode().strip()
               ^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 10: invalid continuation byte

thegcat commented 2 months ago

One solution is to ignore errors in the decode step, see https://toot.kif.rocks/@marlena/112485747608767645
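
For illustration, here is a minimal sketch of what "ignoring errors in the decode step" could look like. The helper name `decode_tool_output` and the sample bytes are made up for this example and are not part of latexbuild; the only real API used is `bytes.decode` with its `errors` argument.

```python
# Hypothetical helper, not part of latexbuild's API.
def decode_tool_output(line: bytes, errors: str = "ignore") -> str:
    """Decode one line of subprocess output without raising on stray bytes."""
    # errors="ignore" drops undecodable bytes; errors="replace" would
    # substitute U+FFFD for each of them instead.
    return line.decode("utf-8", errors=errors).strip()


# Illustrative input: 0xe4 is "ä" in Latin-1/T1, but here it is not
# followed by a valid UTF-8 continuation byte.
raw = b"Rheinland-Pf\xe4lzischen\n"

try:
    raw.decode()  # strict UTF-8 decoding, as in the traceback above
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

print(decode_tool_output(raw))
print(decode_tool_output(raw, errors="replace"))
```

With `errors="ignore"` the offending byte silently disappears from the logged line; `errors="replace"` keeps a visible marker, which may be preferable when the output is only used for diagnostics.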

We sidestepped the issue, however, by loading the hyphenat package in the tex file so that "Kaiserslautern-Landau" can break across lines. That avoids the overfull hbox warning, and with it the non-UTF-8 bytes in the pdflatex output, in the first place.