gpoore / minted

minted is a LaTeX package that provides syntax highlighting using the Pygments library. Highlighted source code can be customized using fancyvrb.

\inputminted with non-latin characters gives "invalid utf-8 sequence" message, .minted file has wrong encoding #411

Open DmNosachev opened 1 week ago

DmNosachev commented 1 week ago

Problem: A minted environment containing non-Latin script (e.g. Russian) works the same as in version 2.x, but \inputminted does not. It fails with the following error message: ! String contains an invalid utf-8 sequence. The line number reported with the error points to the corresponding line of the .minted file.

MWE (LuaLaTeX):

\documentclass{article}

\usepackage{fontspec}

\defaultfontfeatures{Scale=MatchLowercase}
\setmainfont{erewhon}
\setromanfont{erewhon}
\setmonofont{Liberation Mono}

\usepackage{polyglossia}
\setmainlanguage[babelshorthands=true]{russian}
\setotherlanguage{english}

\usepackage{minted}
\setminted{
  encoding = UTF-8
}
\usepackage[svgnames]{xcolor}

\begin{document}

\begin{minted}{c}
int main() {
// Cyrillic comment: комментарий
printf("hello, world");
return 0;
}
\end{minted}
\end{document}

Compile command (Windows 10, TeX Live 2024): lualatex --interaction=nonstopmode --halt-on-error -shell-escape minted3test.tex. This example compiles without problems, but if I replace the minted environment with \inputminted reading a file that contains the same code (\inputminted{c}{test.c}), compilation fails with the following message:

(./_minted/B5B9357A8A6705C7C2A57CB4C61FB9F8.highlight.minted
! String contains an invalid utf-8 sequence.
l.3 \PYG{c+c1}{// Cyrillic comment:
                                  }}}^^@}
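
For reference, the failing variant described above could be written roughly as follows. This is only a sketch: the file name test.c comes from the report, but the filecontents* step that writes it is an assumption added to keep the example self-contained, and the polyglossia and main-font setup from the MWE is omitted for brevity. It is expected to fail only on affected minted versions and on systems whose default encoding is not UTF-8 (Windows-1251 in the report).

```latex
% Hypothetical reproduction sketch, not the reporter's exact file.
% Assumption: test.c is written here via filecontents* (UTF-8 under LuaLaTeX)
% instead of being created by hand as in the report.
\begin{filecontents*}[overwrite]{test.c}
int main() {
// Cyrillic comment: комментарий
printf("hello, world");
return 0;
}
\end{filecontents*}

\documentclass{article}

\usepackage{fontspec}
\setmonofont{Liberation Mono}

\usepackage{minted}
\setminted{encoding = UTF-8}

\begin{document}
% Same code as the minted environment in the MWE, but read from a file.
\inputminted{c}{test.c}
\end{document}
```

Compile it with the same lualatex command as the MWE.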

Workaround: I noticed that the .minted file was created with Windows-1251 encoding (an 8-bit encoding for some Cyrillic languages). Converting it to UTF-8 temporarily solves the problem.

gpoore commented 1 week ago

Everything is supposed to be encoded as UTF-8, but it looks like that got lost when I split off the latexrestricted Python package. I've fixed this in the dev version on GitHub, and will be releasing new versions of everything within the next few hours.