gpoore / minted

minted is a LaTeX package that provides syntax highlighting using the Pygments library. Highlighted source code can be customized using fancyvrb.

issue with unicode characters #200

Closed emeth69 closed 2 years ago

emeth69 commented 6 years ago

Dear TeXperts, in moving from listings to minted I have run into a problem: several Unicode (or simply non-ASCII) characters are not typeset, in spite of what I declare in the preamble. Look at this minimal example (this is what I get: minted+unicode.pdf):

\documentclass[a4paper]{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[osf,sc]{mathpazo}
\usepackage[scaled=.75]{beramono}
\usepackage[greek,italian,russian,english]{babel}
\usepackage{minted}

\begin{document}

\begin{minted}[mathescape,
               linenos,
               numbersep=5pt,
               frame=lines,
               framesep=2mm]{csharp}
string title = "This is a Unicode π in the sky"
/*
Defined as $\pi=\lim_{n\to\infty}\frac{P_n}{d}$ where $P$ is the perimeter
of an $n$-sided regular polygon circumscribing a
circle of diameter $d$.
*/
const double π = 3.1415926535
\end{minted}

\begin{minted}[frame=lines,framesep=2mm]{pycon}
>>> a_list = ['α', 2, 3, 4, 'Ω']
>>> s = 'The Russian for «Hello World» is «Привет мир»'
>>> s
'The Russian for «Hello World» is «Привет мир»'
>>> a_list
['α', 2, 3, 4, 'Ω']
>>> π = 3.1415926535
>>> π
3.1415926535
\end{minted}

\begin{minted}[frame=lines,framesep=2mm]{c}
char c[] = "こんにちは 世界";

main(void) {
  printf("length of c :- %d but it has only 7 characters :- ", sizeof(c));
  printf("%s\n", c);
}
\end{minted}

\end{document}

The first listing is from the README on this project's home page, so it should work, but the π symbols (those not produced by \pi) are missing. The second and third are pieces of my own code that use Russian, Greek and Japanese characters and the French guillemets («»).

Somewhere in the documentation I read that minted doesn't work with Unicode characters in strings, but your example has one, so I assumed you had solved this issue and forgotten to remove the note from the documentation.

I've tried the utf8 and utf8x encodings without luck. Even passing LGR to fontenc doesn't help. The errors I get are:

! LaTeX Error: Command \textpi unavailable in encoding T1.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H <return>  for immediate help.
 ...                                                                                            
 l.2 ...{}This is a Unicode π in the sky\PYGZdq{}}

for the Greek (there are several, one for each Greek letter).

! LaTeX Error: Command \CYRP unavailable in encoding T1.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H <return>  for immediate help.
 ...                                                                                               
l.3 ...orld» is «Привет мир»\PYGZsq{}}

for the Russian (one per Russian character, each with a different missing command).

! Package inputenc Error: Unicode char こ (U+3053)
(inputenc)                not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
l.2 ...}こんにちは 世界\PYGZdq{}}\PYG{p}{;}

for the Japanese.

C, Python and C# are just a few examples; I need this to work with basically any lexer. Similarly, I can't show APL programs.

What am I doing wrong?

mbaz commented 6 years ago

Just a data point, not a solution, but: using xelatex, I have no problem with unicode characters and minted.

gpoore commented 6 years ago

@emeth69 I would suggest trying to get all these characters working in a normal text document before trying to use them inside minted environments. My guess is that most, if not all, of your issues are related to fonts and font encodings. I don't have much experience with that...the fontenc documentation might be a place to start. minted should be able to handle any character you can get working in normal text.
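A minimal sketch of such a normal-text check under pdfLaTeX. It assumes the T2A Cyrillic encoding and matching fonts (for example cm-super) are installed; the encoding choice and the sample text are illustrative assumptions, not something confirmed in this thread:

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
% Assumption: T2A for Cyrillic; the last encoding listed (T1) is the default.
\usepackage[T2A,T1]{fontenc}
\usepackage[russian,english]{babel}

\begin{document}
% Guillemets are available in T1, so they need no extra command.
Latin text with guillemets: «Hello World».

% babel switches to the Cyrillic encoding inside \foreignlanguage.
\foreignlanguage{russian}{Привет мир}
\end{document}
```

If this compiles cleanly, any remaining errors are more likely to come from the font or encoding in effect inside the code listing than from minted itself.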

emeth69 commented 6 years ago

OK, I followed your advice, but even when I can handle these characters in normal text I can't in the minted environment. Greek text is handled very well by the textgreek package, which transparently replaces every Greek character without any extra command.

All the other languages can be handled (or at least I can handle them) only through the babel package and the \foreignlanguage command. This means switching from minted to plain LaTeX through the escapeinside option. But this doesn't always work (for example, it doesn't work inside a string), and even when it works the final result has some issues. Look at the MWE:

\documentclass[a4paper]{article}

\usepackage[russian,english]{babel}
\usepackage[utf8]{inputenc}
\usepackage[T2C, T1]{fontenc}
\usepackage[osf,sc]{mathpazo}
\usepackage[scaled=.75]{beramono}
\usepackage{minted}

\begin{document}
\foreignlanguage{russian}{Привет мир}

\begin{minted}[frame=lines,escapeinside=||, framesep=2mm]{pycon}

>>> |\foreignlanguage{russian}{Привет}| = 'The Russian for «Hello World» is «|\foreignlanguage{russian}{Привет мир}|»'
>>> |\foreignlanguage{russian}{Привет}| 
'The Russian for «Hello World» is «|\foreignlanguage{russian}{Привет мир}|»'
\end{minted}

\end{document}

As you can see in the generated PDF (unicode-problems.pdf), the Russian text works properly outside minted and is printed when used as a variable name, but it is not escaped inside the string definition. In any case the output is incomplete: the >>> and = have been removed, in the third row all the non-Russian text is removed, and so on.

Last but not least, I can get the escape to work only with the | symbol, a character that is widely used in several languages (OCaml, Erlang, etc.), and using it there causes unexpected and undesired formatting.

gpoore commented 6 years ago

@emeth69 You would need to configure fonts so that all characters are accessible without any extra commands (without \foreignlanguage, etc.), just as in textgreek. I don't have enough font experience to offer any suggestions for that.

The simplest solution is probably just to switch to xelatex or luatex. This works for xelatex (I also added python3=true to fix pycon highlighting):

\documentclass[a4paper]{article}

\usepackage{fontspec}
\setmainfont{DejaVu Sans}
\setmonofont{DejaVu Sans Mono}
\usepackage{xeCJK}
\usepackage{minted}

\begin{document}

\begin{minted}[mathescape,
               linenos,
               numbersep=5pt,
               frame=lines,
               framesep=2mm]{csharp}
string title = "This is a Unicode π in the sky"
/*
Defined as $\pi=\lim_{n\to\infty}\frac{P_n}{d}$ where $P$ is the perimeter
of an $n$-sided regular polygon circumscribing a
circle of diameter $d$.
*/
const double π = 3.1415926535
\end{minted}

\begin{minted}[frame=lines,framesep=2mm,python3=true]{pycon}
>>> a_list = ['α', 2, 3, 4, 'Ω']
>>> s = 'The Russian for «Hello World» is «Привет мир»'
>>> s
'The Russian for «Hello World» is «Привет мир»'
>>> a_list
['α', 2, 3, 4, 'Ω']
>>> π = 3.1415926535
>>> π
3.1415926535
\end{minted}

\begin{minted}[frame=lines,framesep=2mm]{c}
char c[] = "こんにちは 世界";

main(void) {
  printf("length of c :- %d but it has only 7 characters :- ", sizeof(c));
  printf("%s\n", c);
}
\end{minted}

\end{document}
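To confirm that the chosen fonts actually cover the characters before involving minted, a reduced check along the same lines may help. This is only a sketch: it reuses the DejaVu fonts from the example above, leaves out the CJK text, and the file name in the comment is hypothetical:

```latex
% Compile with XeLaTeX, e.g.: xelatex check.tex  (check.tex is a hypothetical name)
\documentclass[a4paper]{article}

\usepackage{fontspec}
\setmainfont{DejaVu Sans}
\setmonofont{DejaVu Sans Mono}

\begin{document}
% Same Greek, Cyrillic and guillemet characters as in the listings above.
Main font: π α Ω «Привет мир»

% minted output is set in the monospaced font, so check that one too.
\texttt{Mono font: π α Ω «Привет мир»}
\end{document}
```

If a character is missing here, it will most likely be missing in the minted listings as well, which points to the font rather than to minted.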