pandoc / lua-filters

A collection of lua filters for pandoc
MIT License

other-language: #36

Closed greut closed 5 years ago

greut commented 5 years ago

A filter I've been using to avoid French hyphenation in code blocks once transformed into a TeX document.

source

```yaml
title: Mon titre
lang: fr
```

Without the filter: [screenshot from 2018-12-30 15-05-10]

With the filter: [screenshot from 2018-12-30 15-03-59]

Extracted from this project: https://github.com/HE-Arc/rapport-technique

jgm commented 5 years ago

It's surprising to me that polyglossia does these transformations inside verbatim contexts. (I assume it's polyglossia that is doing it?)

Is it because pandoc's highlighted code blocks use a custom environment that polyglossia doesn't recognize? If you just have a code block without syntax marked, does it do the same thing?

Just wondering whether this should be handled with a modification to pandoc rather than a filter.

greut commented 5 years ago

It seems to do the same thing.

    ---
    title: Mon titre
    lang: fr
    ---

```
---
title: Mon titre
lang: fr
---
```

```yaml
---
title: Mon titre
lang: fr
---
```

[screenshot from 2019-01-01 11-51-59]

jgm commented 5 years ago

Interestingly, this only happens with --pdf-engine=xelatex. (I guess because polyglossia is only used with xelatex.)

jgm commented 5 years ago

From polyglossia manual:

In some very specific contexts (such as music score creation), TeX hyphenation is something to avoid as it may cause troubles. polyglossia provides two functions: \disablehyphenation and \enablehyphenation. Note that when you select a new language, hyphenation will be in the same state (enabled or disabled) as before. When you reenable it, it will take the last selected language.

So perhaps we could conditionally include these around all code blocks, or make them part of the special environment definition?

jgm commented 5 years ago

I don't know if it's worth adding all the additional complexity to the default template that this would require to redefine the verbatim environments. But for this filter, it would probably be cleaner to add \disablehyphenation before each code block and \enablehyphenation after it, instead of using otherlanguage. (And then it would make sense to call the filter something like disable-hyphenation-in-code.)

Of course, because these commands are from polyglossia, you'd probably want to do something like

```latex
\ifx\disablehyphenation\undefined\else\disablehyphenation\fi
```

greut commented 5 years ago

That would have been so neat and easy, but I didn't manage to get it working (using xelatex).

[screenshot from 2019-01-01 22-01-53]

```latex
\documentclass[]{article}

\usepackage{polyglossia}
\setmainlanguage[]{french}
\setotherlanguage[]{english}

\begin{document}

lang: fr

\disablehyphenation
lang: fr-nohyphen
\enablehyphenation

\begin{otherlanguage}{english}
lang: en
\end{otherlanguage}

\end{document}
```

jgm commented 5 years ago

I guess that makes sense: it's not really a matter of hyphenation, after all. I looked through the polyglossia manual, and it doesn't say anything about colons. So, maybe your approach is the only one that will work!

jgm commented 5 years ago

But I do think the name of this filter is confusing. Wouldn't a better name somehow mention code or verbatim blocks, so people will know what this pertains to?

jgm commented 5 years ago

PS. Instead of adding a Para before and after containing raw latex, you might try putting each code block inside a Div with attribute lang="en". This works just as well and it's cleaner and simpler.

greut commented 5 years ago

I need a little help with the lua side though... sorry.

Putting <div lang="en"> in the markdown produces this:

Div ("",[],[("lang","en")]) [ ... ]

In Lua, however, pandoc.Div(el, pandoc.Attr("lang", {"en"})) produces this:

Div ("lang",["en"],[]) [ ... ]

Thanks!

jgm commented 5 years ago

I think what you want is

pandoc.Attr("", {}, {lang = "en"})
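Put together, the Div suggestion and the Attr call above might look like the following filter. This is only a minimal sketch, not the filter from the PR: it assumes every code block should be tagged as English, and the file name used in the usage line is hypothetical.

```lua
-- Sketch only: wrap each code block in a Div carrying lang="en",
-- so the LaTeX writer switches language around the block.
-- pandoc.Attr takes: identifier, list of classes, key-value attributes.
function CodeBlock(el)
  return pandoc.Div({ el }, pandoc.Attr("", {}, { lang = "en" }))
end
```

Saved as, say, code-lang.lua (an assumed name), it would be run with `pandoc --lua-filter=code-lang.lua --pdf-engine=xelatex input.md -o output.pdf`.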

greut commented 5 years ago

Huzzah!

[screenshot from 2019-01-02 20-12-11]

I'll reopen a new PR soon. Thanks a lot!

adunning commented 5 years ago

This does appear to be a bug specific to Polyglossia and XeTeX. It works fine with Babel:

```latex
\documentclass{article}

\usepackage{color}
\usepackage{fancyvrb}
\newcommand{\VerbBar}{|}
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\newenvironment{Shaded}{}{}
\newcommand{\AttributeTok}[1]{\textcolor[rgb]{0.49,0.56,0.16}{#1}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.02,0.16,0.49}{#1}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{#1}}

\usepackage[shorthands=off,main=french]{babel}

\begin{document}

\begin{Shaded}
\begin{Highlighting}[]
\OtherTok{---}
\FunctionTok{title:}\AttributeTok{ Mon titre}
\FunctionTok{lang:}\AttributeTok{ fr}
\OtherTok{---}
\end{Highlighting}
\end{Shaded}

\end{document}
```

adunning commented 5 years ago

There's a workaround for Polyglossia posted in https://github.com/reutenauer/polyglossia/issues/27.

jgm commented 5 years ago

Good find. We could simply add this to the default pandoc template where polyglossia is loaded:

```latex
\makeatletter
\appto\verbatim@font{\nofrench@punctuation}
\makeatother
```

It's a bit verbose to have all of this in the template, since it's irrelevant unless lang is fr. I suppose we could have the writer check the lang in metadata, and set a special variable if lang = fr; these lines could be included if the variable is set.

If this seems a good idea, perhaps an issue should be opened for pandoc. (Though if we decide to make polyglossia optional, not default, for xelatex, then it may be a nonissue.)
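For illustration, the conditional inclusion described above could be sketched in pandoc template syntax roughly as follows. The variable name `french-verbatim-fix` is hypothetical. No such variable exists in pandoc; the writer (or the user, via `-V french-verbatim-fix`) would have to set it.

```latex
% Sketch only: emit the polyglossia workaround when a (hypothetical)
% template variable is set, e.g. by the writer when lang is fr.
$if(french-verbatim-fix)$
\makeatletter
\appto\verbatim@font{\nofrench@punctuation}
\makeatother
$endif$
```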

jgm commented 5 years ago

@greut - given the workaround, is this filter even necessary any more?

greut commented 5 years ago

@jgm I cannot get the workaround to work. Although it became clear that polyglossia+xelatex are the bad guys here, I'll keep this filter and consider it a hack rather than something worth spreading. Thanks everyone for your help!

adunning commented 5 years ago

@greut Are you receiving an error, or is it simply not displaying properly? This works for me under either XeTeX or LuaTeX:

```latex
\documentclass{article}

% Polyglossia with workaround
\usepackage{polyglossia}
\setdefaultlanguage{french}
\setotherlanguage{english}

\makeatletter
\appto\verbatim@font{\nofrench@punctuation}
\makeatother
%

% or use Babel
% \usepackage[english, main=french]{babel}

\begin{document}

\verb|00:11:43:D4:86:A0|

\selectlanguage{english}
\verb|00:11:43:D4:86:A0|

\selectlanguage{french}
\begin{verbatim}
00:11:43:D4:86:A0
\end{verbatim}

\end{document}
```

Note that you might have to delete your .aux file between runs when switching between Babel and Polyglossia. If you're still having trouble, perhaps try checking that you're running the latest version of TeX Live by running tlmgr update --self --all --reinstall-forcibly-removed.

greut commented 5 years ago

pandoc produces \texttt{} and uses fancyvrb, both of which are unaffected by the fix. No worries.