jgm / pandoc

Universal markup converter
https://pandoc.org

Latex to epub – German Quotes not transformed correctly #5470

Open: aschatt opened this issue 5 years ago

aschatt commented 5 years ago

Pandoc converts German quotes incorrectly.

(1) Context

(a) The LaTeX file loads the babel package with the German option:

\usepackage[ngerman]{babel}

(b) The quotes are written in the correct babel German style, e.g.: "`Das ist ein Zitat"'

(c) The pandoc command also sets the language to German with:

-V lang=de-AT

(2) Expected result:

The correct conversion to EPUB would be „Das ist ein Zitat“ or, alternatively, »Das ist ein Zitat«.

(3) Bug:

Pandoc instead renders English quotation marks: "Das ist ein Zitat"

mb21 commented 5 years ago

Parsing LaTeX

You're saying the exact input is the following?

"`Das ist ein Zitat"'

Shouldn't it be in LaTeX:

``Das ist ein Zitat''

Or is this a syntax I don't know?

Outputting HTML

The second part of this issue is that pandoc currently doesn't render quotes according to the document's locale; see https://github.com/jgm/pandoc/issues/84

For EPUB/HTML output, though, you can use --html-q-tags and style the resulting <q> elements with CSS to your liking.

jgm commented 5 years ago

Note that -V lang=de-AT is not generally what you want. Set a metadata field (-M lang=de-AT) instead of a variable. Variables only affect template rendering, while metadata fields can affect parsing as well. I don't expect this will make a difference in this case, though.

aschatt commented 5 years ago

@mb21 No, that is actually correct. Your syntax creates English quotation marks. The German ones are written like this: "`here is the quote"'

jgm commented 5 years ago

So am I understanding correctly that in babel, "` and "' are ligatures for „ and “ respectively? Is this just with babel or more widely? Is it just when the German option is used with babel?

aschatt commented 5 years ago

First part: exactly right. Whether this is specific to babel, I don't know. This is the usually recommended way to typeset German text; I have found it in several books and tutorials, and I have always used that method myself.

jgm commented 5 years ago

I experimented: it looks like it's babel-specific, and only applies when the language is German.

aschatt commented 5 years ago

This is possible.

Anyway, unfortunately I currently see no easy way to produce German quotation marks in EPUB. This is really bad for me, because otherwise my whole publishing workflow works very well.

agusmba commented 5 years ago

Since they are not "smart" quotes (start and end are different in the source material), wouldn't a simple filter or a preprocessor take care of that conversion for you?
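A minimal sketch of the preprocessor route, written in Lua like the filters later in this thread: it rewrites the two babel shorthands directly into „ and “ before pandoc sees the file, so the LaTeX reader never turns them into generic quote elements. The script name babel-quotes.lua and the function convert_babel_quotes are hypothetical, and the sketch assumes "` and "' never appear inside verbatim or code where they should stay literal.

```lua
-- babel-quotes.lua (hypothetical name): rewrite babel's ngerman quote
-- shorthands into Unicode German quotation marks before running pandoc.
-- Assumption: "` and "' occur only as quote shorthands, never inside
-- verbatim or code where they should stay literal.

-- Replace "` with „ (U+201E) and "' with “ (U+201C).
-- Neither search string contains Lua pattern magic characters,
-- so gsub matches them literally.
local function convert_babel_quotes(src)
  local out = src:gsub('"`', '„')
  out = out:gsub('"\'', '“')
  return out
end

-- Convert each file named on the command line and write the result
-- to stdout, e.g.: lua babel-quotes.lua book.tex > book-quotes.tex
for _, path in ipairs(arg or {}) do
  local f = assert(io.open(path, 'r'))
  io.write(convert_babel_quotes(f:read('*a')))
  f:close()
end
```

The converted file can then be passed to pandoc exactly as before; because the quotes are already Unicode characters, they come through unchanged in EPUB.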

jgm commented 5 years ago

I just saw that in the LaTeX reader we have this code in smart quote parsing:

   -- the following is used by babel for localized quotes:
   <|> quoted' doubleQuoted (try $ sequence [symbol '"', symbol '`'])
                            (void $ try $ sequence [symbol '"', symbol '\''])

This causes the ligatures to be rendered as regular English-style quotes in most output formats, which isn't desirable. Instead of parsing "`hi"' as Quoted DoubleQuote, we should simply parse these ligatures as Unicode characters (the German quotes). We could also make this behavior sensitive to whether babel with the German option is being used, although it is probably safe to assume that these ligatures won't occur otherwise.

Note that you could write a simple Lua filter that renders Quoted DoubleQuote elements with German quotes. See the Lua filter documentation on the website.

jgm commented 5 years ago
-- quote.lua
function Quoted(el)
  if el.quotetype == 'DoubleQuote' then
    return {pandoc.Str("„"), pandoc.Span(el.content), pandoc.Str("“")}
  end
end

Run with pandoc --lua-filter quote.lua.

Delanii commented 4 years ago

I hit the same issue with the docx and odt formats (and also with LaTeX output, for which I use a custom template that always enables the csquotes package). @jgm's last comment solved it for me: with the help of the Lua filter docs I adapted the filter to the different formats, even though I am a Lua programming newbie.

From that perspective, I think such a simple filter is fine to use and easier to maintain than any change to pandoc itself. There is also already a filter in the pandoc-lua-filters repository. On that basis, would it be appropriate to close this issue and also #84? (#84 seems pretty old to me and is labelled "high complexity", but I would guess it is now solved by this filter.)

jgm commented 4 years ago

I want to keep this issue open to track the issue noted above about the special babel ligatures. We could handle those better.

wanddynosios commented 1 year ago

Extending the filter above, you can also handle single quotes by adding this branch inside the Quoted function:

    if el.quotetype == 'SingleQuote' then
      return {pandoc.Str("‚"), pandoc.Span(el.content), pandoc.Str("‘")}
    end
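Putting jgm's DoubleQuote filter and the SingleQuote branch together, a complete filter might look like the sketch below. It additionally assumes the language is supplied as a metadata field (-M lang=de-AT, as jgm recommends above) and only converts quotes when that field starts with "de". The file name german-quotes.lua and the helper german_quotes are hypothetical, and Pandoc:walk requires pandoc 2.17 or later.

```lua
-- german-quotes.lua (hypothetical name): replace DoubleQuote and
-- SingleQuote elements with German quotation marks, but only when
-- the document's `lang` metadata starts with "de".
-- Assumes pandoc >= 2.17, where Pandoc:walk is available.

-- Turn one Quoted element into „...“ or ‚...‘ around a Span.
local function german_quotes(el)
  if el.quotetype == 'DoubleQuote' then
    return {pandoc.Str('„'), pandoc.Span(el.content), pandoc.Str('“')}
  elseif el.quotetype == 'SingleQuote' then
    return {pandoc.Str('‚'), pandoc.Span(el.content), pandoc.Str('‘')}
  end
end

-- Called once with the whole document. The language must come from
-- metadata (-M lang=de-AT); a -V variable does not end up in doc.meta.
function Pandoc(doc)
  local lang = doc.meta.lang and pandoc.utils.stringify(doc.meta.lang) or ''
  if not lang:match('^de') then
    return nil -- leave non-German documents unchanged
  end
  return doc:walk { Quoted = german_quotes }
end
```

Usage would follow the earlier example, e.g. pandoc -M lang=de-AT --lua-filter german-quotes.lua input.tex -o output.epub. Gating on metadata lets the same filter sit in a shared workflow without touching English documents.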