Closed brainchild0 closed 1 year ago
I wouldn't say strangely since that is the standard notation for left and right quotation marks in latex. Whether in this particular case the quotes should be removed is a different matter.
We use \texorpdfstring to ensure that regular TeX commands don't go into the PDF bookmarks.
It seems that the usual quotation ligatures also don't work in this context.
You may find that if you use -t latex-smart --pdf-engine=xelatex, it works properly. In this case pandoc won't use ligatures (because of -smart) and the Unicode quotes should be passed through unchanged.
I don't know if a change to the defaults is called for, because without xelatex, using Unicode quotes may not work.
Why would use of Unicode be dependent on a particular LaTeX engine? Are other engines unable to support characters outside of ASCII? Do non-English languages lack support in all engines but one?
Assuming the Unicode characters are not presenting a particular issue, would it not be more likely to produce desired results if normal translation by the smart extension, in contrast to the special LaTeX behavior, were applied to the metadata fields so as to generate the correct plain text string without LaTeX ligatures?
In other words, from the manual:
In LaTeX, smart means to use the standard TeX ligatures for quotation marks
It simply seems that metadata might be a special case for this rule.
Would this create any problems other than the possibility that the engine cannot properly handle a Unicode string? And in any case, could basic ASCII quotation marks be used?
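To make the idea concrete, here is a minimal Python sketch of turning the TeX ligatures back into Unicode for a plain-text field such as the PDF title. It is not how pandoc handles metadata; the function name tex_ligatures_to_unicode and the replacement table are assumptions made for illustration only.

```python
# Hypothetical helper, not part of pandoc: map TeX quote and dash
# ligatures to their Unicode equivalents for use in plain-text strings.
# Order matters: longer ligatures must be replaced before their prefixes.
LIGATURES = [
    ("---", "\u2014"),  # em-dash
    ("--", "\u2013"),   # en-dash
    ("``", "\u201c"),   # left double quote
    ("''", "\u201d"),   # right double quote
    ("`", "\u2018"),    # left single quote
    ("'", "\u2019"),    # right single quote / apostrophe
]


def tex_ligatures_to_unicode(text: str) -> str:
    """Return text with TeX ligatures replaced by Unicode punctuation."""
    for tex, uni in LIGATURES:
        text = text.replace(tex, uni)
    return text
```

Applied to the string ``One'' -- ``Two'' --- ``Three'', this yields curly quotes with an en-dash and an em-dash. Whether the engine can then display that string is the separate question raised above.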
Are other engines unable to support characters outside of ASCII?
Correct. pdflatex doesn't support non-ASCII well. xelatex and lualatex do.
Did you try the fix I suggested?
Yes, with smart disabled, the document appearance seems the same, and the metadata looks correct. Both pdflatex and xelatex seem to work equally well.
But I am unsure of the penalties of disabling smart. Keeping it enabled seems like the correct choice given that I write Markdown using these conventions.
But more to the point of the issue, would it not be an improvement if handling occurred correctly even with the extension enabled, even if in some cases it would mean using only basic ASCII quotation marks?
No penalties for disabling smart on LaTeX output if you're just producing PDF with xelatex or lualatex.
We can leave this open with the suggestion of using ASCII quotation marks, but I'm not sure it's worth the additional code complexity.
Then maybe smart should be disabled for LaTeX, if it has no benefit and some liability.
By the way, is there an error case for using the Unicode string in pdflatex? It worked fine for me just now.
Disabling the smart option for LaTeX may not be a good option. For straight quotes in headings, it would be great to wrap them with \texorpdfstring.
For example, converting:
\section{Pandoc's Features}\label{pandocs-features.md__pandocs-features}
to:
\section{\texorpdfstring{Pandoc's Features}{Pandoc’s Features}}\label{pandocs-features.md__pandocs-features}
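As a rough sketch of that conversion done as a post-processing step on the generated .tex file, the Python below wraps \section titles in \texorpdfstring, keeping the TeX form for typesetting and a Unicode form for the bookmark. The names to_plain and wrap_section_titles are hypothetical, and the regular expression only handles \section titles without nested braces, so treat this as an illustration rather than a robust tool.

```python
import re

# Matches \section{...} where the title contains no nested braces.
SECTION = re.compile(r"\\section\{([^{}]*)\}")


def to_plain(title: str) -> str:
    """Replace TeX quote ligatures and straight apostrophes with Unicode."""
    for tex, uni in [("``", "\u201c"), ("''", "\u201d"),
                     ("`", "\u2018"), ("'", "\u2019")]:
        title = title.replace(tex, uni)
    return title


def wrap_section_titles(latex: str) -> str:
    """Wrap section titles whose bookmark text would differ from the TeX text."""
    def repl(match: re.Match) -> str:
        title = match.group(1)
        plain = to_plain(title)
        if plain == title:
            return match.group(0)  # nothing to fix, leave the line alone
        return "\\section{\\texorpdfstring{" + title + "}{" + plain + "}}"
    return SECTION.sub(repl, latex)
```

Titles that contain no quotes are left untouched, so the step can be run over a whole file.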
I think the original issue has long ago been solved. Here's the result with current pandoc:
% pdfinfo x.pdf
Title: “One” – “Two” — “Three”
Thus, closing...
@jgm Wait, quotes in the headings are not processed correctly. If writing the heading in Markdown:
# "One"
Then converting to PDF via LaTeX, the PDF bookmark is still ``One'' instead of the desired “One”.
@TomBener I'm not seeing this. You may be using an old version of pandoc? (Or older tex packages?)
@jgm You're correct. But I found a weird result. Let me clarify.
The contents of the Markdown file test.md are as follows:
# "One" Heading
Some texts here.
# Pandoc's Features
Then if I run the command:
pandoc --pdf-engine=xelatex test.md -o test.pdf
The generated PDF test.pdf had the correct bookmark.
However, if I split this into two steps, i.e. first generate LaTeX via Pandoc:
pandoc -s test.md -o test.tex
Then compile test.tex to PDF manually:
xelatex test.tex
Then the bookmark in the generated PDF was not the desired one.
The Pandoc version:
$ pandoc --version
pandoc 3.1.6.1
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: /Users/username/.local/share/pandoc
Copyright (C) 2006-2023 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
For some reason, I need to generate LaTeX and then compile it to PDF, so the difference is important for the workflow. Could you help me with the issue? Thanks a lot.
I think this is because in generating PDF via LaTeX, we disable the smart extension in writing the LaTeX. You could try with -t latex-smart.
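For anyone scripting the two-step workflow, here is a small Python sketch that assembles the two commands with smart disabled in the first step. The flags are the ones already used in this thread (-s, -t latex-smart, -o, and a plain xelatex call); the function names and the default file names are only placeholders.

```python
import shlex
import subprocess


def two_step_commands(source: str = "test.md", tex: str = "test.tex") -> list:
    """Build the pandoc and xelatex command lines without running them."""
    return [
        # Step 1: standalone LaTeX with the smart extension disabled,
        # so Unicode quotes are written instead of TeX ligatures.
        ["pandoc", "-s", "-t", "latex-smart", source, "-o", tex],
        # Step 2: compile the generated file with a Unicode-aware engine.
        ["xelatex", tex],
    ]


def run_two_step(source: str = "test.md", tex: str = "test.tex") -> None:
    """Run both steps, stopping at the first failure.

    Requires pandoc and xelatex to be installed and on PATH.
    """
    for cmd in two_step_commands(source, tex):
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Calling run_two_step() should then mirror what the single pandoc --pdf-engine=xelatex command does internally with respect to smart, assuming the explanation above is the whole story.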
Disabling the smart extension could be an option. However, when writing Chinese, a side effect emerged. As the screenshot below shows, the English quotes were also treated as Chinese punctuation, which made them look quite wide.
To generate the PDF above, the command below was executed:
pandoc --pdf-engine=xelatex -V CJKmainfont=NotoSerifCJKsc-Regular test.md -o test.pdf
Even if I loaded \usepackage[punct=plain]{ctex}, the issue remained.
The root of the problem is that Chinese and English share the same quotation mark code points in Unicode. In the Chinese LaTeX forum, it is recommended to write quotes as follows:
``English Quotes''
“中文引号”
Indeed, this is an annoying problem. I don't expect pandoc to change its behavior for this; I just wanted to raise the issue.
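One possible workaround along the lines of that forum recommendation is a preprocessing pass that rewrites only the quoted spans without Chinese characters into TeX ligatures and leaves the rest as Unicode. The Python sketch below is a heuristic under stated assumptions: the function name is made up, only double quotes are handled, and "Chinese" is approximated by the CJK Unified Ideographs block U+4E00 to U+9FFF.

```python
import re

# A span enclosed in Unicode double quotes, with no nested double quotes.
QUOTED = re.compile("\u201c([^\u201c\u201d]*)\u201d")
# Rough test for Chinese text: any CJK Unified Ideograph.
CJK = re.compile("[\u4e00-\u9fff]")


def latin_quotes_to_ligatures(text: str) -> str:
    """Use TeX ligatures for Latin quoted spans, keep Unicode for CJK ones."""
    def repl(match: re.Match) -> str:
        inner = match.group(1)
        if CJK.search(inner):
            return match.group(0)  # Chinese quotation: keep Unicode quotes
        return "``" + inner + "''"  # Latin quotation: TeX ligatures
    return QUOTED.sub(repl, text)
```

Mixed spans (English words inside a Chinese quotation) are treated as Chinese here, which may or may not be what a given document needs.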
Consider the following command:
The result is a simple document with a title formatted with curved quote marks, and an en- and em-dash:
However, the effect in the PDF metadata is less pleasant:
The dashes were translated nicely, but the quotation marks are handled strangely. What are the possibilities for creating a plain string that resembles the printed title as cleanly as possible?