jgm / pandoc

Universal markup converter
https://pandoc.org

Smart - quote types #84

Open jgm opened 13 years ago

jgm commented 13 years ago

Currently only English-style quotes (both "up") are supported. An option like -Sq x (x = numeric ID of the quote type) would be nice, to allow e.g. „abc“ (this is for Czech); or even -SqAB, where A (B) represents the character for the opening (closing) quote.

Google Code Info: Issue #: 287 Author: svat...@mail2web.com Created On: 2011-02-14T10:26:54.000Z Closed On:

jgm commented 13 years ago

Here are links to the proper Unicode code points for Czech (the alternatives to the English double quotes):
opening: http://www.fileformat.info/info/unicode/char/201e/index.htm
closing: http://www.fileformat.info/info/unicode/char/201c/index.htm

Other useful types:
1.) English single
opening: http://www.fileformat.info/info/unicode/char/2018/index.htm
closing: http://www.fileformat.info/info/unicode/char/2019/index.htm
2.) Czech single
opening: http://www.fileformat.info/info/unicode/char/201a/index.htm
closing: http://www.fileformat.info/info/unicode/char/2018/index.htm
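To double-check which characters those links point to, Python's standard unicodedata module can print each code point's official name. The sketch below groups the code points from the links (plus the standard English double pair) and describes them. It also makes visible that the Czech closing quotes (U+201C, U+2018) are the same characters English uses for opening, so Czech output needs its own pair rather than a variant of the English one.

```python
import unicodedata

# Code points from the links above, grouped as (opening, closing) pairs.
QUOTE_PAIRS = {
    "english double": ("\u201c", "\u201d"),
    "czech double": ("\u201e", "\u201c"),
    "english single": ("\u2018", "\u2019"),
    "czech single": ("\u201a", "\u2018"),
}


def describe(pairs):
    """Return one line per style: the character, its code point and its Unicode name."""
    lines = []
    for style, (opening, closing) in pairs.items():
        parts = [
            f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in (opening, closing)
        ]
        lines.append(f"{style}: " + " / ".join(parts))
    return lines
```

Calling describe(QUOTE_PAIRS) and printing each line lists, for example, „ as U+201E DOUBLE LOW-9 QUOTATION MARK.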

Google Code Info: Author: svat...@mail2web.com Created On: 2011-02-14T10:36:56.000Z

jonassmedegaard commented 12 years ago

I would very much like to see this too.

I imagine a minimal implementation could support setting them explicitly as Unicode character pairs, like this:

pandoc -V doublequote=»« -V singlequote=›‹ -o output.html input.txt

A cleverer approach (e.g. as a later addition) would be a language lookup table that alters the defaults per language. That would switch to „this“ when setting -V lang=da (or when autodiscovering the language?), and might also support different quoting styles for pieces of text in different languages within the same document.
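A minimal Python sketch of the lookup-table idea, assuming a plain dictionary keyed by primary language subtag. The table contents and the name quotes_for are illustrative only; pandoc has no such function. A tag such as cs-CZ or da_DK falls back to its primary subtag, and unknown languages fall back to English.

```python
# Hypothetical table: (open double, close double, open single, close single).
QUOTES_BY_LANG = {
    "en": ("\u201c", "\u201d", "\u2018", "\u2019"),  # “ ” ‘ ’
    "cs": ("\u201e", "\u201c", "\u201a", "\u2018"),  # „ “ ‚ ‘
    "da": ("\u201e", "\u201c", "\u201a", "\u2018"),  # „ “ ‚ ‘ (» « › ‹ is also used)
    "de": ("\u201e", "\u201c", "\u201a", "\u2018"),  # „ “ ‚ ‘
}


def quotes_for(lang, default="en"):
    """Return the quote characters for a language tag like 'da' or 'cs-CZ'."""
    tag = (lang or "").replace("_", "-").lower()
    # Try the full tag first, then only the primary language subtag.
    for key in (tag, tag.split("-")[0]):
        if key in QUOTES_BY_LANG:
            return QUOTES_BY_LANG[key]
    return QUOTES_BY_LANG[default]
```

For example, quotes_for("da-DK")[:2] gives the double pair „ and “. Because the full tag is tried first, a region-specific style could be added under its full lowercase tag without touching the primary-subtag entries.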

jonassmedegaard commented 12 years ago

For others reading this issue, here's what I currently use to post-process HTML output to switch to Danish quotation style:

perl -i -pe 's/”(\w([^”]*\w)?)”/„$1“/g;s/’(\w([^’]*\w)?)’/‚$1‘/g' output.html

(To change to other quotation styles, change the characters around the two $1 occurrences.)
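The same post-processing can be sketched in Python for readers without Perl. One assumption differs from the one-liner as posted: the Perl patterns look for ” (and ’) on both sides of the text, while this sketch looks for the usual opening/closing pairs “…” and ‘…’. Like the original, it only rewrites quotes whose contents start and end with a word character, and an apostrophe inside single quotes ends the match early. The function names are hypothetical.

```python
import re
from pathlib import Path

# Contents must start and end with a word character, as in the Perl version.
DOUBLE = re.compile(r"“(\w(?:[^”]*\w)?)”")  # “…”
SINGLE = re.compile(r"‘(\w(?:[^’]*\w)?)’")  # ‘…’ (an inner ’ ends the match)


def danish_quotes(html):
    """Rewrite “…” to „…“ and ‘…’ to ‚…‘."""
    html = DOUBLE.sub(r"„\1“", html)
    return SINGLE.sub(r"‚\1‘", html)


def danish_quotes_in_place(path):
    """Rewrite a UTF-8 file in place, similar to perl -i."""
    p = Path(path)
    p.write_text(danish_quotes(p.read_text(encoding="utf-8")), encoding="utf-8")
```

Calling danish_quotes_in_place("output.html") plays the role of the perl -i command above.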

jgm commented 12 years ago

To clarify: are you requesting support for configurable smart quotes on output, or also in input?

If you want to write Danish style quotes (or whatever) in input, they should pass through unchanged to the output.

So I gather you want to write "hello" in your markdown file and get „hello“ in the output. Correct?

jonassmedegaard commented 12 years ago

Yes, correct.

(I did wonder why your source code had notions of quoting style in input files at all; now I understand, to some degree of "understand".) :-)

jgm commented 12 years ago

Jonas: In HTML 5 and LaTeX/PDF output, it is already possible to get national quote styles.

In HTML 5, you just need to add some CSS (which you can include using --css): something like this, but for your language:

 q { quotes: "“" "”" "‘" "’"; }

In LaTeX, add \usepackage[danish=quotes]{csquotes} to your template.

jonassmedegaard commented 12 years ago

I knew about LaTeX but not HTML5. Thanks for the hint!

Still, as I suspect is implied between the lines above, that is of little help for my current project for primary schools that use IE7.

I could possibly use Modernizr.js and/or IE7.js, but in my experience those often collide with other JavaScript that manipulates the DOM, e.g. Slidy and Slideous.

Also, for reference, the quotes above are not Danish quotes. These are correct:

q { quotes: "„" "“" "‚" "‘"; }

...and (as an active translator made me aware when I tried to "correct" him) these are equally correct (even if not my preference, as you might guess from that incident):

q { quotes: "»" "«" "›" "‹"; }

More info here: http://en.wikipedia.org/wiki/Non-English_usage_of_quotation_marks

Happy to notice that Slideous has been merged into Pandoc now!

jgm commented 12 years ago

Yeah, I know. I just gave the English ones and added "but for your language," because it's tough for me to type those.

By the way, LaTeX csquotes also has a danish=guillemets option.

jonassmedegaard commented 12 years ago

Ahh, how lovely: you beat me at nitpicking: I missed that tiny "but"! :-D

WebDucer commented 12 years ago

Language-dependent smart quotes would be very nice for me too (HTML and EPUB writers). I use Markdown as the source, with " quotes, for German, French and Russian texts.

larmarange commented 11 years ago

The best would be for pandoc to adapt according to the lang variable.

larmarange commented 10 years ago

Also, it would be great if pandoc could handle some typographic corrections. For example, in French you should have a non-breaking space (&nbsp;) before signs like !, ?, ; or :.
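A rough Python sketch of that French rule, assuming plain-text input (not HTML containing &nbsp; entities). It puts a single no-break space (U+00A0) between a letter, », ) or ] and a following ! ? ; or :, replacing any ordinary or no-break spaces already there, and skips URLs like http:// and times like 10:30. The name french_spacing is hypothetical, and the choice of U+00A0 everywhere is a simplification.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE, i.e. &nbsp;

# Conventions vary: many French style guides prefer a narrow no-break space
# (U+202F) before ! ? ; and a full one before :. This sketch uses U+00A0 for all.
_HIGH_PUNCT = re.compile(
    r"(?<=[^\W\d_]|[»)\]])"  # preceded by a letter, », ) or ]
    r"[ \u00a0\u202f]*"      # any spaces already present
    r"([!?;:])"              # the punctuation sign
    r"(?!//)"                # but not the colon of a URL scheme
)


def french_spacing(text):
    """Ensure exactly one no-break space before ! ? ; : after a word."""
    return _HIGH_PUNCT.sub(lambda m: NBSP + m.group(1), text)
```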

reagle commented 10 years ago

I currently run regexes on the resulting HTML to switch between American and British quotes, but a format-independent way would be handy!
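Switching between American (“outer ‘inner’”) and British (‘outer “inner”’) style amounts to swapping the two quote levels. A minimal Python sketch of that swap, with a hypothetical function name: the one subtle point is that ’ also serves as the apostrophe, so this heuristic leaves a ’ alone when it has a letter or digit on both sides. Plural possessives such as dogs’ would still be misread as closing quotes.

```python
import re

# Each curly quote maps to its counterpart on the other quotation level.
_SWAP = {"\u201c": "\u2018", "\u201d": "\u2019", "\u2018": "\u201c", "\u2019": "\u201d"}
_CURLY = re.compile("[\u201c\u201d\u2018\u2019]")


def swap_quote_levels(text):
    """Swap double and single curly quotes, leaving apostrophes (don’t) alone."""

    def replace(match):
        ch, i = match.group(0), match.start()
        if ch == "\u2019":
            before = text[i - 1] if i > 0 else ""
            after = text[i + 1] if i + 1 < len(text) else ""
            if before.isalnum() and after.isalnum():
                return ch  # treated as an apostrophe
        return _SWAP[ch]

    return _CURLY.sub(replace, text)
```

Applying the function twice returns the original text for inputs without ambiguous apostrophes, so the same code converts in either direction.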

paulmenzel commented 8 years ago

Hi. This seems to be a little related to issue #327 too.

I’d love to see such switches too, where one could simply pass the desired quote characters that the ASCII quotes (") around a sequence of words should be converted to.

$ pandoc --version
pandoc 1.12.4.2
Compiled with texmath 0.6.6.1, highlighting-kate 0.5.8.5.
Syntax highlighting is supported for the following languages:
    actionscript, ada, apache, asn1, asp, awk, bash, bibtex, boo, c, changelog,
    clojure, cmake, coffee, coldfusion, commonlisp, cpp, cs, css, curry, d,
    diff, djangotemplate, doxygen, doxygenlua, dtd, eiffel, email, erlang,
    fortran, fsharp, gcc, gnuassembler, go, haskell, haxe, html, ini, isocpp,
    java, javadoc, javascript, json, jsp, julia, latex, lex, literatecurry,
    literatehaskell, lua, makefile, mandoc, markdown, matlab, maxima, metafont,
    mips, modelines, modula2, modula3, monobasic, nasm, noweb, objectivec,
    objectivecpp, ocaml, octave, pascal, perl, php, pike, postscript, prolog,
    pure, python, r, relaxngcompact, restructuredtext, rhtml, roff, ruby, rust,
    scala, scheme, sci, sed, sgml, sql, sqlmysql, sqlpostgresql, tcl, texinfo,
    verilog, vhdl, xml, xorg, xslt, xul, yacc, yaml
Default user data directory: /home/paul/.pandoc
Copyright (C) 2006-2014 John MacFarlane
Web:  http://johnmacfarlane.net/pandoc
This is free software; see the source for copying conditions.  There is no
warranty, not even for merchantability or fitness for a particular purpose.
$ more test.textile 
"test"
$ pandoc -o test.markdown test.textile
$ more test.markdown 
“test”

Konfekt commented 6 years ago

In the other direction, for transforming internationalized text to ASCII: https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/tokenizer/normalize-punctuation.perl is a script that normalizes quotes and other punctuation, for example guillemets to double quotation marks. Usage: normalize-punctuation.perl < file.md. Then, for example in LaTeX with csquotes, the compiled PDF restores all the internationalization.
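For readers who only need the quote characters handled, a much smaller stdlib-only sketch of that normalization direction follows. It is not the Moses script, which also normalizes other punctuation and spacing; the mapping below is an assumption covering just the quote characters discussed in this thread, and any spaces inside French guillemets are left as they are.

```python
# Map typographic quotes and guillemets back to ASCII " and '.
_TO_ASCII = str.maketrans({
    "\u201c": '"', "\u201d": '"', "\u201e": '"',  # “ ” „
    "\u00ab": '"', "\u00bb": '"',                 # « »
    "\u2018": "'", "\u2019": "'", "\u201a": "'",  # ‘ ’ ‚
    "\u2039": "'", "\u203a": "'",                 # ‹ ›
})


def ascii_quotes(text):
    """Replace curly quotes and guillemets with straight ASCII quotes."""
    return text.translate(_TO_ASCII)
```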

odkr commented 6 years ago

I wrote a simple filter that replaces ASCII quotes with typographic ones and that respects the lang metadata field.

It should be fairly easy to customise. However, it’s only intended for output formats that treat quotes as part of a document’s semantics (e.g., OpenOffice, Word), not output formats that treat quotes as part of a document’s syntax (e.g., HTML, LaTeX).

You can install it by: pip3 install pandoc_quotes

See https://github.com/odkr/pandoc-quotes for details.
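The general approach of such a filter can be sketched with the Python standard library alone: read pandoc's JSON AST, look up quote characters for the lang metadata field, and replace each Quoted element with literal Str nodes. This is a minimal illustration, not the pandoc_quotes code; the table, the function names, and the handling of only MetaString and MetaInlines values for lang are assumptions.

```python
import json
import sys

# Hypothetical table: primary subtag -> (open double, close double, open single, close single).
QUOTES = {
    "en": ("\u201c", "\u201d", "\u2018", "\u2019"),
    "cs": ("\u201e", "\u201c", "\u201a", "\u2018"),
    "da": ("\u201e", "\u201c", "\u201a", "\u2018"),
}


def meta_lang(meta):
    """Read the lang field if it is a MetaString or MetaInlines of Str nodes."""
    value = meta.get("lang")
    if not value:
        return ""
    if value["t"] == "MetaString":
        return value["c"]
    if value["t"] == "MetaInlines":
        return "".join(node["c"] for node in value["c"] if node["t"] == "Str")
    return ""


def walk(node, quotes):
    """Recursively replace Quoted inlines with Str nodes holding literal quotes."""
    if isinstance(node, list):
        out = []
        for item in node:
            if isinstance(item, dict) and item.get("t") == "Quoted":
                quote_type, inlines = item["c"]
                single = quote_type["t"] == "SingleQuote"
                opening, closing = quotes[2:] if single else quotes[:2]
                out.append({"t": "Str", "c": opening})
                out.extend(walk(inlines, quotes))  # handles nested quotes too
                out.append({"t": "Str", "c": closing})
            else:
                out.append(walk(item, quotes))
        return out
    if isinstance(node, dict):
        return {key: walk(value, quotes) for key, value in node.items()}
    return node


def apply_quotes(doc):
    """Pick quotes from the document's lang (default English) and rewrite all blocks."""
    lang = meta_lang(doc.get("meta", {})).lower().split("-")[0]
    quotes = QUOTES.get(lang, QUOTES["en"])
    doc["blocks"] = walk(doc["blocks"], quotes)
    return doc


def main():
    # pandoc passes the target format as sys.argv[1]; this sketch ignores it.
    json.dump(apply_quotes(json.load(sys.stdin)), sys.stdout)
```

To try it as a JSON filter, add a final line calling main() and pass the script to pandoc with --filter; pandoc sends the AST on stdin and reads the modified AST from stdout. As noted above, literal quote characters suit formats like Word or OpenOffice better than HTML or LaTeX.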

bubifengyun commented 6 years ago

How do I turn off quote translation? E.g., I wrote Chinese quotes “”, and I want them to stay “” in the final PDF file. But pandoc tries to be clever and translates “” to `` '', while in some blocks it does not translate them, as described in https://stackoverflow.com/questions/52052231/how-to-write-chinese-quotes-in-bookdown. This leads to chaos. Thank you.

chpio commented 6 years ago

How to turn off the quotes translated?

By not enabling smart.

bubifengyun commented 6 years ago

@chpio I found that bookdown runs

/usr/bin/pandoc +RTS -K512m -RTS deepin-bible.utf8.md --to latex --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output deepin-bible.tex --table-of-contents --toc-depth 2 --template latex/template.tex --number-sections --highlight-style tango --pdf-engine xelatex --biblatex --listings --top-level-division=chapter --variable tables=yes --standalone

which does not contain smart.

Do you mean I should add -smart, like this?

markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash-smart

Thank you.

bubifengyun commented 6 years ago

As described in https://stackoverflow.com/questions/52052231/how-to-write-chinese-quotes-in-bookdown

I tested this in the bookdown template and found that Chinese quotes “” are translated to `` and ''. But if you write “” inside a block or other begin/end blocks, they are not translated to `` and ''. So you get inconsistent Chinese quotes in the final PDF file. Is there a setting somewhere to turn off this translation? Thank you.

I added -smart, and it still does the same thing described above.

bubifengyun commented 6 years ago

/usr/bin/pandoc +RTS -K512m -RTS deepin-bible.utf8.md --to latex --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash-smart --output deepin-bible.tex --table-of-contents --toc-depth 2 --template latex/template.tex --number-sections --highlight-style tango --pdf-engine xelatex --biblatex --listings --top-level-division=chapter --variable tables=yes --standalone

jhutar commented 1 year ago

Hello! This might help. Assume you have this markdown doc:

---
lang: cs-CZ
csquotes: true
---

"Quotation test"

Using this command:

pandoc --pdf-engine=xelatex -o example.pdf example.md

You will get PDF with this quotation:

„Quotation test“

Konfekt commented 1 year ago

Thanks for the heads-up that https://github.com/jgm/pandoc/commit/8031ac137f9f84bf6c12d66592b07a3244b049a9 was included three years ago; I vaguely remember that one had to hide the activation of csquotes from pandoc (with --smart?!) for this package to work correctly.