Open renard opened 8 years ago
You could try using the --html-q-tags option. Then use CSS to style the q tags appropriately.
If that doesn't work, then your options are:
+++ Sébastien Gross [Jan 05 16 10:54 ]:
I am currently using pandoc and its smart punctuation to generate EPUB output, and this works pretty well.
In my source I have: Du "texte" en français!
The output is: Du “texte” en français!
But since the text is in French, I would like to use the French typography rules to get something like: Du « texte » en français !
(please note the nonbreaking spaces).
So is there an (easy) way to define typography rules for an output, or should this be an enhancement?
Thanks a lot.
I am experiencing a similar issue. At first, I expected --smart to handle punctuation typography as well, but it does not seem to do so.
The first problem with --smart when writing text in French (and maybe some other languages) is that French does not use curly quotes but French quotes « ». In some keyboard layouts (I am thinking of fr oss), they are easy to reach, but that is not the case on every keyboard layout (especially on Windows), and being able to automatically replace " " with « » could be very helpful. This could obviously be done with a post-processing script (or a Pandoc filter), but what about including a --french-quotes option in Pandoc to do it?
The second problem is that typography, and especially the position (and nature) of whitespace, differs a lot from one language to another. In particular, in French (unlike English), there should be a non-breaking space before any double punctuation sign (!, ?, :, ;). Similar rules exist for the spaces enclosing quotes (it should be SPACE « NON_BREAKING_SPACE TEXT NON_BREAKING_SPACE » SPACE, if I remember correctly) and so on.
Non-breaking spaces are also almost impossible to type easily (without special tweaks to the keyboard layout). I think it would be awesome if Pandoc could handle this.
What do you think?
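The two French rules described above (guillemets padded with non-breaking spaces, and a non-breaking space before double punctuation) can be sketched as a small post-processing function. This is only an illustration in Python, not anything Pandoc provides: the name `french_spacing` is hypothetical, and the sketch assumes plain prose input, so it would also rewrite straight quotes and colons inside code or URLs.

```python
import re

NBSP = "\u00a0"  # non-breaking space

def french_spacing(text):
    """Apply two French typography rules to plain prose (hypothetical helper)."""
    # Rule 1: "text" becomes « text », with non-breaking spaces inside the guillemets.
    text = re.sub(
        r'"([^"]*)"',
        lambda m: f"«{NBSP}{m.group(1).strip()}{NBSP}»",
        text,
    )
    # Rule 2: exactly one non-breaking space before ! ? : ;
    # (any ordinary or non-breaking spaces already there are replaced).
    text = re.sub(r"(?<=\S)[ \u00a0]*([!?:;])", NBSP + r"\1", text)
    return text

if __name__ == "__main__":
    print(french_spacing('Du "texte" en français!'))
```

Because the function is blind to context, it is best applied only to text nodes, not to a whole document that contains code or markup.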
+++ Lucas Verney [Apr 14 16 15:10 ]:
I am experiencing a similar issue. At first, I expected --smart to handle punctuation typography as well, but it does not seem to do so.
The first problem with --smart when writing text in French (and maybe some other languages) is that French does not use curly quotes but French quotes « ». In some keyboard layouts (I am thinking of fr oss), they are easy to reach, but that is not the case on every keyboard layout (especially on Windows), and being able to automatically replace " " with « » could be very helpful. This could obviously be done with a post-processing script (or a Pandoc filter), but what about including a --french-quotes option in Pandoc to do it?
See #84. I'd actually never thought that a French writer would want to type " for quotes, and have them render with French quotes. But if that is the case, it wouldn't be all that hard to provide some kind of configurable option.
Another option would be localization, so that the quote style is affected by the lang metadata field. Though I gather many languages don't have one standard quoting style.
A third option would be localization plus an override.
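To make the "localization plus an override" idea concrete, here is a minimal Python sketch. It is not Pandoc's implementation: `QUOTE_STYLES`, `quote_marks`, and `quote` are hypothetical names, and the table lists only one common style per language, which (as noted above) is a simplification for languages without a single standard.

```python
# Hypothetical language map: primary language subtag -> (opening, closing) marks.
QUOTE_STYLES = {
    "en": ("\u201c", "\u201d"),          # “ ”
    "fr": ("\u00ab\u00a0", "\u00a0\u00bb"),  # « », padded with non-breaking spaces
    "de": ("\u201e", "\u201c"),          # „ “
    "sv": ("\u201d", "\u201d"),          # ” ”
    "da": ("\u00bb", "\u00ab"),          # » «
}

def quote_marks(lang, override=None):
    """Pick quote marks from the lang value, unless an explicit override is given."""
    if override is not None:
        return override
    primary = (lang or "").split("-")[0].lower()  # "de-AT" -> "de"
    return QUOTE_STYLES.get(primary, QUOTE_STYLES["en"])

def quote(text, lang, override=None):
    """Wrap text in the marks selected for lang (or in the override)."""
    open_mark, close_mark = quote_marks(lang, override)
    return f"{open_mark}{text}{close_mark}"

if __name__ == "__main__":
    print(quote("Test", "sv"))
    print(quote("Test", "fr", override=("\u201c", "\u201d")))
```

Falling back to the English style for unknown languages is an arbitrary choice in this sketch; an implementation could equally leave the quotes untouched.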
Concerning the quotes, I may have an unusual approach, but " does seem to me to be the most widely available quote character, and the easiest to type. So having it automatically replaced by « / » would be awesome, in my opinion. Still, there should be a way to prevent the automatic conversion (like escaping), so that " can still be typed in a French text (but the same problem exists for English typography).
:+1: for the localization-based approach, using the lang metadata field. Or an override option. The advantage of the localization-based method is that it also allows tweaking non-breaking spaces depending on the language.
Having a --french-quote option is not a good idea, since this is a very specific task. Having a --lang option is a better idea if you can extend a language map. LaTeX uses babel for that task.
Related issue #661
I'm stoked AF at the idea of being able to set it up so I can get »Danish style quotes« from --smart with a babel-like solution♥
Hi!
I've been reading this issue and #84 as well as the documentation but I haven't really understood how this should work, and if it's implemented for my use case.
I write text in Markdown that I convert to ICML for use in InDesign documents. When I write Swedish text I want the opening and closing quotes to be identical: ””.
Here is my input and outputs:
sh-4.4$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
sh-4.4$ pandoc -v
pandoc 2.5
Compiled with pandoc-types 1.17.5.4, texmath 0.11.1.2, skylighting 0.7.5
Default user data directory: /home/tetov/.pandoc
Copyright (C) 2006-2018 John MacFarlane
Web: http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.
sh-4.4$ cat test.md
---
lang: sv
---
"Test" ... --
sh-4.4$ pandoc -s -w icml -o test.icml test.md
sh-4.4$ cat test.icml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?aid style="50" type="snippet" readerVersion="6.0" featureSet="513" product="8.0(370)" ?>
<?aid SnippetType="InCopyInterchange"?>
<Document DOMVersion="8.0" Self="pandoc_doc">
<RootCharacterStyleGroup Self="pandoc_character_styles">
<CharacterStyle Self="$ID/NormalCharacterStyle" Name="Default" />
</RootCharacterStyleGroup>
<RootParagraphStyleGroup Self="pandoc_paragraph_styles">
<ParagraphStyle Self="$ID/NormalParagraphStyle" Name="$ID/NormalParagraphStyle"
SpaceBefore="6" SpaceAfter="6"> <!-- paragraph spacing -->
<Properties>
<TabList type="list">
<ListItem type="record">
<Alignment type="enumeration">LeftAlign</Alignment>
<AlignmentCharacter type="string">.</AlignmentCharacter>
<Leader type="string"></Leader>
<Position type="unit">10</Position> <!-- first tab stop -->
</ListItem>
</TabList>
</Properties>
</ParagraphStyle>
<ParagraphStyle Self="ParagraphStyle/Paragraph" Name="Paragraph" LeftIndent="0">
<Properties>
<BasedOn type="object">$ID/NormalParagraphStyle</BasedOn>
</Properties>
</ParagraphStyle>
</RootParagraphStyleGroup>
<RootTableStyleGroup Self="pandoc_table_styles">
<TableStyle Self="TableStyle/Table" Name="Table" />
</RootTableStyleGroup>
<RootCellStyleGroup Self="pandoc_cell_styles">
<CellStyle Self="CellStyle/Cell" AppliedParagraphStyle="ParagraphStyle/$ID/[No paragraph style]" Name="Cell" />
</RootCellStyleGroup>
<Story Self="pandoc_story"
TrackChanges="false"
StoryTitle=""
AppliedTOCStyle="n"
AppliedNamedGrid="n" >
<StoryPreference OpticalMarginAlignment="true" OpticalMarginSize="12" />
<!-- body needs to be non-indented, otherwise code blocks are indented too far -->
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph">
<CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
<Content>“Test”</Content>
</CharacterStyleRange>
<CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
<Content> … –</Content>
</CharacterStyleRange>
</ParagraphStyleRange>
</Story>
</Document>
What I want/expect is:
[...]
<Content>”Test”</Content>
[...]
I can achieve this by adding -f markdown-smart as an argument, but I'd rather keep the other fixes that smart makes.
Is this a planned feature (to have specific quotes for different languages in ICML output), or is the solution to use -smart?
@tetov - at this point, the solution is to use -smart.
Maybe some day we'll implement configurable smart quotes, but it's not a priority now.
In that case, I have a workaround for you, Anton: pipe the text through sed 's/"/”/g' before putting it into pandoc. You're lucky that your desired quotes aren't symmetrical so you don't have to use anything "smart" in order to get them.
Be aware that Swedish has some other typesetting quirks like using spaced endashes – like this – rather than English-style non-spaced emdashes—like this—and there are some other weird things.
So perhaps it's best to either make sure your source document already has the typography you want (I sometimes use emacs smart-quotes-mode for this) or you run it through a quick little sed, perl, or tr filter before pandoc. Does that work?
@jgm Thanks, I understand!
@snan I thought about processing the text but didn't really know where to put that processing, and the examples I found (which were for symmetrical quotes) looked daunting. Thanks! I'll add it before pandoc in my makefile.
I wasn't aware that those differences existed! Thanks a lot for pointing them out. I have some reading to do :).
This will work fine unless you have straight quotes in non-textual contexts: code, HTML attributes, titles in markdown links.
In that case, you could achieve the same thing by using a simple lua filter, in conjunction with -smart.
I'll need to spend some more time learning Lua and Lua filters in order to get that to work. I've forked the lua-filters repo and started to cobble something together from the existing samples.
In the meantime I made a hacky solution in my Makefile.
Thanks for your help, @jgm and @snan!
Edit: While working on adding single quotation marks as well as dashes I realized that I could run the sed commands on the output-file, like this:
sed -i -e 's/‘/’/g' -e 's/“/”/g' output.icml
This gives me all of the benefits of smart while still keeping symmetrical quotation marks. Pandoc respects spaces around en-dashes, so that is not a problem either.
@jgm:
This will work fine unless you have straight quotes in non-textual contexts: code, HTML attributes, titles in markdown links. In that case, you could achieve the same thing by using a simple lua filter, in conjunction with -smart.
Which runs first: the smart function or the Lua filter? I was thinking about putting the regexp from my edit above into a Lua filter to make it work with any output format.
Smartification takes place at the parsing stage, so in the filter you'll have Quoted objects you can replace.
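The thread talks about a Lua filter for this step; the same idea can be sketched as a pandoc JSON filter, shown here in standard-library Python purely as an illustration. It assumes the usual pandoc JSON layout, where a quotation appears as `{"t": "Quoted", "c": [quote_type, inlines]}`. The names `replace_quotes` and `SWEDISH` and the file name `quotes.py` are hypothetical.

```python
import json
import sys

# Hypothetical style table: the same mark on both sides, as wanted for Swedish.
SWEDISH = {
    "DoubleQuote": ("\u201d", "\u201d"),  # ” ”
    "SingleQuote": ("\u2019", "\u2019"),  # ’ ’
}

def replace_quotes(node, marks):
    """Return a copy of the JSON AST with each Quoted element replaced by
    an opening Str, its (recursively processed) contents, and a closing Str."""
    if isinstance(node, list):
        out = []
        for item in node:
            if isinstance(item, dict) and item.get("t") == "Quoted":
                quote_type, inlines = item["c"]
                open_mark, close_mark = marks[quote_type["t"]]
                out.append({"t": "Str", "c": open_mark})
                out.extend(replace_quotes(inlines, marks))
                out.append({"t": "Str", "c": close_mark})
            else:
                out.append(replace_quotes(item, marks))
        return out
    if isinstance(node, dict):
        return {key: replace_quotes(value, marks) for key, value in node.items()}
    return node  # strings (including code text) and numbers are left untouched

if __name__ == "__main__":
    # A JSON filter reads the document from stdin and writes it to stdout.
    json.dump(replace_quotes(json.load(sys.stdin), SWEDISH), sys.stdout)
```

It could be run with something like `pandoc --filter ./quotes.py -s -w icml -o test.icml test.md`. Since only Quoted elements are rewritten, straight quotes inside code or attributes stay as they are, which is the case the sed approach cannot handle.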
I do similar preprocessing (with sed and similar tools) to change • bullets into hyphen bullets. Man, I wish ✪ would add that to markdown, that's the one thing I really miss from how I write plain text files.
Just wanted to chime in to say that localized smart quotes would be a fantastic feature to have.
As already said elsewhere, @Phyks's suggestion to have a --french-quotes flag doesn't make much sense. Why pick just French when so many languages have their own quoting rules?
In my case, German uses „ as opening and “ as closing quotes. Being able to automate the conversion from straight to curly would be a tremendous boon and would help me enormously in the editorial work I do (mainly converting Markdown to HTML).
converting Markdown to HTML
then see https://github.com/jgm/pandoc/issues/2620#issuecomment-169099590
@mb21 Using the --html-q-tags flag would result in a <q> tag being used for everything between quotation marks. That would be wrong in most cases, since that tag is meant to mark up inline quotations, which are only a small subset of my actual use cases. Besides being semantically incorrect, I just need clean HTML without any CSS.
Using proper German quotes in the input is what I already do, before converting the Markdown with the --ascii flag to replace them with the corresponding HTML entities. I manually substitute every single straight quote in the drafts I receive from all over the place. It takes time, and that’s the process I’d like to automate.
As for using sed or perl to post-process the output, I haven’t explored the possibility, but that would probably be the way to go until this functionality hopefully gets baked into Pandoc.
@odkr wrote a great Lua filter to handle this problem: https://github.com/odkr/pandoc-quotes.lua. It is now also available as part of the pandoc lua-filters collection: https://github.com/pandoc/lua-filters/tree/master/pandoc-quotes.lua
@odkr @tarleb That looks great. Thanks for bringing it to my attention.
Hello! This might help. Assume you have this markdown doc:
---
lang: fr
csquotes: true
---
"Quotation test"
Using this command:
pandoc --pdf-engine=xelatex -o example.pdf example.md
You will get a PDF with this quotation:
« Quotation test »
You could try using the --html-q-tags option. Then use CSS to style the q tags appropriately.
This is a good approach for HTML, but the wrong one for EPUB2, for example.
In EPUB2?/FB2 the <q> tag doesn't work properly, so the Android FBReader (for example) can't display the quoting properly.
So it would be nice to have a way to convert "text" quoting to «text» without any tags.
PS: French is not the only language that uses this quoting style.
How about a Lua filter that replaces quote and double quote entities with plain elements with the quotes stuffed on the beginning and end? If you're trying to output to a different format and are just worried about the output, that should be pretty straightforward. If you want them in the source and need to round trip, that might be a little more involved.
@unera and @alerque, I wrote that Lua filter a long time ago. (So long ago that I should have a look at it again, but it should work.)