Open GoogleCodeExporter opened 9 years ago
Thanks for the bug report. The solution isn't obvious, since pandoc has to
support multiple input and output formats. One thought is to parse '\ ' in
markdown as a Unicode nonbreaking space character, and change the LaTeX writer
so it prints this character as '\ '. This way "Mr.\ Smith" would come out in
LaTeX as "Mr.\ Smith" and in HTML as "Mr. Smith", which would make sense. But
I will think about it some more. Suggestions welcome.
Original comment by fiddloso...@gmail.com
on 9 Jul 2008 at 4:25
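A minimal sketch of the round trip proposed in this comment, written in Python for illustration rather than in pandoc's own code. The function names are hypothetical, the HTML entity is an assumption about how an HTML writer might print the character, and all other escaping that a real reader or writer performs is ignored.

```python
NBSP = "\u00a0"  # Unicode nonbreaking space


def read_markdown(text: str) -> str:
    """Turn the markdown escape backslash-space into a nonbreaking space."""
    return text.replace("\\ ", NBSP)


def write_latex(text: str) -> str:
    """Print each nonbreaking space as LaTeX's backslash-space."""
    return text.replace(NBSP, "\\ ")


def write_html(text: str) -> str:
    """Print each nonbreaking space as an HTML entity (an assumption)."""
    return text.replace(NBSP, "&nbsp;")


internal = read_markdown("Mr.\\ Smith")
print(write_latex(internal))  # Mr.\ Smith
print(write_html(internal))   # Mr.&nbsp;Smith
```

The point of the intermediate character is that the reader does not need to know which writer will run later: each output format decides for itself how to print a nonbreaking space.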
My only objection to that method would be that it makes the markdown source
less humane. You shouldn't have to use an escape to be able to write
"Mr. Smith" normally. The LaTeX writer, ideally, should have a way of
determining the correct spacing (from a dictionary of standard abbreviations,
with a local dictionary available per installation and per user, perhaps) and
inserting whatever LaTeX will need to behave correctly; the Markdown probably
shouldn't be tainted by the LaTeX idiosyncrasy.
It would still be useful to be able to specify inter-sentence or non-breaking
spaces from the Markdown, though, for corner cases.
Original comment by deeay...@gmail.com
on 9 Jul 2008 at 6:10
I agree, it would be in the spirit of markdown to recognize these cases
automatically as far as possible. One difficulty, though, is that abbreviations
are language-specific. "Mr." works for English, but not for Spanish or German.
Though I suppose pandoc's smart-typography feature is already English-centric.
Original comment by fiddloso...@gmail.com
on 9 Jul 2008 at 6:23
Further note: It probably makes sense to add this to the smart typography
parser (which is enabled automatically when output is LaTeX). The parser would
look for "Mr. " "Mrs. " "e.g. " and so on, and convert the final space into a
Unicode nonbreaking space. The LaTeX writer could then escape this as "\ ".
But this still wouldn't provide a way to manually handle cases that aren't
handled automatically. Not sure how to do that.
Original comment by fiddloso...@gmail.com
on 11 Jul 2008 at 12:25
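The automatic recognition described in this comment could be sketched as follows. This is an illustration in Python under stated assumptions, not pandoc's implementation: the abbreviation list is a hypothetical sample, and `protect_abbreviations` is an invented name.

```python
import re

NBSP = "\u00a0"  # Unicode nonbreaking space

# Hypothetical sample list; the thread does not give pandoc's actual list.
ABBREVIATIONS = ["Mr.", "Mrs.", "e.g.", "vol."]


def protect_abbreviations(text, abbreviations=ABBREVIATIONS):
    """Replace the space after each known abbreviation with a nonbreaking space."""
    # Try longer abbreviations first so a shorter one cannot shadow them.
    ordered = sorted(abbreviations, key=len, reverse=True)
    alternatives = "|".join(re.escape(a) for a in ordered)
    # (?<![\w.]) stops a match that starts in the middle of a longer token.
    pattern = re.compile(r"(?<![\w.])(" + alternatives + r") ")
    return pattern.sub(lambda m: m.group(1) + NBSP, text)


converted = protect_abbreviations("Mr. Brown cites vol. 1.")
# A LaTeX writer could then print each nonbreaking space as backslash-space.
print(converted.replace(NBSP, "\\ "))  # Mr.\ Brown cites vol.\ 1.
```

A fixed list like this only covers the abbreviations it names, so it does not replace a manual way to force a nonbreaking space.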
Suggestion: '\ ' in markdown produces a nonbreaking space, always.
That solves the second problem in comment 4, above.
Original comment by fiddloso...@gmail.com
on 11 Jul 2008 at 12:33
Fixed, I think, in r1298. It may still need fine-tuning, but try it.
Many abbreviations are recognized automatically. "Mr. Brown", "vol. 1", etc.
But you can also force a nonbreaking space using '\ ' in markdown.
Original comment by fiddloso...@gmail.com
on 11 Jul 2008 at 2:16
Thanks, that's great. Would it be difficult for users to add to that list of
abbreviations locally? We use it at my office for a lot of legal writing and
there are a whole slew of citations/abbreviations that we need to escape.
By the way, currently we're exporting to odt to work around this. That exporter
is great.
Original comment by ianjsull...@gmail.com
on 11 Jul 2008 at 9:41
That's an interesting idea. I may implement that later. For now
I'll leave this issue open so I don't forget about it.
Original comment by fiddloso...@gmail.com
on 12 Jul 2008 at 12:38
Another idea might be to change the default behavior of the markdown2pdf
script to use the LaTeX frenchspacing option, which simply gets rid of the
special inter-sentence spacing. Anyone who wants the special LaTeX spacing, or
other advanced features, can always convert directly to LaTeX output and use
the standard tools there, but people who just want a simple PDF won't have to
pay any attention to the abbreviation issue. Also, it is a fix that would work
for all languages, most of which never use special spacing between sentences.
Original comment by ianjsull...@gmail.com
on 23 Jul 2008 at 4:00
Yes, that's a thought. Note that you can just put '\frenchspacing' in the
pandoc file; markdown will parse it as raw LaTeX, and it won't appear in
non-tex output formats, like HTML. Or, even better, put it in a custom header.
I'm reluctant to make it the default, though. And even with frenchspacing,
there's a reason to treat abbreviations specially: you don't want a line break
in the space between "Mr." and "Brown".
Original comment by fiddloso...@gmail.com
on 24 Jul 2008 at 3:34
ianjsullivan - I'm still thinking about the possibility of reading an 'abbrev'
file.
But if there are abbreviations you use a lot, why don't you just send me a
list (or post it here)? I'd like to make pandoc's default abbreviation list
more complete.
However, I'm reluctant to add abbreviations that might naturally occur at the
end of a sentence: e.g., "I met with my Prof."
Original comment by fiddloso...@gmail.com
on 12 Sep 2008 at 10:39
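One way the 'abbrev' file idea might look, sketched in Python. Everything here is an assumption for illustration: the file format (one abbreviation per line, '#' for comments), the function names, and the sample entries are hypothetical and do not describe anything pandoc had implemented at the time of this comment.

```python
from pathlib import Path

# Hypothetical built-in defaults; the real list is not given in this thread.
DEFAULT_ABBREVIATIONS = {"Mr.", "Mrs.", "e.g.", "vol."}


def parse_abbreviations(text):
    """Parse one abbreviation per line, skipping blank lines and '#' comments."""
    entries = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.add(line)
    return entries


def load_abbreviations(path, defaults=DEFAULT_ABBREVIATIONS):
    """Return the defaults plus any entries from the file at path, if it exists."""
    file = Path(path)
    if not file.is_file():
        return set(defaults)
    return set(defaults) | parse_abbreviations(file.read_text(encoding="utf-8"))


# Hypothetical local additions for legal writing.
sample = "# local legal abbreviations\nCir.\nF. Supp.\n"
print(sorted(DEFAULT_ABBREVIATIONS | parse_abbreviations(sample)))
```

Merging local entries into the defaults, rather than replacing them, keeps specialised lists out of the shared default while still letting an office add its own.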
fiddlosopher, thanks for the query. Unfortunately, we use a lot of legal
abbreviations here, which are both plentiful and probably not useful to most
people.
Original comment by ianjsull...@gmail.com
on 18 Sep 2008 at 8:16
Original issue reported on code.google.com by
ianjsull...@gmail.com
on 30 Jun 2008 at 6:53