I forgot to include the error message. Here it is:
$ pandoc -o ae.html ae.txt
pandoc: ae.txt: hGetContents: invalid argument (Invalid or incomplete multibyte or wide character)
Original comment by Sebastia...@googlemail.com
on 18 Apr 2010 at 9:57
See the following from the pandoc man page (and README):
Pandoc uses the UTF-8 character encoding for both input and output
(unless compiled with GHC 6.12 or higher, in which case it uses the
local encoding).
I'm assuming your pandoc was compiled with GHC 6.12. We're in a transitional
phase; once GHC 6.12 is well established, we should be able to get rid of the
statement that pandoc uses UTF-8 for input and output.
Of course, an alternative would be to keep this behavior, even when compiled
with GHC 6.12. I'm not sure which is better.
Original comment by fiddloso...@gmail.com
on 19 Apr 2010 at 3:34
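The failure mode discussed above can be illustrated outside pandoc. The following is a minimal Python sketch, not pandoc's or GHC's actual code. It assumes the input bytes are UTF-8 and uses ASCII as a stand-in for a non-UTF-8 locale encoding; the helper name decode_with is hypothetical.

```python
# Illustrative stand-in for locale-dependent decoding (assumption: the
# file is UTF-8 and the "locale" encoding is ASCII).

def decode_with(encoding: str, raw: bytes) -> str:
    """Decode raw bytes with the given encoding, raising on invalid input."""
    return raw.decode(encoding)

# The bytes a UTF-8 editor would write for the single character "æ".
data = "æ".encode("utf-8")

# Decoding with an encoding that does not match the file fails.
try:
    decode_with("ascii", data)
    locale_decode_failed = False
except UnicodeDecodeError:
    locale_decode_failed = True

# Decoding with the encoding the author actually used succeeds.
text = decode_with("utf-8", data)
```

The same bytes are readable or unreadable depending only on which encoding the reader assumes, which is why a build that follows the locale can reject a file that a UTF-8-only build accepts.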
Please keep the behaviour of always using UTF-8. This way, you can read files
from other users, no matter what locale they have.
Please remove the locale dependence as soon as possible, so users don't start
creating markdown documents with non-UTF-8 encodings.
Original comment by Sebastia...@googlemail.com
on 19 Apr 2010 at 9:50
+1 to UTF-8. People are, in general, uninformed about encodings. The only sane
solution is to use a fixed encoding everywhere, and UTF-8 seems to be the de
facto choice. It is used by default by the majority of modern text editors.
There's absolutely no advantage to not using UTF-8.
Original comment by joonas.p...@gmail.com
on 20 Apr 2010 at 6:34
We're using pandoc to generate the documents for an open source software
project. Our documents are UTF-8 encoded, so that's how they should be
interpreted, regardless of the locale setting of the user who is building our
software (they didn't write the document, we did). So, at the least, I would
like to have an option to force the input encoding to UTF-8.
Original comment by noval...@gmail.com
on 20 Apr 2010 at 7:19
Resolved in fb201a5b46bb49aa57a8462d7ded8ea2ff76be81
Pandoc now assumes UTF-8 in input, and produces UTF-8 in output, no matter what
the locale -- just as it did before GHC 6.12 came around.
Original comment by fiddloso...@gmail.com
on 7 May 2010 at 6:07
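The behaviour described in the resolution, a fixed UTF-8 encoding for input and output regardless of locale, can be sketched as follows. This is a Python illustration under stated assumptions, not the code from the commit; read_utf8 and write_utf8 are hypothetical helper names.

```python
import os
import tempfile


def read_utf8(path: str) -> str:
    """Read a file as UTF-8, ignoring the locale's preferred encoding."""
    with open(path, "r", encoding="utf-8", newline="") as handle:
        return handle.read()


def write_utf8(path: str, text: str) -> None:
    """Write a file as UTF-8, ignoring the locale's preferred encoding."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)


if __name__ == "__main__":
    # Round-trip a non-ASCII string through a temporary file.
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "ae.txt")
        write_utf8(path, "æ ø å\n")
        assert read_utf8(path) == "æ ø å\n"
```

Passing the encoding explicitly on every open is the design choice the thread argues for: the result depends on the document, not on the environment of whoever runs the tool.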
Original issue reported on code.google.com by Sebastia...@googlemail.com
on 18 Apr 2010 at 9:55