Open nt0xa opened 8 years ago
It seems that `readFile` uses the CP866 encoding on Windows (with Russian as the default language).
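A quick way to check this on an affected machine is to print the encoding GHC picks up from the system locale, since text-mode handles (and therefore `readFile`) use that encoding unless told otherwise. This is only a diagnostic sketch; the helper name `localeEncodingName` is made up for illustration, and the value it prints depends on the local setup.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)

-- Name of the encoding that newly opened text-mode handles use by default.
-- (Hypothetical helper, not part of pandoc-include.)
localeEncodingName :: IO String
localeEncodingName = show <$> getLocaleEncoding

main :: IO ()
main = do
  name <- localeEncodingName
  putStrLn ("Locale encoding: " ++ name)
```

If the printed name is not UTF-8, any file read through plain `readFile` will be decoded with that other encoding.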
Can confirm with Windows 10 and pandoc 1.16.0.2 (pandoc and pandoc-include built on the machine using stack).
As another data point for hunting down this bug:
Take a file containing the umlauts of a, o, and u (LATIN SMALL LETTER A WITH DIAERESIS etc.)
bug.markdown
ä ö ü
Run it in cmd as
chcp 65001
pandoc bug.markdown
and you get the expected:
ä ö ü
Even if you run it as
pandoc --filter pandoc-include bug.markdown
you get the expected output.
However, if you run it as
pandoc --filter pandoc-include incl.markdown
where incl.markdown only includes bug.markdown, you get messed-up characters.
Hi @russtone, @LarsEKrueger, thank you for your patience and the detailed issue. On a new branch I've made it possible to set the encoding to UTF-8. Will this help in your case?
To use it, simply add `utf8` after the `include` class, like so:
```include utf8
a.md
```
Can't test the branch right now. Might take a few days until I find the time.
However, I don't understand why you want to make a difference in the encoding of the file that does the include and the one that is included.
Pandoc is UTF-8 on input and there's no way around it. One would expect that include files are UTF-8 too, without requesting them to be. If there's a relevant use case that I don't see right now, you definitely need to document that.
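One way to get "always UTF-8" behaviour without touching handle encodings at all is to read the raw bytes and decode them explicitly. The following is a sketch under that assumption, not what pandoc-include does; `readFileUtf8` and `roundTrip` are hypothetical names, and the lenient decoder silently replaces invalid bytes, which may or may not be what you want.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)
import System.Directory (getTemporaryDirectory, removeFile)
import System.FilePath ((</>))

-- Read a file as UTF-8 regardless of the system locale.
-- Invalid byte sequences are replaced with U+FFFD instead of throwing.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path = T.unpack . decodeUtf8With lenientDecode <$> B.readFile path

-- Write the UTF-8 bytes for "ä ö ü\n" and check that they decode as expected.
roundTrip :: IO Bool
roundTrip = do
  tmp <- getTemporaryDirectory
  let path = tmp </> "pandoc-include-utf8-demo.md"  -- hypothetical file name
  B.writeFile path (B.pack [0xC3, 0xA4, 0x20, 0xC3, 0xB6, 0x20, 0xC3, 0xBC, 0x0A])
  s <- readFileUtf8 path
  removeFile path
  return (s == "\228 \246 \252\n")

main :: IO ()
main = roundTrip >>= print
```

Because `B.readFile` is strict, the file is fully read and closed before decoding starts, so this variant also avoids lazy I/O surprises.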
@LarsEKrueger This makes it even easier, thank you!
Tried commit 53b0d1 on Windows 10 (with Creator's Update). Pandoc and pandoc-include compiled using stack and ghc 8.
Issue is still there.
Thank you for checking. Unfortunately I have no other idea what could be wrong.
Oh, I see. That commit is old and does not contain the fixes. Please try the latest on the fixing branch: 913ca87. Thanks!
Still doesn't work.
Could it be that the `hSetEncoding` isn't evaluated due to laziness, and that you didn't notice during testing (e.g. because your default encoding is already UTF-8)?
I use the following code in my filter and it does work on windows.
justReadFile :: String -> IO (Maybe [Block])
justReadFile fn = bracket (openFile fn ReadMode) hClose $ \handle -> do
  hSetEncoding handle utf8
  cont <- hGetContents handle
  case readMarkdown def cont of
    Left _ -> return Nothing
    Right (Pandoc _ blocks) -> return $ Just blocks
If I use your `fmap` pattern, it ceases to work correctly. The code is:
justReadFile :: String -> IO (Maybe [Block])
justReadFile fn = bracket (openFile fn ReadMode) hClose $ \handle -> do
  fmap (`hSetEncoding` utf8) $ return handle
  cont <- hGetContents handle
  case readMarkdown def cont of
    Left _ -> return Nothing
    Right (Pandoc _ blocks) -> return $ Just blocks
Your variable `handle` in `fileContentAsString` is actually of type `IO Handle`, not `Handle`. Thus the `fmap` typechecks correctly, but the `hSetEncoding` is either never run or run after the `hGetContents`. It's the same reason I needed to add the `return`, because `fmap` wants an `IO Handle`.
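The difference between mapping an action over an `IO` value and actually running it can be shown without any file handles. The sketch below uses an `IORef` counter as a stand-in for the handle; `bump` and `demo` are made-up names for illustration only.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- An effectful function, standing in for (`hSetEncoding` utf8).
bump :: IORef Int -> IO ()
bump r = modifyIORef r (+ 1)

-- Returns the counter after the fmap attempt and after the bind.
demo :: IO (Int, Int)
demo = do
  ref <- newIORef 0
  -- fmap only builds a value of type IO (IO ()); the inner action is
  -- returned as a result and discarded, so the counter is untouched.
  _ <- fmap bump (return ref)
  afterFmap <- readIORef ref
  -- (>>=) actually runs the action on the result.
  return ref >>= bump
  afterBind <- readIORef ref
  return (afterFmap, afterBind)

main :: IO ()
main = demo >>= print  -- prints (0,1)
```

With a value of type `IO Handle`, the counterpart of the second form is `handle >>= \h -> hSetEncoding h utf8`, or binding the handle with `<-` first and then calling `hSetEncoding` on it directly.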
Thank you, you are awesome. I've updated the branch, and really hope this will solve the bug.
After removing the `utf8` class from the `include` code block, I ran the following command:
(cd test/encoding/ ; pandoc -f markdown -t html -s -o test.html --filter ../../dist/dist-sandbox-1d3e9dda/build/pandoc-include/pandoc-include test.md)
pandoc-include: include.md: hGetContents: illegal operation (delayed read on closed handle)
Error running filter ../../dist/dist-sandbox-1d3e9dda/build/pandoc-include/pandoc-include: Filter returned error status 1
I encountered this problem too when writing justReadFile (see previous comment). I fixed it by moving the readMarkdown inside the function.
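The "delayed read on closed handle" error comes from `hGetContents` being lazy: if nothing consumes the string before `hClose` runs, the read happens too late. Parsing inside the `bracket`, as described above, is one way to consume it in time. An alternative, sketched here as an assumption rather than as the fix used in the branch, is to force the whole string before leaving the `bracket`. The names `readUtf8Strict` and `roundTrip` are hypothetical.

```haskell
import Control.Exception (bracket, evaluate)
import System.Directory (getTemporaryDirectory, removeFile)
import System.FilePath ((</>))
import System.IO

-- Read a whole file as UTF-8 and force it before the handle is closed.
readUtf8Strict :: FilePath -> IO String
readUtf8Strict path =
  bracket (openFile path ReadMode) hClose $ \h -> do
    hSetEncoding h utf8
    s <- hGetContents h
    _ <- evaluate (length s)  -- walks the entire string, so all reads happen here
    return s

-- Write a small UTF-8 file ("ä ö ü") and read it back.
roundTrip :: IO Bool
roundTrip = do
  tmp <- getTemporaryDirectory
  let path = tmp </> "pandoc-include-strict-demo.md"  -- hypothetical file name
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h "\228 \246 \252\n"
  s <- readUtf8Strict path
  removeFile path
  return (s == "\228 \246 \252\n")

main :: IO ()
main = roundTrip >>= print
```

Forcing with `length` keeps the whole file in memory as a `String`, which is acceptable for small include files but may be wasteful for large ones.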
Same error happens on Windows.
Steps to reproduce:
1) create two files
test.md
2) run
And result will be: