Closed wenbopeng closed 8 months ago
Please be more specific: give a short example that reproduces the problem with full instructions.
I don't think this is entirely true, @wenbopeng. A Cyrillic path reads successfully with `lua51 test.lua` on Windows:
local fh = io.open('R:/Проверка/Тестовая директория/мой файл.txt')
print(
fh:read()
)
It may still be true for Chinese or any other complex script.
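A side note for anyone reproducing this: the snippets in this thread call `fh:read()` without checking what `io.open` returned. On failure `io.open` returns `nil` plus an error message, so a small wrapper along these lines would show why the open failed instead of raising an "attempt to index a nil value" error. This is only a sketch; the helper name `open_or_report` is made up for illustration and is not part of Lua or pandoc.

```lua
-- Hypothetical helper (not part of Lua or pandoc): open a file and
-- report the reason on stderr if it cannot be opened.
local function open_or_report(path, mode)
  local fh, err = io.open(path, mode or 'r')
  if not fh then
    io.stderr:write('cannot open: ', tostring(err), '\n')
    return nil, err
  end
  return fh
end

-- Same path as in the comment above; adjust to a file on your machine.
local fh = open_or_report('R:/Проверка/Тестовая директория/мой файл.txt')
if fh then
  print(fh:read())  -- reads the first line
  fh:close()
end
```

The error message usually includes the path bytes as Lua passed them to the OS, which can help spot an encoding mismatch.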
Yes, my statement may be wrong. To be precise, it does not support Chinese file names and paths: UTF-8 is not accepted, and `io.open` only supports ANSI. A method that might work is to convert UTF-8 -> UTF-16, then UTF-16 -> ANSI.
see: how can i use io.open to open a unicode path in lua - Stack Overflow, 2023-07-12 09:49
@wenbopeng This is clearly OS-dependent. I can successfully read a file with Chinese glyphs in the path:
local fh = io.open('R:/测试/测试.txt')
print(
fh:read()
)
I copy-pasted the glyphs, made a directory, and created a file with the same name plus .txt
… Lua5.1
I use `include-files.lua`, with the following code block in my markdown file:
````markdown
content....
```{.include}
D:/测试.md
```
content....
````
pandoc will report an error: `Pandoc warnings:Cannot open file D:/测试.md | Skipping includes`
However, if I use the following block of code
````markdown
content....
```{.include}
D:/test.md
```
content....
````
The result is exactly right
Lua 5.4.4 Copyright (C) 1994-2022 Lua.org, PUC-Rio
Embedded in pandoc 3.1.4
I dunno what to say. I use ConEmu on Windows x64 with UTF-8 enabled.
user@DESKTOP-ILKGP6O 16:39:01 R:\ $ pandoc lua
Lua 5.4.4 Copyright (C) 1994-2022 Lua.org, PUC-Rio
Embedded in pandoc 3.1.2
> fh = io.open('R:/测试.txt')
> print(fh)
file (00007ffded19fa90)
> print(fh:read())
dfsdfsdf
>
I'm going to close this, pending more useful information, because it looks like there is a way to do this; the issue is something in OP's setup.
Expecting non-ASCII file names to work in Lua is to expect a bit much. Lua basically brags about being bytes-only, Pandoc expects UTF-8 input, and the OS encoding may be anything. The only reliable fix is to rename the file to ce4shi4.txt, or, since that means "test", to romanize/Anglicize the name of the actual file as appropriate. I use a more-than-ASCII language but I don't expect non-ASCII file/directory names to work when dealing with command-line programs. It means some strictures, and sometimes you have to temporarily copy things to ASCII names.
You said you copied and pasted the file name, but were the copy and paste operations done in the same app? If you copied from Explorer and pasted into a terminal, or copied from a terminal and pasted into an editor, or used some other mismatched combination, it is quite likely that the encoding of the characters is different. Just because two apps visually show the same file name doesn't mean they are doing so using the same encoding, and potentially neither matches one-for-one how the file system has encoded it.
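One way to check this is to dump the bytes Lua actually sees for a name. The sketch below uses a hypothetical helper, `hex_bytes`, and compares the UTF-8 and GBK (code page 936) byte sequences for 测试; the two strings would look identical on screen in apps using the matching encoding, but they are different byte strings as far as `io.open` is concerned.

```lua
-- Hypothetical helper: render each byte of a string as two hex digits.
local function hex_bytes(s)
  local out = {}
  for i = 1, #s do
    out[#out + 1] = string.format('%02X', s:byte(i))
  end
  return table.concat(out, ' ')
end

-- "测试" spelled with decimal escapes so the example does not depend
-- on the encoding of this source file (and also works on Lua 5.1).
local utf8_name = '\230\181\139\232\175\149'  -- UTF-8 bytes
local gbk_name  = '\178\226\202\212'          -- GBK / code page 936 bytes

print(hex_bytes(utf8_name))  -- E6 B5 8B E8 AF 95
print(hex_bytes(gbk_name))   -- B2 E2 CA D4
```

Running `hex_bytes` on the path string inside the filter and comparing it with what the file system expects should show whether the two sides agree.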
You might try using Lua itself to list the files in the directory and opening them. That will almost certainly get you a byte string representation that you can turn around and re-use in your include.
Use `io.open` with `pandoc.text.toencoding` to make it work with non-UTF-8 filesystems.
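A minimal sketch of that idea, not the actual change made to include-files.lua. It assumes the code runs inside pandoc's Lua interpreter, where `pandoc.text.toencoding` is available; in plain Lua the helper falls back to the unchanged path. The helper name `open_native` is hypothetical. As far as I know, calling `toencoding` without an encoding argument targets the current ANSI code page on Windows and tries to guess the file system encoding elsewhere, but treat that as an assumption and check the pandoc Lua documentation for your version.

```lua
-- Hypothetical helper: convert a UTF-8 path to the system encoding
-- (when pandoc.text.toencoding exists) before handing it to io.open.
local function open_native(path, mode)
  local native = path
  -- `pandoc` is only defined inside pandoc's Lua interpreter.
  local toencoding = pandoc and pandoc.text and pandoc.text.toencoding
  if toencoding then
    -- pcall: fall back to the original bytes if the conversion fails.
    local ok, converted = pcall(toencoding, path)
    if ok and type(converted) == 'string' then
      native = converted
    end
  end
  return io.open(native, mode or 'r')
end

-- Path taken from the report above.
local fh, err = open_native('D:/测试.md')
if fh then
  print(fh:read())
  fh:close()
else
  io.stderr:write('Cannot open file: ', tostring(err), '\n')
end
```

The conversion is applied only at the `io.open` call, so the rest of a filter can keep handling paths as UTF-8 strings, which is what pandoc passes to Lua.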
local fh = io.open(line)
Unable to read non-latin path and filenames
https://github.com/pandoc/lua-filters/blame/2aa98bfda556c7d4dfb8e30c20b318b6fd1f5091/include-files/include-files.lua#L86