Open kice opened 1 year ago
Thanks for investigating these issues; I haven't tested this plugin on Windows.
The prebuilt DLL cannot be loaded under Windows, most likely because some dependent DLLs are missing. I had to compile it on Windows to make it work.
Maybe it needs an iconv DLL. I wonder what the best way to handle this will be with the new Lite XL package manager...
Files whose path contains non-ASCII characters cannot be loaded.
Could you work on adding proper Windows support and open a PR?
I think we could decode on a best-effort basis if the detection result is incorrect, or prompt the user for the correct encoding.
Mmm, so maybe a command view that lists all valid encodings and lets the user choose one?
I tried adding the missing DLLs by checking the import table, including the DLLs that iconv uses, and so on. Somehow encoding.dll still could not be loaded, which is why I tried to compile it myself. However, I found that the DLL I compiled has everything statically linked, so you might want to do the same for a standalone release.
I am not sure how predefined macros work with meson, so change this if you see fit.
#ifdef _WIN32
#include <windows.h>
#endif

int f_detect(lua_State *L) {
  const char* file_name = luaL_checkstring(L, 1);
#ifndef _WIN32
  FILE* file = fopen(file_name, "rb");
#else
  /* Lua hands over the path as UTF-8; Windows needs UTF-16 for non-ASCII paths. */
  wchar_t utf16[1024];
  FILE* file = NULL;
  /* -1 converts up to and including the terminating NUL; 0 means failure
     (invalid input or a path that does not fit in the buffer). */
  if (MultiByteToWideChar(CP_UTF8, 0, file_name, -1, utf16, 1024) > 0)
    file = _wfopen(utf16, L"rb");
#endif
  if (!file) {
    lua_pushnil(L);
    lua_pushfstring(L, "unable to open file '%s', code=%d", file_name, errno);
    return 2;
  }
  // rest of the function
I think it would also work without including any Windows header, but that requires writing a decoder that turns the UTF-8 file name into Unicode code points and then an encoder for UTF-16LE. Just keep in mind that Windows file paths can contain code points above 0xFFFF, which are encoded as a UTF-16 surrogate pair (two uint16_t values), such as the rocket emoji 🚀 U+1F680.
Or just use detect_string instead of detect.
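The header-free conversion described above could look roughly like the following. This is only a sketch: utf8_to_utf16 is a hypothetical helper, not something that exists in the plugin, and for brevity it does not reject overlong UTF-8 encodings. It decodes each UTF-8 sequence to a code point and emits one UTF-16 unit, or a surrogate pair for code points above 0xFFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the plugin): converts a NUL-terminated
 * UTF-8 string into UTF-16 code units. Returns the number of units written,
 * excluding the terminating 0, or -1 on malformed input or if `out` is too
 * small. Overlong encodings are not rejected in this sketch. */
static int utf8_to_utf16(const char *in, uint16_t *out, size_t out_len) {
  const unsigned char *s = (const unsigned char *)in;
  size_t n = 0;
  while (*s) {
    uint32_t cp;
    int extra;
    /* The lead byte determines how many continuation bytes follow. */
    if (*s < 0x80)                { cp = *s;        extra = 0; }
    else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
    else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
    else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
    else return -1;               /* stray continuation or invalid lead byte */
    s++;
    for (int i = 0; i < extra; i++, s++) {
      if ((*s & 0xC0) != 0x80) return -1;  /* also catches a premature NUL */
      cp = (cp << 6) | (*s & 0x3F);
    }
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return -1;
    if (cp < 0x10000) {
      if (n + 1 >= out_len) return -1;     /* keep room for the terminator */
      out[n++] = (uint16_t)cp;
    } else {
      if (n + 2 >= out_len) return -1;
      cp -= 0x10000;                       /* split into a surrogate pair */
      out[n++] = (uint16_t)(0xD800 | (cp >> 10));
      out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
    }
  }
  if (n >= out_len) return -1;
  out[n] = 0;
  return (int)n;
}
```

On Windows, where wchar_t is 16 bits wide, the resulting buffer could then be passed to _wfopen after a cast; whether that is preferable to MultiByteToWideChar is a judgment call.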
For auto decoding, the fact is that uchardet hardly ever gets the encoding right for most Asian languages. Also, some encodings are more or less a subset of another encoding. For example, most Shift-JIS/CP932 byte sequences can also be decoded as GBK/CP936.
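To illustrate the overlap, here is a rough sketch of a purely structural GBK check. looks_like_gbk is a hypothetical name, and the byte ranges are the commonly documented GBK layout (lead byte 0x81-0xFE, trail byte 0x40-0xFE except 0x7F); it does not consult any mapping table, so it only shows that the byte pairs are well-formed, not that they map to assigned characters.

```c
#include <stddef.h>

/* Hypothetical structural check: returns 1 if every non-ASCII byte starts
 * a well-formed GBK double-byte sequence, 0 otherwise. */
static int looks_like_gbk(const unsigned char *s, size_t len) {
  size_t i = 0;
  while (i < len) {
    if (s[i] < 0x80) { i++; continue; }          /* ASCII passes through */
    if (s[i] == 0x80 || s[i] == 0xFF) return 0;  /* not a GBK lead byte */
    if (i + 1 >= len) return 0;                  /* truncated pair */
    unsigned char t = s[i + 1];
    if (t < 0x40 || t == 0x7F || t == 0xFF) return 0;
    i += 2;
  }
  return 1;
}
```

The Shift-JIS bytes for 日本語 (93 FA 96 7B 8C EA) pass this check just like genuine GBK text does, which is why a detector cannot tell the two apart from structure alone.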
For me: if the result is UTF-8, UTF-16 LE, or UTF-16 BE, just decode with that result. For everything else, use the result to sort the list and then let the user choose from it.
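A minimal sketch of that policy, assuming the candidate encodings are kept as an array of names. Both is_trusted and promote_detected are hypothetical helpers, and the charset spellings are illustrative rather than the plugin's actual identifiers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: returns 1 if the detected charset can be used directly
 * without asking the user. */
static int is_trusted(const char *detected) {
  static const char *trusted[] = { "UTF-8", "UTF-16LE", "UTF-16BE" };
  if (!detected) return 0;
  for (size_t i = 0; i < sizeof(trusted) / sizeof(trusted[0]); i++)
    if (strcmp(detected, trusted[i]) == 0) return 1;
  return 0;
}

/* Hypothetical: moves `detected` to the front of `list` so it is offered
 * first, keeping the relative order of all other entries. */
static void promote_detected(const char **list, size_t n, const char *detected) {
  for (size_t i = 0; i < n; i++) {
    if (strcmp(list[i], detected) == 0) {
      const char *hit = list[i];
      memmove(&list[1], &list[0], i * sizeof(list[0]));
      list[0] = hit;
      return;
    }
  }
}
```

With this split, a trusted result is decoded immediately, while anything else only influences the order in which the choices are presented.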
OK, so I added your change in https://github.com/jgmdev/lite-xl-encoding/commit/de1c9be297e034e8466dd7ee0d714b0481684379 and hope that I copy-pasted it properly :)
I tried to compile it myself. However, I found that the dll I compiled is static linked everything
Can you share how you built the plugin to accomplish this?
For auto decoding, the fact is uchardet hardly get the encoding right for most asian languages.
Does doc:reload-with-encoding tackle that?
For me, if detect as UTF-8, UTF-16 LE, UTF-16 BE, then just decode with the result. For the rest, use the result to sort the list and then let user choose from it.
So currently encoding_detect checks whether one of the BOMs registered in bom_list is found in the given string. If so, it converts from that BOM's charset into UTF-8 and, on success, returns that charset.
If BOM detection and conversion fail, it checks whether the string is valid UTF-8 and, if so, just returns that charset. Otherwise it falls back to uchardet, and if uchardet fails it just errors.
Do you want to allow manually specifying the charset on error, or to change the order of the steps above?
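The first two steps of that chain can be sketched as follows. This is not the plugin's code: bom_entry, sketch_boms, is_valid_utf8 and sketch_detect are hypothetical names, the BOM table lists only three charsets, the conversion check after a BOM match is omitted, and the UTF-8 validator is simplified (it does not reject every overlong or surrogate sequence).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical BOM table; the plugin's real bom_list may look different. */
typedef struct { const char *charset; unsigned char bytes[4]; size_t len; } bom_entry;

static const bom_entry sketch_boms[] = {
  { "UTF-8",    {0xEF, 0xBB, 0xBF, 0x00}, 3 },
  { "UTF-16LE", {0xFF, 0xFE, 0x00, 0x00}, 2 },
  { "UTF-16BE", {0xFE, 0xFF, 0x00, 0x00}, 2 },
};

/* Simplified structural UTF-8 validation. */
static int is_valid_utf8(const unsigned char *s, size_t len) {
  size_t i = 0;
  while (i < len) {
    unsigned char c = s[i];
    size_t extra;
    if (c < 0x80) extra = 0;
    else if (c >= 0xC2 && c <= 0xDF) extra = 1;
    else if (c >= 0xE0 && c <= 0xEF) extra = 2;
    else if (c >= 0xF0 && c <= 0xF4) extra = 3;
    else return 0;
    if (extra > len - i - 1) return 0;           /* truncated sequence */
    for (size_t k = 1; k <= extra; k++)
      if ((s[i + k] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
    i += extra + 1;
  }
  return 1;
}

/* Step 1: BOM lookup. Step 2: UTF-8 validity. Returns NULL when neither
 * matches; the caller would then fall back to uchardet or ask the user. */
static const char *sketch_detect(const unsigned char *s, size_t len) {
  for (size_t i = 0; i < sizeof(sketch_boms) / sizeof(sketch_boms[0]); i++) {
    const bom_entry *b = &sketch_boms[i];
    if (len >= b->len && memcmp(s, b->bytes, b->len) == 0)
      return b->charset;
  }
  if (is_valid_utf8(s, len)) return "UTF-8";
  return NULL;
}
```

GBK or CP932 text without a BOM falls through both steps here, which is exactly the case where the result depends entirely on uchardet.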
Doing a local test using ntldd to check the linked libraries, building under MSYS2 MinGW64, this is what I got:
ADVAPI32.dll => C:\Windows\SYSTEM32\ADVAPI32.dll (0x000001feed4e0000)
GDI32.dll => C:\Windows\SYSTEM32\GDI32.dll (0x000001feee610000)
libiconv-2.dll => C:\msys64\mingw64\bin\libiconv-2.dll (0x000001feed990000)
IMM32.dll => C:\Windows\SYSTEM32\IMM32.dll (0x000001feed4e0000)
KERNEL32.dll => C:\Windows\SYSTEM32\KERNEL32.dll (0x000001feed990000)
msvcrt.dll => C:\Windows\SYSTEM32\msvcrt.dll (0x000001feed990000)
ole32.dll => C:\Windows\SYSTEM32\ole32.dll (0x000001feed990000)
OLEAUT32.dll => C:\Windows\SYSTEM32\OLEAUT32.dll (0x000001feee220000)
SETUPAPI.dll => C:\Windows\SYSTEM32\SETUPAPI.dll (0x000001feeedb0000)
SHELL32.dll => C:\Windows\SYSTEM32\SHELL32.dll (0x000001feedd60000)
libstdc++-6.dll => C:\msys64\mingw64\bin\libstdc++-6.dll (0x000001feedd60000)
USER32.dll => C:\Windows\SYSTEM32\USER32.dll (0x000001feee450000)
VERSION.dll => C:\Windows\SYSTEM32\VERSION.dll (0x000001feed560000)
WINMM.dll => C:\Windows\SYSTEM32\WINMM.dll (0x000001feed4e0000)
From this, the relevant DLLs would be:
libstdc++-6.dll => C:\msys64\mingw64\bin\libstdc++-6.dll (0x000001feedd60000)
libiconv-2.dll => C:\msys64\mingw64\bin\libiconv-2.dll (0x000001feed990000)
meson setup build
meson compile -C build
meson is installed together with ninja by meson-1.0.0-64.msi; I am not using MSYS2, and gcc/g++ is installed via scoop.
doc:reload-with-encoding.
"On error to allow manually specifying the charset" is what I want. Changing the order would not work, since no step in the chain yields the correct encoding.
with open("gbk.txt", "w", encoding="gbk") as f:
    f.write("中文测试")
with open("shiftjis.txt", "w", encoding="cp932") as f:
    f.write("日本語")
OK, so I fixed the dynamically linked libraries with https://github.com/jgmdev/lite-xl-encoding/commit/e2cd00ba2304e8a0625ed6851b49cd96f76b39cd. What is still missing is asking the user for the charset if it is not detected :)
I have a few issues with this plugin that prevent me from opening files.
fopen will return NULL. I would suggest converting the file path to UTF-16LE first and then calling _wfopen. Also, the local active code page does not contain all Unicode characters, while Windows allows almost any Unicode character in a file path. https://github.com/jgmdev/lite-xl-encoding/blob/b1ddf226277ea12a03ed9db2ddda458988020e91/src/encoding.c#L303