mpv-player / mpv

🎥 Command line video player
https://mpv.io
Other
28.25k stars 2.9k forks source link

[Lua] Can't use io.open on files with Unicode characters on their paths. #7704

Closed ekisu closed 4 years ago

ekisu commented 4 years ago

Important Information

Reproduction steps

Expected behavior

The file should open successfully.

Actual behavior

It errors out with No such file or directory. Not sure if this relates to #7701 or not.

Log file

log.txt

ghost commented 4 years ago

This is similar to #7701, except even worse, because we can't fix it unless we reimplement this Lua function.

TheAMM commented 4 years ago

Yeah, this is on Lua. io.open should work just fine on Linux, but on windows you'll have to (ab)use mpv's subprocessing calls, as they handle the wide conversion. Here's AMM handing out more wisdom, implementation left as an exercise to the reader: Use CMD /C type <path> to read a file. AFAIK there's no sane way to write a unicode-path file directly (least I didn't find one), so write an ascii-path temporary file and use CMD /C MOVE /Y <source_path> <target_path>. You'll have to read/write the entire file at once, but hey, it works. Consider using io.open if the path is plain ASCII (if you care).

ghost commented 4 years ago

That's quite a "do not try at home" solution, though.

avih commented 4 years ago

Maybe mpv can have some function to turn such paths into something which io.open will handle correctly?

Is it clearly a lua issue/bug? i.e. is lua really not expected to be able to open unicode paths on windows? Or maybe it expects some encoding?

TheAMM commented 4 years ago

I've had no problems with it for almost a year now. Necessity and mothering or something.

Is it clearly a lua issue/bug? i.e. is lua really not expected to be able to open unicode paths on windows? Or maybe it expects some encoding?

Yes it's on Lua. Yes, Lua is really not expected to be able to deal with Unicode paths. Lua expects the system encoding, and you aren't guaranteed to be able to represent unicode paths in it, whatever it may be. I'm no encoding expert though. You can find more info online with "lua unicode path" or whatnot, plenty of questions about it. tldr: Lua is simple and uses simple C calls which don't do unicode on Windows.

Maybe mpv can have some function to turn such paths into something which io.open will handle correctly?

Would be neat, and there are existing patches/repos for unicode in io.open (and pals), but I haven't looked how messy it'd be to incorporate (similar) changes.

(edit: fug)

avih commented 4 years ago

Well, the native encoding on windows is a variant of UTF16, but source files are highly likely to be UTF8, and because mpv does not rewrite script source files, it means there's no chance it can work.

So unless lua has a way for C code to hook its internal fopen, then I think it could only work for files which are at the native locale (i.e. not UTF8, and I think bytes above 128 are from the current locale codepage).

This kinda sucks.

ekisu commented 4 years ago

Maybe mpv can have some function to turn such paths into something which io.open will handle correctly?

This would be a pretty nice workaround, if possible.

avih commented 4 years ago

Also, even if it get fixed in lua, it probably won't get backported to lua 5.2/5.1, because they only support their latest version. But mpv currently does not support lua 5.3 or later.

I think the only chance to handle it reasonably would be in luajit, which does get bugfixes, and I think the commonly available mpv binary windows builds use luajit. So basically, if we ensure it gets fixed in luajit then the mpv windows build will get fixed.

ghost commented 4 years ago

Well, I wrote a lot of text and explanations about this and what to do, but Firefox ate it. But basically: fuck off and patch Lua.

ghost commented 4 years ago

This probably isn't clear, because it was in the text I wrote that Firefox lost (what a piece of shit browser, seriously), but: search for mp_open in mpv, understand what it does, apply it to the lua source.

avih commented 4 years ago

[Lua] Can't use io.open on files with Unicode characters on their paths

Just to make it clear: it's a lua bug. Lua loads and runs the script files, and mpv does not interfere with its io.open. If a source file is UTF8 and io.open in that source file is unable to open the path given at this source file then the issue starts and ends inside lua.

ghost commented 4 years ago

No, it's a mingw/microsoft bug. Or them shitting their awful design decision on your face.

ghost commented 4 years ago

I copied the mpv I/O wrappers to standalone code:

https://github.com/wm4/libinsanity/blob/master/microsh%E2%80%A6/fs_io.c https://github.com/wm4/libinsanity/blob/master/microsh%E2%80%A6/fs_io.h https://github.com/wm4/libinsanity/blob/master/microsh%E2%80%A6/fs_io_replace.h

I encourage those who do Windows builds (maybe @shinchiro ?) to patch Lua with these functions. Should consist of adding these source files to Lua, adding an #include "fs_io_replace.h to every source file that calls I/O functions, and maybe adding windows header defines to the build scripts.

shinchiro commented 4 years ago

Thanks to that. I managed to patch the luajit with it. Now, io.open can accept unicode chars

PalmtopTiger commented 4 years ago

Note that os.getenv on Windows returns ANSI string. So the code like io.open(os.getenv("APPDATA") .. "/mpv/data.json", "w") which is used to open files in the mpv config dir in multiple scripts (1, 2, 3), works without the patch. Therefore, you should also patch os.getenv, otherwise many scripts will be broken.

shinchiro commented 4 years ago

I just noticed that getenv() wrapper is missing in @wm4 standalone code, which why os.getenv is not working yet

ghost commented 4 years ago

I omitted getenv, because it'd have added a dependency on pthreads (due to getenv() needing to return a new allocation, but the caller not having to free it). Might add later.

ghost commented 4 years ago

Added getenv(), in the same place.

shinchiro commented 4 years ago

Build mpv-x86_64-20200517-git-152b0e2.7z should solved this issue since I patched the luajit.

PalmtopTiger commented 4 years ago

Unfortunately no. getenv still returns ANSI string. print(os.getenv("APPDATA")) outputs broken characters. test.lua.txt

PalmtopTiger commented 4 years ago

@shinchiro I think you forgot to include lj_fs_io_replace.h in lib_os.c.

shinchiro commented 4 years ago

Updated the luajit's patch. Really weird how os.getenv() works fine on my windows before it

ghost commented 4 years ago

The console might have detected it as ANSI and converted it, or so.

kesdoputr commented 4 years ago

@shinchiro Hello, when i update to mpv-x86_64-20200517-git-152b0e2.7z I get several error in osc.conf and stat.conf, detail is on the above issue link. The same conf is all ok on previous ver mpv-x86_64-20200510-git-4e94b21.7z Please help to check if it's due to this patch? Thanks you.

PalmtopTiger commented 4 years ago

Both functions (io.open and os.getenv) work in build 685bd6a. Thank you all!