mpv-player / mpv

🎥 Command line video player
https://mpv.io

AviSynth text encoding issue using Korean code page #7522

Closed stax76 closed 4 years ago

stax76 commented 4 years ago

AviSynth supports only ANSI code pages, not Unicode.

In Western Europe the ANSI code page is Windows-1252, which is a single-byte code page.

In Korea the ANSI code page is Windows-949, which is not a single-byte code page.
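
To make the single-byte versus multi-byte distinction concrete, here is a small Python sketch (not part of the original report). It uses Python's built-in `cp1252` and `cp949` codecs as stand-ins for the Windows code pages; the helper name `encoded_length` is made up for illustration.

```python
def encoded_length(text: str, codepage: str) -> int:
    """Return how many bytes `text` occupies in the given code page."""
    return len(text.encode(codepage))


# Windows-1252 is single byte: one character, one byte.
print(encoded_length("é", "cp1252"))   # 1

# Windows-949 is variable width: ASCII stays one byte, Hangul takes two.
print(encoded_length("a", "cp949"))    # 1
print(encoded_length("스", "cp949"))   # 2

# Windows-1252 has no Hangul at all, so a strict conversion fails.
try:
    "스".encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)                   # False
```

Any code that assumes "one byte per character" or that every file name fits the active code page can therefore behave differently on a Korean system than on a Western European one.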

The issue can be reproduced very easily. In the Windows 10 settings go to:

Time & Language > Language > Administrative Language > Language for non unicode programs

Change it to Korean, then log out and log in again.

Now in the new Windows Terminal you can do:

Desktop> 'version()' | Out-File -FilePath .\스스스.avs -Encoding Default
Desktop> mpv .\스스스.avs
[ffmpeg/demuxer] avisynth: Import: couldn't open "C:\Users\frank\Desktop\???.avs"
[lavf] avformat_open_input() failed
[ffmpeg/demuxer] avisynth: Import: couldn't open "C:\Users\frank\Desktop\???.avs"
[lavf] avformat_open_input() failed
Failed to recognize file format.

Exiting... (Errors when loading file)

ffmpeg has no problem:

Desktop> ffmpeg -i .\스스스.avs -hide_banner
Input #0, avisynth, from '.\스스스.avs':
  Duration: 00:00:10.00, start: 0.000000, bitrate: 0 kb/s
    Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 384x104, 24 fps, 24 tbr, 24 tbn, 24 tbc
At least one output file must be specified
Desktop>


mpv cannot open the file, neither from the command line nor via drag & drop. mpc does not work either.

ffmpeg, VirtualDub2 or my staxrip can open it.

ffmpeg uses the AviSynth C interface, VirtualDub2 uses the AviSynth avifile interface, and my staxrip uses the AviSynth C++ interface. Here is my very simple code:

https://github.com/staxrip/staxrip/blob/master/FrameServer/AviSynthServer.cpp#L42

Here are my system settings:

'Get-Culture:         ' + (Get-Culture         | select -ExpandProperty DisplayName)
'Get-UICulture:       ' + (Get-UICulture       | select -ExpandProperty DisplayName)
'Get-WinSystemLocale: ' + (Get-WinSystemLocale | select -ExpandProperty DisplayName)
'Input keyboard:      ' + (Get-WinDefaultInputMethodOverride | select -ExpandProperty Description)
'CodePage:            ' + ([Text.Encoding]::Default | select -ExpandProperty CodePage)
Get-Culture:         German (Germany)
Get-UICulture:       English (United States)
Get-WinSystemLocale: Korean (Korea)
Input keyboard:      German (Germany) - German
CodePage:            949

A lot more info here:

https://forum.doom9.org/showthread.php?t=175845&page=75

ghost commented 4 years ago

mpv doesn't have avisynth support. FFmpeg does, which is probably what you're trying to use. FFmpeg's avisynth wrapper most likely makes avisynth open the file, and avisynth most likely does it in a broken way (since Avisynth is such a horrible ancient piece of shit), and there's nothing mpv could do about that.

stax76 commented 4 years ago

But ffmpeg itself opens it fine; maybe the ffmpeg splitter or the process is configured differently.

ghost commented 4 years ago

mpv uses UTF-8, libavformat uses UTF-8, mpv uses libavformat. You need to debug this yourself. Nothing I can or want to do.

stax76 commented 4 years ago

Maybe somebody can have a look; otherwise no problem. Maybe I can do it myself at some point, thanks.

ghost commented 4 years ago

Also there are no code pages. Windows uses Unicode, just as UTF-16 instead of UTF-8.

stax76 commented 4 years ago

Yes, internally UTF-16 is used everywhere in Windows I think.

qyot27 commented 4 years ago
Microsoft Windows [Version 10.0.18362.719]
C:\WINDOWS\system32>e:

E:\>cd Documents\ffunicodeavstest

E:\Documents\ffunicodeavstest>ls
''$'\354\212\244\354\212\244\354\212\244''.avs'

E:\Documents\ffunicodeavstest>dir
 Volume in drive E has no label.
 Volume Serial Number is 90F6-7894

 Directory of E:\Documents\ffunicodeavstest

03/12/2020  07:55 PM    <DIR>          .
03/12/2020  07:55 PM    <DIR>          ..
03/12/2020  07:55 PM                11 스스스.avs
               1 File(s)             11 bytes
               2 Dir(s)  51,612,684,288 bytes free

E:\Documents\ffunicodeavstest>mpv 스스스.avs
 (+) Video --vid=1 (rawvideo 384x104 24.000fps)
VO: [gpu] 384x104 bgr24
V: 00:00:03 / 00:00:10 (35%)

Exiting... (Quit)

E:\Documents\ffunicodeavstest>ffmpeg -hide_banner -i 스스스.avs
Input #0, avisynth, from '스스스.avs':
  Duration: 00:00:10.00, start: 0.000000, bitrate: 0 kb/s
    Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 384x104, 24 fps, 24 tbr, 24 tbn, 24 tbc
At least one output file must be specified

Check the box that says "Beta: Use Unicode UTF-8 for worldwide language support" in that Language setting dialog you pointed out.

avih commented 4 years ago

mpv can open such a name correctly (I used the same name with an .mp3 file extension), so mpv does not seem to mangle the file name.

The issue could be at the interface between mpv and ffmpeg, though it is hard to tell on which side without further analysis. However, considering that mpv does get the name correctly, my guess would be the ffmpeg side.

Also, while non-ASCII, the Unicode code points in this name should not be special: they are all below U+FFFF (U+10000 and higher could pose issues because the conversion between UTF-16 and UTF-8 is less direct, but that is not the case here).
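
The point about U+FFFF can be checked with a short Python sketch (an illustration added here, not something that was run in this thread). The helper `utf16_units` and the emoji used for contrast are choices made only for this example.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to store `ch` in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


hangul = "스"           # U+C2A4, inside the Basic Multilingual Plane
emoji = "\U0001F600"    # U+1F600, above U+FFFF (illustrative contrast only)

# One UTF-16 unit for the Hangul syllable, a surrogate pair for the emoji.
print(hex(ord(hangul)), utf16_units(hangul), hangul.encode("utf-8"))
print(hex(ord(emoji)), utf16_units(emoji), emoji.encode("utf-8"))

# The octal escapes \354\212\244 in the `ls` listing earlier in the thread
# are the UTF-8 bytes of the same syllable.
ls_bytes = bytes([0o354, 0o212, 0o244])
print(ls_bytes == hangul.encode("utf-8"))
```

So each character of the file name is a single UTF-16 code unit and a three-byte UTF-8 sequence, with no surrogate pairs involved.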

stax76 commented 4 years ago

With that hidden beta check box it works, confirmed.

PowerShell/.NET now returns 65001 instead of 949 (Korea):

Desktop> [Text.Encoding]::Default | select -ExpandProperty CodePage
65001

65001 is apparently UTF-8

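Win32 gives 65001 as well:

#include <Windows.h>
#include <iostream>

int main()
{
    // GetOEMCP() returns the OEM code page; GetACP() is the call that
    // returns the ANSI code page discussed in this issue.
    auto cp = GetOEMCP();
    std::cout << cp << "\n";
    std::cin.get();  // keep the console window open
}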

I don't fully understand it.

avih commented 4 years ago

@qyot27 I see you're an AviSynth developer, is there any chance AviSynth can be made to take UTF-8 names at the C interface? Maybe even without breaking existing compatibility?

Taking only code page encoding is rather legacy and limited (can it, for instance, take names with chars from two different code pages?)

qyot27 commented 4 years ago

While some UTF-8 helpers exist in the core, I think it's mostly for the utf8=true workaround that exists in some of the source filters¹. Having now found out about the fact you can just set Windows 10 to use UTF-8 pervasively like every other sane OS out there, it seems like less of an issue to me; especially as it renders those pieces of special UTF-8 workarounds redundant and fixes the script filename issue without having to do anything to the code of any of the software involved.

¹which applies to the encoding of the script and the name/path of the video/audio file it's trying to open, not the filename/path of the script itself. Some external filters/plugins support this on their own too.

stax76 commented 4 years ago

Could this line be the problem:

http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/avisynth.c;h=55a2efd884e287e8b3aacc8f339235059049a37a;hb=0830e9116f786572865a9c800a9156d0c4294f27#l570

Maybe CP_THREAD_ACP differs from CP_ACP here. My code uses CP_ACP, and VirtualDub2, which works fine via the avifile interface, seems to use CP_ACP too:

https://github.com/AviSynth/AviSynthPlus/blob/master/avs_core/core/main.cpp#L354
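
If that guess is right, the `???` in the error message would have a simple explanation. The following Python sketch is only an illustration under assumptions that are not confirmed in this thread: that the thread code page follows the German user culture (Windows-1252) while the system ANSI code page is 949, and that the conversion substitutes `?` for characters it cannot map (this is how Python's `errors="replace"` behaves when encoding). `to_codepage` is a hypothetical helper, not an FFmpeg or AviSynth function.

```python
def to_codepage(name: str, codepage: str) -> bytes:
    """Convert a Unicode file name to a legacy code page.

    Characters the code page cannot represent become '?'.
    """
    return name.encode(codepage, errors="replace")


name = "스스스.avs"

# Assumed system ANSI code page (Korean, 949): the name survives a round trip.
korean = to_codepage(name, "cp949")
print(korean.decode("cp949") == name)

# Assumed thread code page (German culture, Windows-1252): the Hangul is lost.
western = to_codepage(name, "cp1252")
print(western)
```

Under these assumptions the second conversion produces the same `???.avs` that appears in the mpv log, while the first one yields a name AviSynth could open.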

ghost commented 4 years ago

Oh man. Please stop bothering us with this 1990 shit. Fix your ridiculous AVS problems somewhere else.