msys2 / MINGW-packages

Package scripts for MinGW-w64 targets to build under MSYS2.
https://packages.msys2.org
BSD 3-Clause "New" or "Revised" License

cygwin doesn't support Unicode in PATH/HOME #7188

Open jdpipe opened 3 years ago

jdpipe commented 3 years ago

I have an issue that arose from the use of my code by an MSYS2 user who has a non-ASCII username. I've done testing of my own with a new Windows user called "1414°", to confirm the problem, as shown below.

If I have some executable code installed in /home/1414°/.local/bin/omc.exe, then it is correctly located in bash, via which omc.

However, if I attempt to use Python to do the same job, shutil.which('omc') fails to locate the program. Furthermore, if I print os.environ['PATH'] as seen by Python, the ° shows up as a lone Unicode surrogate, \udcb0. The correct Unicode escape for the ° in /home/1414° would be \u00b0. So this looks like an encoding or locale issue, for which I can't see any clear fix. I presume that the mangled os.environ is what causes shutil.which to fail.
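For reference, here is a minimal sketch of one way a value like \udcb0 can arise in Python. It assumes the ° reached the decoder as the single byte 0xB0 (its encoding in a Windows ANSI code page such as cp1252) and was then decoded as UTF-8 with the surrogateescape error handler. Whether that is the exact path taken in this session is an assumption, not something the output above proves.

```python
# Sketch: how a lone byte 0xB0 can become the surrogate U+DCB0.
degree = "\u00b0"  # the degree sign from the username

utf8_bytes = degree.encode("utf-8")    # two bytes in UTF-8
ansi_bytes = degree.encode("cp1252")   # one byte in cp1252 (assumed code page)

# A bare 0xB0 is not valid UTF-8. With surrogateescape, Python maps each
# undecodable byte 0xNN to the lone surrogate U+DC00 + 0xNN.
mangled = ansi_bytes.decode("utf-8", errors="surrogateescape")

# The mapping is reversible: encoding with the same handler restores the byte.
roundtrip = mangled.encode("utf-8", errors="surrogateescape")

print(utf8_bytes, ansi_bytes, ascii(mangled), roundtrip)
```

The point is that \udcb0 is not a random corruption: its low byte, 0xB0, is the original character in a single-byte encoding.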

My guess is that on MSYS2 Python's locale is set incorrectly, which results in Unicode characters in file paths being mangled.

Full session output below:

1414°@DESKTOP-6ADQVP0 MINGW64 ~
$ export PATH=$PATH:~/.local/bin

1414°@DESKTOP-6ADQVP0 MINGW64 ~
$ echo $PATH
/mingw64/bin:/usr/local/bin:/usr/bin:/bin:/c/Windows/System32:/c/Windows:/c/Windows/System32/Wbem:/c/Windows/System32/WindowsPowerShell/v1.0/:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/home/1414°/.local/bin

1414°@DESKTOP-6ADQVP0 MINGW64 ~
$ which omc
/home/1414°/.local/bin/omc

1414°@DESKTOP-6ADQVP0 MINGW64 ~
$ ipython
Python 3.8.6 (default, Oct  1 2020, 13:01:33)  [GCC 10.2.0 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import shutil

In [2]: shutil.which('omc')

In [3]: shutil.which('ls')
Out[3]: 'C:\\msys64_2\\usr\\bin/ls.EXE'

In [4]: import os

In [5]: os.environ['PATH'].split(";")
Out[5]:
['C:\\msys64_2\\mingw64\\bin',
 'C:\\msys64_2\\usr\\local\\bin',
 'C:\\msys64_2\\usr\\bin',
 'C:\\msys64_2\\usr\\bin',
 'C:\\Windows\\System32',
 'C:\\Windows',
 'C:\\Windows\\System32\\Wbem',
 'C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\',
 'C:\\msys64_2\\usr\\bin\\site_perl',
 'C:\\msys64_2\\usr\\bin\\vendor_perl',
 'C:\\msys64_2\\usr\\bin\\core_perl',
 'C:\\msys64_2\\home\\1414\udcb0\\.local\\bin',
 'C:\\msys64_2\\mingw64\\bin\\']

In [7]:

lazka commented 3 years ago

Thanks, yeah, something's not right here.

jdpipe commented 3 years ago

Here is some further weirdness:


1414°@DESKTOP-6ADQVP0 MINGW64 ~
$ export PATH=/mingw64/bin:/usr/local/bin:/usr/bin:/bin:/home/1414°/.local/bin

1414°@DESKTOP-6ADQVP0 MINGW64 ~
$ echo $PATH
/mingw64/bin:/usr/local/bin:/usr/bin:/bin:/home/1414°/.local/bin

1414°@DESKTOP-6ADQVP0 MINGW64 ~
$ which omc
/home/1414°/.local/bin/omc

1414°@DESKTOP-6ADQVP0 MINGW64 ~
$ python -X utf8 -c 'import os; print(os.environ["PATH"].split(";"))'
['C:\\msys64_2\\mingw64\\bin', 'C:\\msys64_2\\usr\\local\\bin', 'C:\\msys64_2\\usr\\bin', 'C:\\msys64_2\\usr\\bin', 'C:\\msys64_2\\home\\1414\udcb0\\.local\\bin', 'C:\\msys64_2\\mingw64\\bin\\']

1414°@DESKTOP-6ADQVP0 MINGW64 ~
$ python -X utf8 -c 'import os; print(os.environ["PATH"].split(";")[-2])'
C:\msys64_2\home\1414▒\.local\bin

1414°@DESKTOP-6ADQVP0 MINGW64 ~
$ export TEST1=1414°

1414°@DESKTOP-6ADQVP0 MINGW64 ~
$ python -X utf8 -c 'import os; print(os.environ["TEST1"])'
1414°

1414°@DESKTOP-6ADQVP0 MINGW64 ~
$ python -c 'import os; print(os.environ["TEST1"])'
1414▒

Firstly, it's conspicuous that MSYS seems to be quietly appending a PATH component (the trailing C:\msys64_2\mingw64\bin\ entry) before invoking Python, even though it's not necessary.

Secondly, you can see that arbitrary environment variables (TEST1 above) come through correctly, although only if I use this -X utf8 thing, whatever that is. Even with that option, the PATH is still mangled.
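To see which variables are affected without eyeballing each one, something like the following could be used. It is only a sketch: vars_with_surrogates is a hypothetical helper name, and the sample dictionary stands in for the values seen in the session above.

```python
import os


def vars_with_surrogates(env):
    """Return {name: ascii(value)} for values containing lone surrogates."""
    def has_surrogate(text):
        # In a Python str, any code point in U+D800..U+DFFF is a lone surrogate.
        return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)

    return {name: ascii(value) for name, value in env.items() if has_surrogate(value)}


# Stand-in for the session above: TEST1 arrives intact, PATH does not.
sample = {
    "TEST1": "1414\u00b0",
    "PATH": "C:\\msys64_2\\home\\1414\udcb0\\.local\\bin",
}
found = vars_with_surrogates(sample)
print(found)

# On a real system, pass the live environment instead.
live = vars_with_surrogates(os.environ)
```

Values are passed through ascii() so that printing the result does not itself fail on the surrogate.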

lazka commented 3 years ago

Yeah, I doubt that this is Python-specific. It's more likely in the PATH translation that happens in cygwin.

lazka commented 3 years ago

Doesn't look like cygwin implements any kind of unicode support for environment conversion (see environ.cc, env_plist_to_win32 and CCP_POSIX_TO_WIN_A vs CCP_POSIX_TO_WIN_W)

So it's unlikely this is going to be fixed soon.
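In the meantime, a user-side workaround might look roughly like the sketch below. It assumes each surrogate in the range U+DC80..U+DCFF stands for one byte of the Windows ANSI code page, taken here to be cp1252. Both the helper name repair_surrogates and the code page are assumptions; this has not been verified against cygwin's actual conversion.

```python
def repair_surrogates(value, codepage="cp1252"):
    """Replace lone surrogates U+DC80..U+DCFF with the character that the
    corresponding byte represents in `codepage` (an assumed ANSI code page)."""
    out = []
    for ch in value:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            raw = bytes([cp - 0xDC00])  # recover the original byte
            try:
                ch = raw.decode(codepage)
            except UnicodeDecodeError:
                pass  # byte is undefined in this code page; keep the surrogate
        out.append(ch)
    return "".join(out)


mangled = "C:\\msys64_2\\home\\1414\udcb0\\.local\\bin"
fixed = repair_surrogates(mangled)
print(ascii(fixed))
```

The repaired string could then be handed to shutil.which explicitly, for example shutil.which('omc', path=repair_surrogates(os.environ['PATH'])). Whether that locates the program on an affected system is untested here.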