python / cpython

The Python programming language
https://www.python.org

Cryptic `WinError` reported when subprocess execution fails with WSL path #119646

Open ncoghlan opened 2 months ago

ncoghlan commented 2 months ago

Based on the investigation below, the most pragmatic change we can make here is to enhance the exception handling around the _winapi.CreateProcess call in subprocess, so that the WinError raised when the given command is on a WSL path at least reports the offending command (the way os.startfile already does).
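
As a rough illustration of that idea (not the actual subprocess change), the sketch below attaches the executable path to an OSError that arrives without one. The helper name `call_reporting_executable` and the injectable `launch` callable are hypothetical, introduced only so the behaviour can be shown without calling `_winapi.CreateProcess` directly.

```python
def call_reporting_executable(launch, executable, *args, **kwargs):
    """Call launch(executable, ...) and make sure any OSError names the executable.

    `launch` stands in for the low-level process creation call; it is an
    assumption of this sketch, not part of the subprocess module.
    """
    try:
        return launch(executable, *args, **kwargs)
    except OSError as exc:
        # A failure such as "[WinError 1] Incorrect function" carries no
        # filename, so the message does not say which command failed.
        if exc.filename is None:
            exc.filename = executable
        raise


def failing_launch(executable, *args):
    # Simulates the bare error seen in the tracebacks in this issue.
    raise OSError(1, "Incorrect function")


try:
    call_reporting_executable(failing_launch, r"\\wsl$\fedoraremix\home\acoghlan\test_venv\Scripts\python.exe")
except OSError as exc:
    print(exc)  # the message now ends with the quoted executable path
```

Re-raising the same exception object keeps the original error code and traceback intact; only the missing path is filled in.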


Original bug report

ensurepip fails on Windows venv creation with WSL UNC path

Bug description:

Running Python 3.12.3 from the Windows Store via Windows PowerShell, attempting to create a Windows virtual environment under a WSL UNC path fails:

PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan\_build_win64> $windows_python="$((Get-Command python3).Path)"
PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan\_build_win64> & "$windows_python" --version
Python 3.12.3
PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan\_build_win64> & $windows_python -Im venv --copies test_venv
Error: [WinError 1] Incorrect function
PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan\_build_win64> & $windows_python -Im venv --copies \\wsl$\fedoraremix\home\acoghlan\_build_win64\test_venv
Error: [WinError 1] Incorrect function

(I initially thought this might be related to #102584, as it has some similarities to #102496, but there were enough differences that it seemed worthwhile to file a dedicated issue. The investigation below strongly supports the idea that this is a different problem, specifically with ensurepip trying to invoke a subprocess when sys.executable is pointing at an executable stored in a WSL folder.)

CPython versions tested on:

3.12

Operating systems tested on:

Windows

ncoghlan commented 2 months ago

The obvious workaround for this problem (make the Windows venv on a native Windows drive instead of making it inside WSL) works, but it's genuinely unclear to me why the UNC path doesn't work.

Initially I thought it might be symlink-or-copy confusion due to the mixture of filesystems (similar to https://github.com/tox-dev/tox/issues/1706), but adding the --copies option didn't eliminate the failure.

zooba commented 2 months ago

> The obvious workaround for this problem (make the Windows venv on a native Windows drive instead of making it inside WSL) works, but it's genuinely unclear to me why the UNC path doesn't work.

This is probably an entire area of posixmodule that needs testing. The protocol used for cross-container files is P9, rather than SMB, and there's going to be a ton of stuff that isn't implemented [the same] and will need different handling. Most tellingly, failure codes will likely be different.

Personally I run two separate clones, but remote the WSL one to the Windows one. But then I only ever jump in for testing and not dev work, so it makes sense to edit in Windows, commit, then git pull && make in WSL.

> attempting to create a Windows virtual environment under a WSL UNC path fails

As it should, really. No paths are going to be remapped correctly (check pyvenv.cfg for the one that "worked"), and I'm not even sure what happens if you execute a PE that's "deep" inside of WSL (as opposed to somewhere in /mnt). It probably has no X bit.

I'm not sure the best way to handle it, but perhaps we ought to be trying to detect this scenario and fail with a helpful error? Possibly looking for \\wsl$\ is good enough.

ncoghlan commented 2 months ago

Looking for \\wsl$\ in the path is actually what my build script does now to detect when I'm trying to do something that won't work and do something more reasonable (i.e. it puts the build folder on the Windows side when running from Windows, even if the git clone containing the build script is in WSL).
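
A minimal sketch of that kind of check, assuming the build script works with plain path strings. The function names are hypothetical, and the `\\wsl.localhost\` prefix is an addition of this sketch (newer WSL releases expose distributions under that name as well); it is not mentioned elsewhere in this thread. The `_build_win64` folder name is taken from the prompt in the original report.

```python
from pathlib import PureWindowsPath

# UNC prefixes that indicate a path inside a WSL distribution.
WSL_UNC_PREFIXES = ("\\\\wsl$\\", "\\\\wsl.localhost\\")


def is_wsl_unc_path(path):
    """Return True if path is a UNC path into a WSL distribution."""
    normalized = str(path).replace("/", "\\").lower()
    return normalized.startswith(WSL_UNC_PREFIXES)


def choose_build_dir(repo_dir, fallback_dir):
    """Build inside the clone unless the clone lives in WSL."""
    if is_wsl_unc_path(repo_dir):
        return str(fallback_dir)
    return str(PureWindowsPath(repo_dir) / "_build_win64")


print(choose_build_dir(r"\\wsl$\fedoraremix\home\acoghlan\repo", r"S:\Work\_build_win64"))
print(choose_build_dir(r"S:\Work\repo", r"S:\Work\_build_win64"))
```

A prefix check like this only sees the spelling of the path, so a WSL folder reached through a mapped drive letter would not be detected.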

I haven't tried yet to see if mapping the WSL folder to a drive letter would be enough to make it work (from what you wrote above, I'm guessing it may still not work, but it seems worth trying anyway). (Edit: see experiment below. As predicted, the drive mapping didn't make any difference)

> Personally I run two separate clones, but remote the WSL one to the Windows one. But then I only ever jump in for testing and not dev work, so it makes sense to edit in Windows, commit, then git pull && make in WSL.

Yeah, that's good advice, so I'll do that in the future (albeit the other way around, using Fedora WSL as my main dev env and testing on Windows via a local git pull).

ncoghlan commented 2 months ago

I just noticed the copy-and-paste of the PowerShell output had added a whole lot of spaces between the last command and its output, rather than a line break. This made it look fine in the edit window, but thoroughly misleading when rendered (the last command with the full path failed the same way the one with the relative path did, but the incorrect formatting made it look like it had worked).

This prompted me to investigate the "failure" more closely, and the pyvenv.cfg actually looks the way I would expect it to:

PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan> cat .\test_venv\pyvenv.cfg
home = C:\Users\Alyssa\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0
include-system-site-packages = false
version = 3.12.3
executable = C:\Users\Alyssa\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\python.exe
command = C:\Users\Alyssa\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\python.exe -m venv \\wsl$\fedoraremix\home\acoghlan\test_venv
PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan>

Running Python from the venv also works:

PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan> .\test_venv\Scripts\python.exe --version
Python 3.12.3

The error apparently happens later in the venv creation process, since most of the Scripts folder entries are missing:

PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan> ls .\test_venv\Scripts\

    Directory: \\wsl$\fedoraremix\home\acoghlan\test_venv\Scripts

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
------        28/05/2024   8:16 PM         274712 python.exe
------        28/05/2024   8:16 PM         263448 pythonw.exe

vs a native Windows venv:

PS S:\Work> ls .\test_venv\Scripts\

    Directory: S:\Work\test_venv\Scripts

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----        28/05/2024   8:25 PM           2036 activate
-a----        28/05/2024   8:25 PM            996 activate.bat
-a----        28/05/2024   8:25 PM          26199 Activate.ps1
-a----        28/05/2024   8:25 PM            393 deactivate.bat
-a----        28/05/2024   8:25 PM         108395 pip.exe
-a----        28/05/2024   8:25 PM         108395 pip3.12.exe
-a----        28/05/2024   8:25 PM         108395 pip3.exe
-a----        28/05/2024   8:25 PM         274712 python.exe
-a----        28/05/2024   8:25 PM         263448 pythonw.exe

ncoghlan commented 2 months ago

Huh, looks like it might be ensurepip that is breaking, since passing --without-pip makes the error go away, and the Scripts folder gets populated properly:

PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan> rm -r test_venv
PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan> & $windows_python -Im venv --copies --without-pip test_venv
PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan> ls .\test_venv\Scripts\

    Directory: \\wsl$\fedoraremix\home\acoghlan\test_venv\Scripts

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
------        28/05/2024   8:30 PM           1021 activate.bat
------        28/05/2024   8:30 PM            393 deactivate.bat
------        28/05/2024   8:30 PM          26199 Activate.ps1
------        28/05/2024   8:30 PM           2086 activate
------        28/05/2024   8:30 PM         274712 python.exe
------        28/05/2024   8:30 PM         263448 pythonw.exe
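
For scripts that create environments programmatically, the same workaround should be available through the standard library's `venv.EnvBuilder`, whose `with_pip=False` and `symlinks=False` arguments correspond to `--without-pip` and `--copies`. The sketch below is illustrative only; the helper name `create_venv_without_pip` is hypothetical.

```python
import venv
from pathlib import Path


def create_venv_without_pip(target_dir):
    """Create a venv that skips the ensurepip step, returning its pyvenv.cfg path."""
    # with_pip=False avoids the subprocess call that ensurepip would make
    # using the venv's own interpreter.
    builder = venv.EnvBuilder(with_pip=False, symlinks=False)
    builder.create(target_dir)
    return Path(target_dir) / "pyvenv.cfg"
```

pip then has to be installed separately, from an interpreter that subprocess is able to launch.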

Running ensurepip directly in that venv then fails with the same Windows error reported earlier, but with a more detailed traceback to say which API call failed:

PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan> .\test_venv\Scripts\python.exe -Im ensurepip
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\ensurepip\__main__.py", line 5, in <module>
    sys.exit(ensurepip._main())
             ^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\ensurepip\__init__.py", line 284, in _main
    return _bootstrap(
           ^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\ensurepip\__init__.py", line 200, in _bootstrap
    return _run_pip([*args, *_PACKAGE_NAMES], additional_paths)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\ensurepip\__init__.py", line 101, in _run_pip
    return subprocess.run(cmd, check=True).returncode
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 1538, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 1] Incorrect function
PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan>

ncoghlan commented 2 months ago

@zooba The idea of detect-and-report-a-useful-error is sounding good to me, but based on the above that detect-and-error for \\wsl$\ paths may need to be in subprocess.Popen rather than being in venv or ensurepip.

ncoghlan commented 2 months ago

Just experimentally confirming what @zooba wrote above, that the problem here is with attempting to run a Windows binary from a WSL folder via _winapi.CreateProcess, not related specifically to the use of a UNC path to access that binary. We can see below that mapping the WSL path to a drive letter doesn't make the problem go away:

PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan> .\test_venv\Scripts\python.exe -c "import sys; print(sys.executable)"
\\wsl$\fedoraremix\home\acoghlan\test_venv\Scripts\python.exe

PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan> New-PSDrive -Persist -Name "F" -PSProvider "FileSystem" -Root "\\wsl$\fedoraremix"

Name           Used (GB)     Free (GB) Provider      Root                                                                                                          CurrentLocation
----           ---------     --------- --------      ----                                                                                                          ---------------
F                  43.01        963.84 FileSystem    \\wsl$\fedoraremix
PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan> cd F:\home\acoghlan\

PS F:\home\acoghlan> .\test_venv\Scripts\python.exe -c "import sys; print(sys.executable)"
F:\home\acoghlan\test_venv\Scripts\python.exe
PS F:\home\acoghlan> .\test_venv\Scripts\python.exe -Im ensurepip
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\ensurepip\__main__.py", line 5, in <module>
    sys.exit(ensurepip._main())
             ^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\ensurepip\__init__.py", line 284, in _main
    return _bootstrap(
           ^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\ensurepip\__init__.py", line 200, in _bootstrap
    return _run_pip([*args, *_PACKAGE_NAMES], additional_paths)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\ensurepip\__init__.py", line 101, in _run_pip
    return subprocess.run(cmd, check=True).returncode
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 1538, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 1] Incorrect function
PS F:\home\acoghlan>

It's apparently possible for this to work, since PowerShell itself manages to launch the binary from its thoroughly dubious WSL location, but subprocess is clearly doing something different from what PowerShell does here.

ncoghlan commented 2 months ago

Another option that occurs to me is that rather than trying to detect anything WSL specific, we could potentially intercept the OSError and report some more useful information on it (like the full path to the executable that the call failed to run).

zooba commented 2 months ago

> Another option that occurs to me is that rather than trying to detect anything WSL specific, we could potentially intercept the OSError and report some more useful information on it

This sounds like a good option. Knowing that CreateProcess may return error 1 is useful - I'm pretty sure we already filter a number of values here (possibly through the mapping to errno values), and the only reason we aren't doing anything special for this one is just that it hasn't arisen before.

It would be interesting to see if os.startfile does any better in this situation. That uses ShellExecute rather than CreateProcess, which is more likely to be forgiving of things that aren't normal filesystems (as well as doing things that we wouldn't want to do by default... so it's just an interest not a solution).

ncoghlan commented 2 months ago

subprocess.run:

PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan\test_venv\Scripts> python -c "import subprocess; subprocess.run([r'.\python.exe', '--version'])"
  File "<string>", line 1, in <module>
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 1538, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 1] Incorrect function

subprocess.run with shell=True fails due to CMD not handling UNC paths.

os.startfile also gets WinError 1:

PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\fedoraremix\home\acoghlan\test_venv\Scripts> python -c "import os; os.startfile(r'.\python.exe', 'open', '--version')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
OSError: [WinError 1] Incorrect function: '.\\python.exe'

os.system also tries to run CMD, so it fails the same way shell=True does.

zooba commented 2 months ago

Thanks. I suspect that means we can't do anything about it other than detect and fail-fast.