python / cpython

The Python programming language
https://www.python.org/
Other
61.16k stars 29.52k forks source link

venv windows: Missing `venv/bin` folder, should symlink `venv/bin` -> `venv/Scripts` #97586

Open zackees opened 1 year ago

zackees commented 1 year ago

Bug report

Entering into a venv environment on Windows doesn't work well.

The command to enter into a virtual environment on Linux and MacOS looks like this:

source venv/bin/activate.sh

On Windows however, there is no venv/bin but instead venv/Scripts, which appears to be a similar / same folder to venv/bin. Therefore on windows (git-bash) source venv/bin/activate.sh does not work

As a workaround in my projects, I simply create a symlink using the windows junction point feature (mkdir /j) so that symlink venv/bin points to venv/Scripts and then everything seems to work right.

Windows gained non-admin symlinking in 2019

The fix for this problem only became available in 2019 since mkdir /j was added or enabled for windows directory symlinks. Prior to this, a windows host needed to have admin privilege's or a registry hack in order to symlink a folder.

Your environment

eryksun commented 1 year ago

CMD's mklink /j command creates a junction (mount point), not a symlink [^1]. Junctions have been supported since Windows 2000. The filesystem must support reparse points (e.g. NTFS and ReFS, but not FAT32 and exFAT). Creating a junction does not require a specific privilege.

CMD's mklink /d command creates a symlink to a directory. Symlinks have been supported since Windows Vista (2007). The filesystem must support reparse points. Creating a symlink requires the user to have SeCreateSymbolicLinkPrivilege. By default this privilege is only granted to the "Administrators" group, but it can easily be granted to the "Users" group (or "Authenticated Users"), in which case it's always available without having to elevate. Windows 10+ supports unprivileged creation of symlinks if the system is in developer mode.

What do you suggest be done if venv can't create a symlink or junction because the filesystem doesn't support reparse points? Should we leave that as an unsupported case? Would it be better to create "bin" as the real directory, and try to create "Scripts" as a compatibility link?

[^1]: FYI, a mount point is handled differently than a symlink when opening a path. The resolved path of a symlink replaces the previously traversed path, while that of a mount point does not. This can affect how relative symlink targets are resolved in the remaining path (e.g. a rooted target path such as "\spam" or a target path with ".." components such as "..\..\spam"). Also, when accessed in a UNC path, the target of a mount point is traversed on the server side, which is possible because a junction is only allowed to target local system devices (that is, devices that are local to the server), while the target of a symlink has to be resolved and traversed on the client (uh oh, that link to "C:\spam" actually resolves to the "C:" drive on the client machine; fortunately remote-to-local symlinks are disallowed by default).

zackees commented 1 year ago

What do you suggest be done if venv can't create a symlink or junction because the filesystem doesn't support reparse points? Should we leave that as an unsupported case? Would it be better to create "bin" as the real directory, and try to create "Scripts" as a compatibility link?

My humble suggestion is to create a junction point venv/bin -> venv/Scripts. I've used that when working at Google and for 40-some independent python projects and it works great. I don't know if a real symlink on windows would have any extra benefit.

If the juction point fails then emit a warning and continue. It's the same behavior as it is now, anyway.

eryksun commented 1 year ago

My humble suggestion is to create a junction point venv/bin -> venv/Scripts.

That's fine, regarding the warning and not supporting "bin" if the filesystem doesn't support reparse points. I was just curious whether you thought forward consistency with POSIX "bin" was more important than backward compatibility with "Scripts" in that case.

Python doesn't currently have a supported function that creates a junction. We have _winapi.CreateJunction() for internal use in tests, but it's not cautiously designed as a public API, and it lacks comprehensive tests. We'd either have to improve it or call out to the shell via subprocess.run('mklink /j ...', shell=True). Passing a command line to the shell introduces its own set of risks and problems.

zackees commented 1 year ago

I was just curious whether you thought forward consistency with POSIX "bin" was more important than backward compatibility with "Scripts" in that case.

I do, but that ship has sailed so to speak. The biggest importance is maintaining behavior for older clients that depend on venv/Scripts.

We have _winapi.CreateJunction() for internal use in tests, but it's not cautiously designed as a public API, and it lacks comprehensive tests

Yes, this is what I use too and it works great. Does it need to be introduced as a public api? It seems like it's just necessary to invoke internally.

eryksun commented 1 year ago

Does it need to be introduced as a public api?

No, but CreateJunction() would need still need to be improved. It should support passing src_path as an extended path that's prefixed by "\\?\". If it enables SeRestorePrivilege for backup semantics, it should only be enabled for the current thread, not the entire process. It should validate that src_path is on a local device; else a broken junction is created that won't be discovered until something tries to traverse it. If it fails to set the reparse point, it should delete the created directory, dst_path. Above all, it needs tests to validate its design and protect against regressions.

eryksun commented 1 year ago

We'd also need to fix Argument Clinic, because the generated function _winapi_CreateJunction() is really bad code. If the call has the wrong number of arguments or wrong argument types, it calls PyMem_Free((void *)src_path) and PyMem_Free((void *)dst_path) on uninitialized local variables, which corrupts the heap and causes a crash.

zackees commented 1 year ago

Why not use mklink ? Is there a concern that mklink can be swapped out on the host machine?

eryksun commented 1 year ago

Here's an attempt at cleaning up _winapi.CreateJunction().

static PyObject *
_winapi_CreateJunction_impl(PyObject *module, LPCWSTR src_path,
                            LPCWSTR dst_path)
/*[clinic end generated code: output=44b3f5e9bbcc4271 input=963d29b44b9384a7]*/
{
    WCHAR print_path_buf[MAX_PATH];
    WCHAR vol_path_buf[MAX_PATH];
    LPWSTR print_path = print_path_buf;
    LPWSTR vol_path = vol_path_buf;
    size_t len = Py_ARRAY_LENGTH(print_path_buf);
    size_t prefix_maybe = 4;
    size_t rdb_size;
    _Py_PREPARSE_DATA_BUFFER prdb = NULL;
    HANDLE hdir = INVALID_HANDLE_VALUE;
    BOOL created = FALSE;
    BOOL success = FALSE;

    if (PySys_Audit("_winapi.CreateJunction", "uu", src_path, dst_path) < 0) {
        return NULL;
    }

    if (src_path == NULL || dst_path == NULL) {
        // This should never occur. The Argument Clinic declaration
        // doesn't allow NoneType.
        return PyErr_SetFromWindowsErr(ERROR_INVALID_PARAMETER);
    }

    if ((wcsnlen(src_path, 4) == 4) &&
        (src_path[0] == L'\\' || src_path[0] == L'/')) {
        if (src_path[1] == L'\\' || src_path[1] == L'/') {
            if ((src_path[2] == L'?' || src_path[2] == L'.') &&
                (src_path[3] == L'\\' || src_path[3] == L'/')) {
                prefix_maybe = 0;
            } else {
                // UNC paths are not supported.
                SetLastError(ERROR_INVALID_PARAMETER);
                goto cleanup;
            }
        } else if ((src_path[1] == L'?' && src_path[2] == L'?') &&
                   (src_path[3] == L'\\' || src_path[3] == L'/')) {
            // NT "\??\" could be silently replaced with Windows "\\?\",
            // but strictly speaking this is a domain error.
            SetLastError(ERROR_INVALID_NAME);
            goto cleanup;
        }
    }

    // Get the resolved print path and length.
    while (1) {
        DWORD result = GetFullPathNameW(src_path, (DWORD)len, print_path,
                                        NULL);
        if (result == 0) {
            goto cleanup;
        }
        if (result < len) {
            // Include the null terminator in the length.
            len = result + 1;
            break;
        }
        LPWSTR temp = PyMem_RawRealloc(
                        (print_path == print_path_buf) ? NULL : print_path,
                        result * sizeof(WCHAR));
        if (temp == NULL) {
            SetLastError(ERROR_NOT_ENOUGH_MEMORY);
            goto cleanup;
        }
        print_path = temp;
        len = result;
    }

    // Contents of the reparse data buffer for a junction:
    //
    //   * The substitute name is the target path in the system.
    //     It must be a native NT path, e.g. prefixed with "\??\".
    //   * The print name is an optional target path to show in
    //     directory listings. It should be a DOS path.
    //   * Both names are stored sequentially in PathBuffer, and
    //     both must be null-terminated.
    //   * There are four fields that define the respective offset and
    //     length inside PathBuffer: SubstituteNameOffset,
    //     SubstituteNameLength, PrintNameOffset and PrintNameLength.
    //   * The total required allocation for the reparse data buffer
    //     is the sum of the following:
    //       - REPARSE_DATA_BUFFER_HEADER_SIZE
    //       - size of MountPointReparseBuffer, without PathBuffer
    //       - size of the substitute name in bytes
    //       - size of the print name in bytes

    rdb_size = _Py_REPARSE_DATA_BUFFER_HEADER_SIZE +
               sizeof(prdb->MountPointReparseBuffer) -
               sizeof(prdb->MountPointReparseBuffer.PathBuffer) +
               (prefix_maybe + len + len) * sizeof(WCHAR);
    if (rdb_size > _Py_MAXIMUM_REPARSE_DATA_BUFFER_SIZE) {
        SetLastError(ERROR_FILENAME_EXCED_RANGE);
        goto cleanup;
    }
    prdb = PyMem_RawCalloc(1, rdb_size);
    if (prdb == NULL) {
        SetLastError(ERROR_NOT_ENOUGH_MEMORY);
        goto cleanup;
    }

    prdb->ReparseTag = IO_REPARSE_TAG_MOUNT_POINT;
    prdb->ReparseDataLength =
        (USHORT)(rdb_size - _Py_REPARSE_DATA_BUFFER_HEADER_SIZE);
    prdb->MountPointReparseBuffer.SubstituteNameOffset = 0;
    prdb->MountPointReparseBuffer.SubstituteNameLength =
        (USHORT)((prefix_maybe + len - 1) * sizeof(WCHAR));
    prdb->MountPointReparseBuffer.PrintNameOffset =
        (USHORT)((prefix_maybe + len) * sizeof(WCHAR));
    prdb->MountPointReparseBuffer.PrintNameLength =
        (USHORT)((len - 1) * sizeof(WCHAR));

    // Store the subsitute path. At this point, it may have a WinAPI
    // "\\?\" or "\\.\" local device prefix.
    wcscpy(prdb->MountPointReparseBuffer.PathBuffer + prefix_maybe,
           print_path);
    // Store the NTAPI "\??\" local device prefix of the substitute path.
    wcsncpy(prdb->MountPointReparseBuffer.PathBuffer, L"\\??\\", 4);
    // Store the print path. It may have a WinAPI "\\?\" or "\\.\" local
    // device prefix.
    wcscpy(prdb->MountPointReparseBuffer.PathBuffer + prefix_maybe + len,
           print_path);

    // The target path must be a directory if it exists.
    DWORD attributes = GetFileAttributesW(print_path);
    if (attributes != INVALID_FILE_ATTRIBUTES &&
        (attributes & FILE_ATTRIBUTE_DIRECTORY) == 0) {
        SetLastError(ERROR_DIRECTORY);
        goto cleanup;
    }

    // The target path must be on an existing, local volume.
    len = Py_ARRAY_LENGTH(vol_path_buf);
    while (len <= UNICODE_STRING_MAX_CHARS) {
        BOOL ret = GetVolumePathNameW(print_path, vol_path, (DWORD)len);
        if (!ret && GetLastError() != ERROR_FILENAME_EXCED_RANGE) {
            goto cleanup;
        }
        if (ret && vol_path[wcslen(vol_path)-1] == L'\\') {
            break;
        }
        len = len * 2;
        LPWSTR temp = PyMem_RawRealloc(
                        (vol_path == vol_path_buf) ? NULL : vol_path,
                        len * sizeof(WCHAR));
        if (temp == NULL) {
            SetLastError(ERROR_NOT_ENOUGH_MEMORY);
            goto cleanup;
        }
        vol_path = temp;
    }
    switch (GetDriveTypeW(vol_path)) {
    case DRIVE_UNKNOWN:
    case DRIVE_NO_ROOT_DIR:
    case DRIVE_REMOTE:
        SetLastError(ERROR_INVALID_PARAMETER);
        goto cleanup;
    }

    // Create the directory, open it, and set the junction reparse point.

    if (!(created = CreateDirectoryW(dst_path, NULL))) {
        goto cleanup;
    }

    hdir = CreateFileW(dst_path, FILE_WRITE_DATA | SYNCHRONIZE, 0, NULL,
                OPEN_EXISTING, FILE_FLAG_OPEN_REPARSE_POINT |
                FILE_FLAG_BACKUP_SEMANTICS, NULL);
    if (hdir == INVALID_HANDLE_VALUE) {
        goto cleanup;
    }

    DWORD outbytes;
    if (!(success = DeviceIoControl(hdir, FSCTL_SET_REPARSE_POINT,
                        prdb, (DWORD)rdb_size, NULL, 0, &outbytes, NULL))) {
        goto cleanup;
    }

cleanup:

    DWORD error = GetLastError();

    if (print_path != print_path_buf) {
        PyMem_RawFree(print_path);
    }
    if (vol_path != vol_path_buf) {
        PyMem_RawFree(vol_path);
    }
    if (prdb != NULL) {
        PyMem_RawFree(prdb);
    }
    if (hdir != INVALID_HANDLE_VALUE) {
        CloseHandle(hdir);
    }

    if (!success) {
        if (created) {
            RemoveDirectoryW(dst_path);
        }
        return PyErr_SetFromWindowsErr(error);
    }

    Py_RETURN_NONE;
}
eryksun commented 1 year ago

Why not use mklink ? Is there a concern that mklink can be swapped out on the host machine?

For one, CMD doesn't support escaping "%" characters in a command line, and "%" is a valid filename character. At best, one can add a caret after each percent character, such as "%^username%^". On the first pass of the command line, CMD will try to expand an environment variable named "^username". Hopefully no such environment variable exists, but that's just hope. On its second pass, CMD consumes the carets, leaving unexpanded "%username%" in the command line.

mohamadArdebili commented 10 months ago

I'm using Windows, but instead of Scripts Folder, there's bin and the env won't be activated, I don't know why. Can you help?? @eryksun @zackees

zackees commented 10 months ago

No idea

brettcannon commented 9 months ago

I'm using Windows, but instead of Scripts Folder, there's bin and the env won't be activated, I don't know why.

The code for calculating the path can be found at https://github.com/python/cpython/blob/4230d7ce93cc25e9c5fb564a0b37e93f19ca0e4e/Lib/venv/__init__.py#L145 . Chances are you Python install is doing something odd.

zackees commented 9 months ago

Fyi in python 3.12 bin is now symlinked to scripts in python 3.12.