Closed aeros closed 4 years ago
In several recent PRs, test_ctypes.test_load_dll_with_flags is failing for the Azure Pipelines "Windows PR tests win32" and "Windows PR tests win64" with the following error message:
======================================================================
ERROR: test_load_dll_with_flags (ctypes.test.test_loading.LoaderTest) [WinDLL('_sqlite3.dll', winmode=0)]
----------------------------------------------------------------------
Traceback (most recent call last):
File "d:\a\1\s\lib\ctypes\test\test_loading.py", line 140, in should_pass
subprocess.check_output(
File "d:\a\1\s\lib\subprocess.py", line 419, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "d:\a\1\s\lib\subprocess.py", line 533, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['d:\\a\\1\\s\\PCbuild\\win32\\python.exe', '-c', "from ctypes import *; import nt;WinDLL('_sqlite3.dll', winmode=0)"]' returned non-zero exit status 1.
======================================================================
ERROR: test_load_dll_with_flags (ctypes.test.test_loading.LoaderTest) [WinDLL('_sqlite3.dll', winmode=0)]
----------------------------------------------------------------------
Traceback (most recent call last):
File "d:\a\1\s\lib\ctypes\test\test_loading.py", line 140, in should_pass
subprocess.check_output(
File "d:\a\1\s\lib\subprocess.py", line 419, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "d:\a\1\s\lib\subprocess.py", line 533, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['d:\\a\\1\\s\\PCbuild\\win32\\python.exe', '-c', "from ctypes import *; import nt;WinDLL('_sqlite3.dll', winmode=0)"]' returned non-zero exit status 1.
Since this only started occurring recently in several unrelated PRs, I suspect it was most likely an intermittent regression introduced in master. Here are the PRs I have seen the same exact error occur in so far:
https://github.com/python/cpython/pull/18239 https://github.com/python/cpython/pull/19403 https://github.com/python/cpython/pull/19402 https://github.com/python/cpython/pull/19399
I was unable to reproduce it locally on my secondary boot of Windows 10.
My best guess at the moment is that something changed in the underlying Windows image, particularly given the comment above the failing line that "insecure load flags should succeed" (though that could easily be a red herring). Until someone who actually does understand what's going on here (Steve? :)) can deal with it properly, I have a PR to disable the problematic part of the test that I'll merge shortly.
New changeset f407e209c1e35b64835f73e7e7ca23e33817e9fe by Zachary Ware in branch 'master': bpo-40214: Temporarily disable a ctypes test (GH-19404) https://github.com/python/cpython/commit/f407e209c1e35b64835f73e7e7ca23e33817e9fe
Zachary Ware wrote:
My best guess at the moment is that something changed in the underlying Windows image, particularly given the comment above the failing line that "insecure load flags should succeed" (though that could easily be a red herring).
Also, now that I think about it, the the test failure has only specifically occurred on Azure Pipelines and not on the GitHub Actions Windows PR tests (as far as I have seen), which does further increase the likelihood that it's an issue with the Windows image being used in Azure for the PR tests.
Regardless of the cause though, thanks for applying a temporary fix/skip in the meantime, these types of issues can be troublesome for CPython's workflow when it occurs on required CI checks. Hopefully Steve Dower or someone similarly knowledgeable in the area can definitively figure out the cause.
Maybe something else was installed that put an incompatible _sqlite3.dll on PATH? Thereby proving the inherent risk of using unsafe DLL load settings :)
If we're not already overriding PATH for the subprocess in this test (should only need "%SystemRoot%;%SystemRoot%\System32"), we may have to. Though if someone is installing an incompatible _sqlite3.dll into a system directory then we're in a bit of trouble.
Disabling this part of the test as inherently untestable because it's an inherently unsafe operation is fine by me :)
Maybe something else was installed that put an incompatible _sqlite3.dll on PATH? Thereby proving the inherent risk of using unsafe DLL load settings
should_pass() sets the working directory of the test process to the tmp
directory. The loader normally checks the working directory before PATH. But maybe the system itself is configured to disallow loading DLLs from the working directory. There's a registry setting for that, but it's little known and rarely used because it's disruptive in general.
Assuming I implemented my checks correctly (see PR19413), I think I've just debunked both of our leading theories.
It's probably the sqlite3.dll dependency that's failing, not the _sqlite3.dll (originally _sqlite3.pyd) one.
The test is verifying that dependent DLLs are loaded correctly. So I assume in this case the _sqlite3.dll is being loaded, but it's finding the wrong sqlite3.dll.
Since I could, I updated Zach's PR to check for other sqlite3.dll files. The new build should be at https://dev.azure.com/Python/cpython/_build/results?buildId=60785&view=logs&j=c83831cd-3752-5cc7-2f01-8276919eb334 in a few minutes.
It might also be a "sqlite3_d.dll". Updated build is https://dev.azure.com/Python/cpython/_build/results?buildId=60787&view=results
Found it (and it's kind-of us):
Checking C:\Program Files\Amazon\AWSCLIV2\ Found at C:\Program Files\Amazon\AWSCLIV2\sqlite3.dll Found at C:\Program Files\Amazon\AWSCLIV2\_sqlite3.pyd
But I'm not sure why that is getting loaded earlier than the current directory. Is that the behaviour we went for here?
(FWIW, we build and test a release build, not a debug build, which is why we're not looking for sqlite3_d.dll... but perhaps we should be using a debug build? That might be slower, but the extra validation is probably worthwhile...)
Steve Dower wrote:
(FWIW, we build and test a release build, not a debug build, which is why we're not looking for sqlite3_d.dll... but perhaps we should be using a debug build? That might be slower, but the extra validation is probably worthwhile...)
In general, I think a debug build instead of a release would make more sense for the PR tests. Do you have a general estimate or rough idea as to how much slower it would be in comparison?
Is this going to be backported? It seems backports also use the same build and have this error. Sample 3.8 backport build that seems to be related to this issue : https://dev.azure.com/Python/cpython/_build/results?buildId=60753&view=logs&j=d554cd63-f8f4-5b2d-871b-33e4ea76e915&t=5a14d0eb-dbd4-5b80-f5d0-7909f950a1cc&l=1570
Feel free to backport PR19404 as needed, but mark versions here appropriately to make sure the *real* fix makes it where it needs to go.
Do you have a general estimate or rough idea as to how much slower it would be in comparison?
It's one sample point, but compare https://buildbot.python.org/all/#/builders/129/builds/708 to https://github.com/python/cpython/runs/571497886
Compile time: 3:32 (release) -> 1:10 (debug) Test time: 12:28 (release) -> 15:31 (debug)
Though the test timing vary wildly, as some tests cause more contention than others and they run in a random order.
Steve Dower wrote:
It's one sample point, but compare https://buildbot.python.org/all/#/builders/129/builds/708 to https://github.com/python/cpython/runs/571497886
FWIW, I'd be +1 in favor for using the debug build then. A few additional minutes would be well worth having more thorough PR tests IMO. If it becomes an issue, we can always revert it back, but it seems that there's little to no harm in trying it out.
Python 3.8 is also affected. Example: https://github.com/python/cpython/pull/19444#issuecomment-611499825
I backported the change to skip the test until it's fixed: PR 19449.
New changeset c83f003ee5398e9c27a0c634329420003d607d46 by Miss Islington (bot) in branch '3.8': bpo-40214: Temporarily disable a ctypes test (GH-19404) https://github.com/python/cpython/commit/c83f003ee5398e9c27a0c634329420003d607d46
But I'm not sure why that is getting loaded earlier than the current directory. Is that the behaviour we went for here?
I don't understand what's going on here if %PATH% is interfering. The current directory (%CD%) should be checked before %PATH% with the standard DLL search order for desktop applcations. That's what I've observed in practice and how it's documented 1:
SafeDllSearchMode enabled:
1. %__APPDIR__%
2. %SystemRoot%\System32
3. %SystemRoot%\System
4. %SystemRoot%
5. %__CD__%
6. %PATH%
SafeDllSearchMode disabled:
1. %__APPDIR__%
2. %__CD__%
3. %SystemRoot%\System32
4. %SystemRoot%\System
5. %SystemRoot%
6. %PATH%
I don't understand what's going on here if %PATH% is interfering.
Yeah, as shown by the fact my fix didn't fix it :)
I'm going to try adding some more diagnostics around this to find out exactly where it's failing. I suspect it's a dependency, rather than the direct file.
So I collected more info, and it seems the search order we're getting for _sqlite3.dll (.pyd) is:
And then we never look in CWD. I wonder if it's being excluded because it's under %TEMP%?
I can't reproduce it either with a long PATH or with it in TEMP, so there may be a policy setting involved. Trying again with the test not using TEMP (it shouldn't be using the real TEMP anyway...)
Okay, I've wasted enough time on this. Let's just disable the test permanently and say that the winmode=0 behaviour is system-specific and is not tested.
it seems the search order we're getting for _sqlite3.dll (.pyd) is:
- app directory
- SQL server install directory (?)
- Windows/System32
- Windows
- %PATH% (which is truncated at 2190 chars)
Thank you. It finally makes sense. The parent process is calling SetDllDirectoryW 1, which also replaces the current working directory in the DLL search path of child processes (inherited via the PEB ProcessParameters->DllPath) 2:
Note: The standard search order of the process will also be
affected by calling the SetDllDirectory function in the parent
process before start of the current process.
In that case, all we have to do is restore the original search path by calling SetDllDirectoryW(NULL). For example:
import ctypes
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
if not kernel32.SetDllDirectoryW(None):
raise ctypes.WinError(ctypes.get_last_error())
You mean the CI agent is doing it? Because there's nowhere in Python that should be doing it.
I'd rather not blanket change this option in the test suite (except to turn it on :) ), so that code snippet probably needs to shrink down into a one-liner that can go with the "-c" just for these tests.
You mean the CI agent is doing it? Because there's nowhere in Python that should be doing it.
The ProcessParameters->DllPath value is inherited until a process in the tree reverts to the original search path via SetDllDirectoryW(NULL), so it can be any ancestor process, not just the immediate parent process.
I'd rather not blanket change this option in the test suite (except to turn it on :) ), so that code snippet probably needs to shrink down into a one-liner that can go with the "-c" just for these tests.
The test could call GetDllDirectoryW 1 to save and restore the previous value. Unfortunately, GetDllDirectoryW returns 0 for both the default case (NULL) and when the working directory is disabled by setting an empty string (""). It would have to assume one or the other.
Otherwise if you just want a one-liner: "import ctypes; ctypes.windll.kernel32.SetDllDirectoryW(None)".
New changeset 9b498939009f49b8c772c89e8fc80efbfd8afcb5 by Steve Dower in branch 'master': bpo-40214: Fix ctypes WinDLL test with insecure flags (GH-19652) https://github.com/python/cpython/commit/9b498939009f49b8c772c89e8fc80efbfd8afcb5
Looks like that fix has done it. Thanks, Eryk!
New changeset 0cc7becde0bfe896fd23b5cb14fedfb8f2066fca by Miss Islington (bot) in branch '3.8': bpo-40214: Fix ctypes WinDLL test with insecure flags (GH-19652) https://github.com/python/cpython/commit/0cc7becde0bfe896fd23b5cb14fedfb8f2066fca
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at =
created_at =
labels = ['type-bug', '3.8', '3.9', 'ctypes', 'tests', 'OS-windows']
title = 'test_ctypes.test_load_dll_with_flags Windows failure'
updated_at =
user = 'https://github.com/aeros'
```
bugs.python.org fields:
```python
activity =
actor = 'miss-islington'
assignee = 'none'
closed = True
closed_date =
closer = 'steve.dower'
components = ['Tests', 'Windows', 'ctypes']
creation =
creator = 'aeros'
dependencies = []
files = []
hgrepos = []
issue_num = 40214
keywords = ['patch']
message_count = 27.0
messages = ['365887', '365890', '365892', '365900', '365902', '365910', '365919', '365953', '365954', '365955', '365970', '365971', '365993', '365999', '366003', '366044', '366045', '366307', '366316', '366319', '366320', '366322', '366324', '366327', '367013', '367014', '369661']
nosy_count = 8.0
nosy_names = ['paul.moore', 'tim.golden', 'zach.ware', 'eryksun', 'steve.dower', 'miss-islington', 'xtreak', 'aeros']
pr_nums = ['19404', '19413', '19449', '19500', '19652', '19656']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue40214'
versions = ['Python 3.8', 'Python 3.9']
```