Closed carltongibson closed 9 months ago
Based on C:\Users\carlt\src\django\django\db\backends\sqlite3\creation.py
I think it might be sqlite related 🤔
cc @erlend-aasland
I don't have access to a PC with Windows 11, so unfortunately I am unable debug this :( If someone with Windows 11 could do a bisect, that would be super helpful.
A lot of changes happened between 3.10 entered the beta phase and the first 3.11 alpha was released:
$ git log --oneline --until=2021-10-05 --since=2021-05-03 Modules/_sqlite | wc -l
56
If someone with Windows 11 could do a bisect, that would be super helpful.
Hi @erlend-aasland I've given this a try but I'm hitting various (A, for earlier commits) tool-chain and (B, for later commits) linker errors trying to build previous versions that (frankly) I'm not familiar enough with Windows, or building CPython for that matter, to be able to progress with in a reasonable timeframe.
I was able to reproduce the issue at 0474d06008, but I'm getting build errors by a3c11cebf1. A Windows expert would likely have more success.
(Sorry to not be of more help. Hopefully I'm a few steps along the road for the future.)
If someone with Windows 11 could do a bisect, that would be super helpful.
Hi @erlend-aasland I've given this a try but I'm hitting various (A, for earlier commits) tool-chain and (B, for later commits) linker errors trying to build previous versions that I'm not familiar enough with Windows, or building CPython for that matter, to be able to progress with in a reasonable timeframe.
I was able to reproduce the issue at 0474d06008, but I'm getting build errors by a3c11cebf1. A Windows expert would likely have more success.
I'm happy to keep trying, but I've reached a bit of a roadblock with building Python failing
(Sorry to not be of more help. Hopefully I'm a few steps along the road for the future.)
Can someone add OS-Windows
label? :window:
I'm not sure if it's related :thinking: but after forcing spawn
on Linux, I'm getting:
Exception ignored in: <function Pool.__del__ at 0x7f5548c9d580>
Traceback (most recent call last):
File "/usr/local/lib/python3.11/multiprocessing/pool.py", line 268, in __del__
ResourceWarning: unclosed running multiprocessing pool <multiprocessing.pool.Pool state=RUN pool_size=8>
/usr/local/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Just checked on following PC specifications: Edition: Windows 11 Home Single Language Version: 21H2 OS build: 22000.1042
with Python 3.11.0rc2, Django-4.2.dev20221005120449
Output:
Ran 16021 tests in 74.720s
OK (skipped=1828, expected failures=5) Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'default'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'... Destroying test database for alias 'other'...`
*truncked some of .......sss........ output.
Let me know any other info/tests needed; I will be happy to help in the future regarding testing on Windows 11.
Couldn't replicate on auto (16 cores laptop).
python runtests.py --parallel
Bumping it up to ~40 started hitting the sharing violation on the removal of the 'other_1.sqlite3' (weirdly only the 1st i.e. other_1).
python runtests.py --parallel 40
So confirmed on latest 3.11.0rc2.
I've disabled Windows Defender on the folder under test (django) as well as %TEMP% to make sure Windows Defender it's not causing issues. That's due to fact that I've got quite a few issues with pickling (i.e. cleaning after that) e.g.
File "c:\users\<username>\AppData\Local\Programs\Python\Python311\Lib\shutil.py", line 623, in _rmtree_unsafe
os.rmdir(path)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\<username>\\AppData\\Local\\Temp\\django_jpnyol3z\\django_16co2dfg\\tmpcaxkg2x2\\test_project'
Not 100% sure if that's a good troubleshooting approach, but a few observations: Pass that passes manages to call CloseFile from the first process, that is the process creating the other_1.sqlite3 and it's the same process that will try to delete the db file (clean-up). Crashing test run doesn't make such CloseFile call, last syscalls seen on system level are to unlock the file. In itself that shouldn't be a problem I guess (as it's the same PID trying to close / remove the file later on). However, there is quite a gap (up to 2sec on my box) between syscalls to unlock the file and close. Can't really figure it out yet (first time with Django test suit), but if you compare vs. a run on 3.10 there is a visible wait whilst: Creating test database for alias 'other'... Working theory - we are coming back on the same PID (process) to clean-up a file, that we are still keeping open (for close operation) elsewhere?
FTR, I'm in the process of installing a Windows 11 development environment now. Will hopefully be up and running by this evening, or at least tomorrow. I'll need to delve into how to debug CPython on Windows using Visual Studio, since I have little experience with both Windows and Visual Studio.
cc. release-manager @pablogsal
Couldn't replicate on auto (16 cores laptop).
python runtests.py --parallel
Ditto (Win 11, 16 cores)
Bumping it up to ~40 started hitting the sharing violation on the removal of the 'other_1.sqlite3' (weirdly only the 1st i.e. other_1).
python runtests.py --parallel 40
So confirmed on latest 3.11.0rc2.
Ditto. Bumped to 40, hit the PermissionError for other_1.sqlite3
.
Thanks so much for your input Eryk. AFAICS, this is not an sqlite3 bug; it is an issue with the test runner itself.
I'm for some reason unable to install Visual Studio 2015, so I'm unable to compile 3.10.0 beta1 at the moment. Thus, I'm once again unable to bisect this :(
This still looks to be failing with 3.11.0, so still looks like a regression from 3.10.
I was able to fix this by adding an explicit connection close()
(see https://github.com/django/django/pull/16225). A similar issue occurred in sqlite-utils
, see https://github.com/simonw/sqlite-utils/issues/503. I'm not sure why it started showing up in Python 3.11 but I don't mind closing the issue.
Thanks so much for your input Eryk. AFAICS, this is not an sqlite3 bug; it is an issue with the test runner itself.
Sorry, that comment wasn't meant for this issue. I cannot reproduce this issue to diagnose it. For me, there's a significant window of time between when the worker processes access the "*.sqlite3" files and when the main process deletes them at shutdown. If something else in the main process sometimes has the file open without delete sharing, I haven't been able to reproduce it in 3.11. (Note that any locking/unlocking of the database file that may be observed in monitoring tools has nothing to do with a sharing violation. Byte-range locks can lead to locking violations for read and write operations, but not to a sharing violation.)
That said, for projects that are using multiprocessing for parallel testing, I think it would help if multiprocessing introduced optional support for process groups and, in Windows, job objects, in order to do a better job ensuring that the entire process tree has been given a chance to gracefully exit, and subsequently to forcefully kill any remaining processes.
I'm not sure why it started showing up in Python 3.11
3.11 switched to using functools.lru_cache
for the statement cache. See #87028. This introduces a reference cycle involving the connection object, so the connection object no longer gets immediately deallocated, not until, for example, gc.collect()
is called to break the cycle.
I'm not sure why it started showing up in Python 3.11
3.11 switched to using
functools.lru_cache
for the statement cache. See #87028. This introduces a reference cycle involving the connection object, so the connection object no longer gets immediately deallocated, not until, for example,gc.collect()
is called to break the cycle.
Good point. I remember the Windows CI was unhappy in the first draft implementations of this enhancement. I'm not sure it is a bug that the connection object might live longer. Explicit resource control is advocated in the sqlite3 docs; closing connections explicitly is good practice.
From what I can tell, the old implementation avoids the circular reference by manually decrementing the reference count of the connection object:
I suppose you could have kept the statement cache type as a skeleton that uses an lru_cache
.
From what I can tell, the old implementation avoids the circular reference by manually decrementing the reference count of the connection object
True, it used that hack. Part of my motivation for getting rid of the (duplicate) LRU cache implementation in the _sqlite extension module was getting rid of this hack.
I suppose you could have kept the statement cache type as a skeleton that uses an lru_cache.
That would have been possible, but you'd keep the GC hack. I'm not sure that is a very good idea.
I was able to fix this by adding an explicit connection
close()
(see django/django#16225). A similar issue occurred insqlite-utils
, see simonw/sqlite-utils#503. I'm not sure why it started showing up in Python 3.11 but I don't mind closing the issue.
Following gh-108015, sqlite3 will now emit a ResourceWarning if connection objects are closed implicitly. Explicit resource management is recommended. cc. @eryksun
gh-108015 has now landed; Python 3.13 will emit a ResourceWarning if a connection object is closed implicitly. Explicit resource management is now recommended. @felixxm has reported that Django already started to clean up their test suite (django/django#17178).
It seems to me the problems mentioned in this issue can all be handled by improved resource management. Suggesting to close this.
Running the Django test suite against the Python 3.11 pre-releases, we have hit a potential regression.
Steps to reproduce
Bug report
On Python 3.8, 3.9, and 3.10 this runs without problem.
On Python 3.11 the following error in seen after the test suite completes, during shutdown:
This looked similar to us to open issue #95027 but we were asked to report it separately.
I've tested all the way back to
a1
where (along with other issues now resolved) this error still occurs:This is somewhat frustrating as we've tried to test on all platforms since the first releases. Our test suite would only run on Windows with Python 3.11 very recently as there was a third-party dependency that was not compatible with Windows. We will try to adjust to test without dependencies as well on Windows for future versions. (Sorry about that.)
Please do let us know if we can provide further info. I imagine though the easiest thing is for you to run this yourself.
Thanks.
//cc @felixxm
Your environment