python / cpython

The Python programming language
https://www.python.org

test_asyncio/test_logging freezing hard, making pgo-extended impossible #104567

Closed elandorr closed 1 year ago

elandorr commented 1 year ago

I see this has been an issue for at least several years e.g. https://github.com/python/cpython/issues/87197.

For at least the last 3 years, I've been building cpython on Debian stable with optims + LTO, and test_asyncio randomly locks up hard. Occasionally it will breeze through, though.

Debian stable 3.11.3 x64 extended pgo

I can't tell by a quick search where you're at right now regarding this, so feel free to close and link to the current thread, if there's one. Right now it hit me again and it's stuck ~2 hours. This eats more time than the full optimization run saves :).

Is there even a benefit to this test in particular, or would it make sense to just get rid of it altogether? What's the technical reason behind this randomly/almost always getting stuck? I read about a race condition while searching, but I suppose 3 years later that'd have been fixed.

Cheers

elandorr commented 1 year ago

I remember the last time working, but now with .3 it fails consistently: (tried thrice by now)

0:01:38 load avg: 2.27 [ 26/434] test_asyncio                                                   
Unknown child process pid 3442993, will report returncode 255                                                                                                                                   
Loop <_UnixSelectorEventLoop running=False closed=True debug=False> that handles pid 3442993 is closed
test test_asyncio failed                        
2:13:55 load avg: 1.07 [ 27/434] test_asyncore -- test_asyncio failed (6 errors) in 2 hour 12 min
2:13:55 load avg: 1.07 [ 28/434] test_atexit -- test_asyncore skipped                                                                                                                           
2:13:57 load avg: 1.07 [ 29/434] test_audioop                                  

Another test fails with a more severe error than usual now, maybe this is related?

2:25:02 load avg: 1.02 [152/434] test_ftplib                                                   
Warning -- Uncaught thread exception: OSError                                                  
Exception in thread Thread-512:                                                                
Traceback (most recent call last):                                                             
  File "/home/user/builds/Python-3.11.3-src/Lib/threading.py", line 1038, in _bootstrap_inner
    self.run()                                                                                 
  File "/home/user/builds/Python-3.11.3-src/Lib/test/test_ftplib.py", line 305, in run     
    asyncore.loop(timeout=0.1, count=1)                                                        
  File "/home/user/builds/Python-3.11.3-src/Lib/asyncore.py", line 212, in loop            
    poll_fun(timeout, map)                                                                     
  File "/home/user/builds/Python-3.11.3-src/Lib/asyncore.py", line 149, in poll
    r, w, e = select.select(r, w, e, timeout)                                                  
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                  
OSError: [Errno 9] Bad file descriptor                                                         
Fatal Python error: Segmentation fault      

After this failure, it stops continuing tests altogether, even with an infinite timeout set in the makefile.

I guess I have to stop using extended pgo. In the past it provided a small, but reliable performance boost, but now it seems it's completely broken.

elandorr commented 1 year ago

After running for some 12 odd hours by now, I had to kill it. It seems .3 has some regressions compared to .1, which ran reasonably fine (some tests have been failing for 20 years so I don't expect a 100% run).

The one that froze this time was test_logging. I see that has been an issue before: https://github.com/python/cpython/issues/73060 but the reason given there seems inapplicable. This is a stable Debian, up-to-date.

Something I noticed years ago is that some tests run fine individually, but fail miserably when run during make. They freeze everything for hours when they normally take a minute. test_signal was one such candidate.

I'm doing one more attempt as it has been random in the past, but it seems pgo-extended is just broken currently.

(I increased the timeout to 1d by the way. Original idea came from an article about optimizations, and I figured, I don't build every day, so might as well go hard.)

(And yes, I know about -x, but in my experience that only causes non-explainable trouble and bench differences.)

gvanrossum commented 1 year ago

I fear we don't speak the same language. I don't understand what your goal is, nor what you mean by ".1" and ".3".

The test_ftplib failure you mention has nothing to do with asyncio (asyncore is something else completely) so I am removing the asyncio label from this issue. Possibly your system is running out of resources.

elandorr commented 1 year ago

Hey @gvanrossum

I don't understand what your goal is

I figured that was obvious:

I've been building cpython on Debian stable with optims + LTO Debian stable 3.11.3 x64 extended pgo

Building cpython 3.11.3 with pgo-extended, on Debian stable, of course. What is unclear?

nor what you mean by ".1" and ".3".

3.11.1 and 3.11.3

Possibly your system is running out of resources.

Checked for that, it has plenty. Previous builds also worked fine and much bigger projects build.

The test_ftplib failure you mention has nothing to do with asyncio (asyncore is something else completely) so I am removing the asyncio label from this issue.

test_asyncio fails and freezes for 2 hours, so how exactly is this not related? test_ftplib also failing just came up in addition. And then test_logging which completely blocks.

Here's every failure and skip until the freeze:

0:01:36 load avg: 1.78 [ 26/434] test_asyncio                                
?test_asyncio                         
Unknown child process pid 1851226, will report returncode 255                                                                                             
Loop <_UnixSelectorEventLoop running=False closed=True debug=False> that handles pid 1851226 is closed                                                    
test test_asyncio failed              
2:13:49 load avg: 1.50 [ 27/434] test_asyncore -- test_asyncio failed (6 errors) in 2 hour 12 min                                                         
2:13:49 load avg: 1.50 [ 28/434] test_atexit -- test_asyncore skipped                                                       

2:15:25 load avg: 1.33 [ 58/434] test_cmd_line_script                        
/home/user/builds/Python-3.11.3-src/Lib/subprocess.py:849: RuntimeWarning: pass_fds overriding close_fds.                                             
  warnings.warn("pass_fds overriding close_fds.", RuntimeWarning)                                                                                         
2:15:32 load avg: 1.30 [ 59/434] test_code                                   

2:16:07 load avg: 1.47 [ 78/434] test_compile                                
<string>:10: SyntaxWarning: 'list' object is not callable; perhaps you missed a comma?                                                                    
2:16:24 load avg: 1.34 [ 79/434] test_compileall                       

2:21:35 load avg: 1.61 [ 96/434] test_curses
2:21:36 load avg: 1.61 [ 97/434] test_dataclasses -- test_curses skipped (resource denied)

2:22:39 load avg: 1.35 [109/434] test_devpoll
2:22:39 load avg: 1.35 [110/434] test_dict_version -- test_devpoll skipped

2:25:13 load avg: 1.67 [152/434] test_ftplib
Warning -- threading_cleanup() failed to cleanup 0 threads (count: 0, dangling: 5)
Warning -- Dangling thread: <test.test_ftplib.DummyFTPServer ::1:0 at 0x7f057df35810>
Warning -- Dangling thread: <test.test_ftplib.DummyFTPServer ::1:0 at 0x7f057df34150>
Warning -- Dangling thread: <test.test_ftplib.DummyFTPServer ::1:0 at 0x7f057df35450>
Warning -- Dangling thread: <test.test_ftplib.DummyFTPServer ::1:0 at 0x7f057dca1490>
Warning -- Dangling thread: <_MainThread(MainThread, started 139661719827264)>
test test_ftplib failed
2:26:34 load avg: 1.15 [153/434] test_funcattrs -- test_ftplib failed (9 errors) in 1 min 20 sec

2:28:08 load avg: 1.57 [187/434] test_imaplib -- test_idle skipped
test test_imaplib failed
2:30:52 load avg: 1.10 [188/434] test_imghdr -- test_imaplib failed (1 failure) in 2 min 43 sec

2:33:24 load avg: 1.26 [207/434] test_kqueue
2:33:25 load avg: 1.24 [208/434] test_largefile -- test_kqueue skipped
2:34:09 load avg: 4.80 [209/434] test_launcher -- test_largefile passed in 44.1 sec
2:34:10 load avg: 4.50 [210/434] test_lib2to3 -- test_launcher skipped

test_logging is frozen now, and it's running for 6+ hours.

The tests have always been pretty buggy, but usually at the very least after 2h the build was done. I still use the .1 after all, which builds fine.

elandorr commented 1 year ago

For the frickin' OG Guido V this doesn't need to be said, but for anyone reading who might have never built like this:

After configuring, you can change the makefile:

-PROFILE_TASK=  -m test --pgo --timeout=$(TESTTIMEOUT)
+PROFILE_TASK=  -m test --pgo-extended --timeout=$(TESTTIMEOUT)

to run a bigger set of tests. As this takes much longer, it's prudent to increase the timeout. Most people never notice when default builds eventually time out, as the result can still work. The default timeout is only 20 minutes, so it's buried in a build log.

For the last few years this took ~2 hours and provided a small, but reliable performance boost.*

It always got stuck on a few notorious tests, like test_asyncio and test_signal. Usually after 2h they finished or failed and the majority succeeded, but now, for the first time, it broke completely. First the asyncio one killed the build; after a few attempts it failed 'peacefully', but now test_logging is frozen.

6 hours 15 mins and counting, I doubt it will continue.

* I know it's not 'worth it', but people also use Gentoo and build FF/Chrome for 15 hours for a small boost.

gvanrossum commented 1 year ago

Hold your horses and watch your language. I was not aware of the "extended PGO" feature that you describe (haven't been in this part of the build process for ages) so thanks for explaining it to me. But I have more questions about it.

If test_asyncio runs for two hours, is that because some kind of instrumentation slows down the Python interpreter by more than an order of magnitude? When I run ./python.exe -m test test_asyncio it completes in around three minutes on an Intel Mac that's about 4 years old. (This is in a 3.12 checkout I happen to have handy, using the default configuration; I'd expect 3.11 to perform similarly.)

Whatever is slowing your test runs down, unfortunately many of the asyncio tests are written in a way that uses small sleeps, and I can totally see that they would flake out much more frequently if the code that is expected to run in those sleeps takes an order of magnitude longer to run.
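To make that failure mode concrete, here is a minimal sketch. It is not taken from the asyncio test suite, and every name in it is hypothetical. It contrasts a check that sleeps for a fixed time with one that waits on an explicit signal. The `work_time` parameter is an assumption standing in for "the work got slower because the interpreter is instrumented".

```python
import asyncio


async def worker(done: asyncio.Event, results: list, work_time: float) -> None:
    # Simulates background work whose duration depends on interpreter speed.
    await asyncio.sleep(work_time)
    results.append("finished")
    done.set()


async def sleep_based_check(work_time: float, grace: float = 0.05) -> bool:
    # Fragile pattern: assume the worker is done after a short, fixed sleep.
    done, results = asyncio.Event(), []
    task = asyncio.create_task(worker(done, results, work_time))
    await asyncio.sleep(grace)
    ok = results == ["finished"]
    await task  # let the worker finish so no task is left pending
    return ok


async def event_based_check(work_time: float, timeout: float = 5.0) -> bool:
    # Sturdier pattern: wait for an explicit signal, bounded by a generous timeout.
    done, results = asyncio.Event(), []
    task = asyncio.create_task(worker(done, results, work_time))
    await asyncio.wait_for(done.wait(), timeout)
    await task
    return results == ["finished"]
```

With a worker that takes longer than the grace period (for example `work_time=0.2`), the sleep-based check reports failure even though nothing is actually broken, while the event-based check still passes.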

As a workaround, I recommend that you figure out how to exclude test_asyncio from the PGO workload -- we may eventually get to fixing it but don't hold your breath, there is a lot of code there and it serves our purpose (catching new bugs in asyncio during CI) well enough. (Sorry, I don't have the knowledge to help you figure out how to exclude a test, but I presume there is a way.)

CAM-Gerlach commented 1 year ago

As a workaround, I recommend that you figure out how to exclude test_asyncio from the PGO workload

I'm very much not a PGO expert, so take this with a grain of salt. And @elandorr, it's important to keep in mind that given the size of the Python project, any core dev from the oldest (Guido) to the newest (me) can't possibly be an expert at everything.

However, from listening to other devs who are, it's apparently quite common to train PGO on the specific tests that best model the workloads that you'll be running Python under, and disable problematic ones. Reading the docs, it seems like the PROFILE_TASK env var will do exactly that; if you append -x test_asyncio to the default -m test --pgo --timeout=$(TESTTIMEOUT), it should exclude that test (and you can exclude other tests similarly). As you'd expect, python -m test --help will show the other options you can pass to include/exclude tests.
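As a rough illustration of the argument string being described, here is a small sketch that assembles a PROFILE_TASK value. The helper `build_profile_task` is hypothetical and not part of CPython. It only uses the options already mentioned in this thread (`--pgo`, `--pgo-extended`, `--timeout`, `-x`), and it leaves `$(TESTTIMEOUT)` unexpanded so that make can substitute it.

```python
def build_profile_task(excluded=(), extended=False, timeout_var="$(TESTTIMEOUT)"):
    """Return a PROFILE_TASK argument string (hypothetical helper)."""
    args = [
        "-m", "test",
        "--pgo-extended" if extended else "--pgo",
        f"--timeout={timeout_var}",
    ]
    if excluded:
        # With -x, the test names that follow are excluded rather than selected.
        args.append("-x")
        args.extend(excluded)
    return " ".join(args)


print(build_profile_task(["test_asyncio", "test_logging"], extended=True))
# -m test --pgo-extended --timeout=$(TESTTIMEOUT) -x test_asyncio test_logging
```

The resulting string would go wherever PROFILE_TASK is set for the build. How you set it (editing the Makefile after configure, or supplying the variable another way) is up to your setup.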

elandorr commented 1 year ago

Hold your horses and watch your language. I was not aware

'frickin' is not 'fuckin' although both are standard 'intensifiers'. OG = original. You're kind of like a childhood hero to me, it's cool you still interact with plain ol' tickets. I didn't want to lecture the godfather on something basic. Couldn't give a damn less about 'celebrities', but you're someone I indirectly deal with almost daily. Normally once they get rich they play stocks and forget all about what they used to do, so that's really nice. (I'm not a 'pro' by any means, though I grab python often enough to 'know' you which says something I guess.)

When I run ./python.exe -m test test_asyncio it completes in around three minutes on an Intel Mac that's about 4 years old.

That's expected. Typically the tests that fail work fine individually. That's what baffled me for the first time, because I assumed there'd be no difference. But there's more to it. (If anyone's qualified to guess why it behaves like that, it's you. Really needs a deep C guy presumably.)

You can also use -jn to parallelize them, but things get really weird. It's funny as this is a common gentoo build. Because I know how silly performance hunting often is, I used the standard pyperformance bench to get stable results. I found out, as soon as you start excluding or parallelizing, the results are all over the place. So much, that the effort becomes more a 'feel good thing' than a real benefit. A 'peaceful' failure resulted in more consistency in my experiments, than excluding the broken test(s).

Aka, either run in full, or don't bother with extended, as there likely won't be the benefit you expected. (But pyperformance takes a huge time to run, so I guess hardly anyone knows about it, except the core python guys who wrote it. I think I saw vstinner's name somewhere in there, but I don't want to ping randomly.)

(Sorry, I don't have the knowledge to help you figure out how to exclude a test, but I presume there is a way.)

It's nice that you admit to things even you don't know. That's easy: just add -x test_foo. Here's how people commonly handle it: https://github.com/InBetweenNames/gentooLTO/issues/552. (random search result, am not affiliated) 'just -x it'

Maybe it's time to stop performance hunting and just go with 'standard' optim flags. 3.11 is already faster out of the box than 3.10. (Although res usage increased a little.)

test_logging is still stuck. After almost 8 hours it's time to just build 'regular' and use it. There's just the little kid that still wants to push all the buttons and get the 'perfect' build, you know how that is.

A 'regular' build with optims/LTO, but only default PGO, just finished in 14 mins. All 44 tests passed.

I'll run the failed stuff one by one.
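For anyone wanting to do the same, a minimal sketch of running each test module in its own process with a hard deadline, so a frozen test can't block for hours. Both helper names are made up for this example, and the 600 second deadline is an arbitrary assumption.

```python
import subprocess
import sys


def regrtest_command(test_name, python=sys.executable):
    # Command line for running a single test module verbosely.
    return [python, "-m", "test", "-v", test_name]


def run_with_deadline(cmd, timeout_s):
    """Run cmd and return 'passed', 'failed', or 'timeout' (hypothetical helper)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing keeps hanging.
        return "timeout"
    return "passed" if proc.returncode == 0 else "failed"


if __name__ == "__main__":
    for name in ["test_asyncio", "test_ftplib", "test_signal", "test_logging"]:
        print(name, run_with_deadline(regrtest_command(name), timeout_s=600))
```

Point `python` at the freshly built interpreter (for example `./python` in the build directory) rather than the system one.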

Maybe you can ping one of the build guys, so they know something changed.

@CAM-Gerlach I realize that python is pretty huge by now. I figured you routinely build your own and this is pretty mundane, but I don't expect even Guido to know 'everything'.

Conveying excitement via text is difficult nowadays, I guess. I meant like 'hey the man is still doing simple tickets, yeah'.

if you append -x test_asyncio to the default -m test --pgo --timeout=$(TESTTIMEOUT), it should exclude that test

Yes, I mentioned that already :)

(And yes, I know about -x, but in my experience that only causes non-explainable trouble and bench differences.)

to the newest (me)

Congrats!

Have a good one!

test_asyncio

Noticed:

Not all systems allow IPv6, so that should fail a bit faster, maybe. And not fail the entire test. In this case all IPv6 input is dropped at an early stage (preroute), but v6 is not disabled kernel-wide (because that bugs some programs). I just tested binding a local server to ::1, that works fine, but it can't be curled. Not sure how the netfilter handles locally originating traffic, but it seems this is related.
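A sketch of the kind of check I mean, assuming the situation above where bind succeeds but packets get dropped. `ipv6_loopback_usable` is a hypothetical helper, not something in the test suite. It binds to ::1 and then also tries a real connection with a short timeout, so it fails fast instead of hanging.

```python
import socket


def ipv6_loopback_usable(timeout: float = 1.0) -> bool:
    """Return True only if a TCP connection to ::1 actually completes.

    Hypothetical helper: binding alone is not enough on a host where a
    firewall drops IPv6 packets, so a real connection is attempted too.
    """
    if not socket.has_ipv6:
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as server:
            server.bind(("::1", 0))
            server.listen(1)
            port = server.getsockname()[1]
            with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as client:
                client.settimeout(timeout)  # fail fast instead of hanging
                client.connect(("::1", port))
        return True
    except OSError:
        # Covers bind failures, refused connections, and timeouts
        # (TimeoutError is an OSError subclass).
        return False
```

A test that needs working IPv6 could then be guarded with something like `unittest.skipUnless(ipv6_loopback_usable(), "IPv6 loopback not reachable")`, so it is skipped instead of failing the whole module.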

Also here it just failed after 22mins, which would still be more reasonable than the 2h and killing the entire build sometimes.

======================================================================                                                                                    
ERROR: test_create_datagram_endpoint_ipv6 (test.test_asyncio.test_events.EPollEventLoopTests.test_create_datagram_endpoint_ipv6)                          
----------------------------------------------------------------------                                                                                    
Traceback (most recent call last):                                                                                                                        
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_asyncio/test_events.py", line 1322, in test_create_datagram_endpoint_ipv6                   
    self._test_create_datagram_endpoint(('::1', 0), socket.AF_INET6)                                                                                      
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_asyncio/test_events.py", line 1301, in _test_create_datagram_endpoint                       
    test_utils.run_until(self.loop, lambda: server.nbytes)                                                                                                
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_asyncio/utils.py", line 117, in run_until                                                   
    raise futures.TimeoutError()                                                                                                                          
          ^^^^^^^^^^^^^^^^^^^^                                                                                                                            
AttributeError: module 'asyncio.futures' has no attribute 'TimeoutError'                                                                                  

======================================================================                                                                                    
ERROR: test_create_server_dual_stack (test.test_asyncio.test_events.EPollEventLoopTests.test_create_server_dual_stack)                                    
----------------------------------------------------------------------                                                                                    
Traceback (most recent call last):                                                                                                                        
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_asyncio/test_events.py", line 1243, in test_create_server_dual_stack                        
    client.connect(('::1', port))                                                                                                                         
TimeoutError: [Errno 110] Connection timed out                                                                                                            

======================================================================                                                                                    
ERROR: test_create_datagram_endpoint_ipv6 (test.test_asyncio.test_events.PollEventLoopTests.test_create_datagram_endpoint_ipv6)                           
----------------------------------------------------------------------                                                                                    
Traceback (most recent call last):                                                                                                                        
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_asyncio/test_events.py", line 1322, in test_create_datagram_endpoint_ipv6                   
    self._test_create_datagram_endpoint(('::1', 0), socket.AF_INET6)                                                                                      
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_asyncio/test_events.py", line 1301, in _test_create_datagram_endpoint                       
    test_utils.run_until(self.loop, lambda: server.nbytes)                                                                                                
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_asyncio/utils.py", line 117, in run_until                                                   
    raise futures.TimeoutError()                                                                                                                          
          ^^^^^^^^^^^^^^^^^^^^                                                                                                                            
AttributeError: module 'asyncio.futures' has no attribute 'TimeoutError'

======================================================================                                                                                    
ERROR: test_create_server_dual_stack (test.test_asyncio.test_events.PollEventLoopTests.test_create_server_dual_stack)                                     
----------------------------------------------------------------------                                                                                    
Traceback (most recent call last):
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_asyncio/test_events.py", line 1243, in test_create_server_dual_stack
    client.connect(('::1', port))
TimeoutError: [Errno 110] Connection timed out

======================================================================
ERROR: test_create_datagram_endpoint_ipv6 (test.test_asyncio.test_events.SelectEventLoopTests.test_create_datagram_endpoint_ipv6)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_asyncio/test_events.py", line 1322, in test_create_datagram_endpoint_ipv6
    self._test_create_datagram_endpoint(('::1', 0), socket.AF_INET6)
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_asyncio/test_events.py", line 1301, in _test_create_datagram_endpoint
    test_utils.run_until(self.loop, lambda: server.nbytes)
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_asyncio/utils.py", line 117, in run_until
    raise futures.TimeoutError()
          ^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'asyncio.futures' has no attribute 'TimeoutError'

======================================================================
ERROR: test_create_server_dual_stack (test.test_asyncio.test_events.SelectEventLoopTests.test_create_server_dual_stack)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_asyncio/test_events.py", line 1243, in test_create_server_dual_stack
    client.connect(('::1', port))
TimeoutError: [Errno 110] Connection timed out

----------------------------------------------------------------------
Ran 2353 tests in 1344.489s

FAILED (errors=6, skipped=48)

test test_asyncio failed
test_asyncio failed (6 errors) in 22 min 24 sec

== Tests result: FAILURE ==

1 test failed:
    test_asyncio

Total duration: 22 min 24 sec
Tests result: FAILURE

test_ftplib

IPv6 again. The system (and FW) didn't change since 3.9 or something, so no idea why that's suddenly an issue. At least it fails fast, if run standalone.

======================================================================                                   
ERROR: test_af (test.test_ftplib.TestIPv6Environment.test_af)                                            
----------------------------------------------------------------------                                   
Traceback (most recent call last):                                                                       
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_ftplib.py", line 869, in setUp             
    self.client.connect(self.server.host, self.server.port)                                              
  File "/home/user/user/Python-3.11.3-src/Lib/ftplib.py", line 158, in connect                     
    self.sock = socket.create_connection((self.host, self.port), self.timeout,                           
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                           
  File "/home/user/user/Python-3.11.3-src/Lib/socket.py", line 851, in create_connection           
    raise exceptions[0]                                                                                  
  File "/home/user/user/Python-3.11.3-src/Lib/socket.py", line 836, in create_connection           
    sock.connect(sa)                                                                                     
TimeoutError: timed out                                                                                  

======================================================================                                   
ERROR: test_makepasv (test.test_ftplib.TestIPv6Environment.test_makepasv)                                
----------------------------------------------------------------------                                   
Traceback (most recent call last):                                                                       
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_ftplib.py", line 869, in setUp             
    self.client.connect(self.server.host, self.server.port)                                              
  File "/home/user/user/Python-3.11.3-src/Lib/ftplib.py", line 158, in connect                     
    self.sock = socket.create_connection((self.host, self.port), self.timeout,                           
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                           
  File "/home/user/user/Python-3.11.3-src/Lib/socket.py", line 851, in create_connection           
    raise exceptions[0]                                                                                  
  File "/home/user/user/Python-3.11.3-src/Lib/socket.py", line 836, in create_connection           
    sock.connect(sa)                                                                                     
TimeoutError: timed out                                               

======================================================================                                   
ERROR: test_makeport (test.test_ftplib.TestIPv6Environment.test_makeport)                                
----------------------------------------------------------------------                                   
Traceback (most recent call last):                                                                       
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_ftplib.py", line 869, in setUp             
    self.client.connect(self.server.host, self.server.port)                                              
  File "/home/user/user/Python-3.11.3-src/Lib/ftplib.py", line 158, in connect                     
    self.sock = socket.create_connection((self.host, self.port), self.timeout,                           
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                           
  File "/home/user/user/Python-3.11.3-src/Lib/socket.py", line 851, in create_connection           
    raise exceptions[0]                                                                                  
  File "/home/user/user/Python-3.11.3-src/Lib/socket.py", line 836, in create_connection           
    sock.connect(sa)                                                                                     
TimeoutError: timed out                                                                                  

======================================================================                                   
ERROR: test_transfer (test.test_ftplib.TestIPv6Environment.test_transfer)                                
----------------------------------------------------------------------                                   
Traceback (most recent call last):                  
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_ftplib.py", line 869, in setUp
    self.client.connect(self.server.host, self.server.port)                                              
  File "/home/user/user/Python-3.11.3-src/Lib/ftplib.py", line 158, in connect
    self.sock = socket.create_connection((self.host, self.port), self.timeout,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/user/Python-3.11.3-src/Lib/socket.py", line 851, in create_connection
    raise exceptions[0]                             
  File "/home/user/user/Python-3.11.3-src/Lib/socket.py", line 836, in create_connection
    sock.connect(sa)                                
TimeoutError: timed out                             

======================================================================
ERROR: test_ccc (test.test_ftplib.TestTLS_FTPClass.test_ccc)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/user/Python-3.11.3-src/Lib/test/test_ftplib.py", line 1008, in test_ccc
    self.client.login(secure=True)
  File "/home/user/user/Python-3.11.3-src/Lib/ftplib.py", line 745, in login
    self.auth()
  File "/home/user/user/Python-3.11.3-src/Lib/ftplib.py", line 756, in auth
    self.sock = self.context.wrap_socket(self.sock, server_hostname=self.host)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/user/Python-3.11.3-src/Lib/ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/user/Python-3.11.3-src/Lib/ssl.py", line 1075, in _create
    self.do_handshake()
  File "/home/user/user/Python-3.11.3-src/Lib/ssl.py", line 1346, in do_handshake
    self._sslobj.do_handshake()
TimeoutError: _ssl.c:985: The handshake operation timed out

----------------------------------------------------------------------
Ran 94 tests in 34.055s

FAILED (errors=5, skipped=1)
Warning -- threading._dangling was modified by test_ftplib
Warning --   Before: {<weakref at 0x7f98a65cda30; to '_MainThread' at 0x7f98a69740d0>}
Warning --   After:  {<weakref at 0x7f98a6648950; to 'DummyFTPServer' at 0x7f98a65e8490>, <weakref at 0x7f98a6be0cc0; to 'DummyFTPServer' at 0x7f98a65c5090>, <weakref at 0x7f98a66a1080; to '_MainThread' at 0x7f98a69740d0>, <weakref at 0x7f98a673e200; to 'DummyFTPServer' at 0x7f98a6755ed0>, <weakref at 0x7f98a66a3920; to 'DummyFTPServer' at 0x7f98a65e81d0>} 
test test_ftplib failed
test_ftplib failed (5 errors) in 34.1 sec

== Tests result: FAILURE ==

1 test failed:
    test_ftplib

Total duration: 34.1 sec
Tests result: FAILURE

test_signal

This one is usually a huge cause of freezes, but runs just fine standalone. Seems to only skip the windows stuff and run fine otherwise.

Ran 56 tests in 47.151s

OK (skipped=4)
test_signal passed in 47.2 sec

== Tests result: SUCCESS ==

1 test OK.

Total duration: 47.2 sec
Tests result: SUCCESS

test_logging

I don't recall this one ever being a problem, but now it is. It's the one that stops the build from continuing after 8h and even standalone it does nothing. Again the v6 theme. It gets completely frozen at this point:

test_name (test.test_logging.HandlerTest.test_name) ... ok
test_path_objects (test.test_logging.HandlerTest.test_path_objects)
Test that Path objects are accepted as filename arguments to handlers. ... ok
test_post_fork_child_no_deadlock (test.test_logging.HandlerTest.test_post_fork_child_no_deadlock)
Ensure child logging locks are not held; bpo-6721 & bpo-36533. ... ok
test_race (test.test_logging.HandlerTest.test_race) ... ok
test_output (test.test_logging.IPv6SysLogHandlerTest.test_output) ... 

To verify there's a difference, I also did 3.11.1 again. It also failed. The IPv6 drop netfilter didn't change since then. My best guess is an update changed how this is evaluated.

kernel 5.10.0-22 currently

So the usual 'test random bugginess' didn't change, it seems only IPv6 handling did. Maybe this is helpful to you as IPv6 is dropped in many places. Guess due to the 20min timeout really nobody notices, not even the gentoo crowd.

I'll use the opportunity to stop performance hunting. Unless you run heavy python 24/7 the gains are less than the CPU cost of doing this. Although if you fixed the couple broken tests and figured out why parallelization is causing chaos, the actual tests could run in like 30mins no problem even on this low-end box.

elandorr commented 1 year ago

'not planned'? You don't care about IPv6 failing?

gvanrossum commented 1 year ago

File a new bug for that specific thing.

elandorr commented 1 year ago

I'll just have to copy paste the same thing essentially, but okay.

gvanrossum commented 1 year ago

Don’t do that please. Leave the editorializing out and focus just on the IPv6 bug you have discovered.