python / buildmaster-config

Configuration for buildbot.python.org
https://buildbot.python.org/
MIT License

Reduce refleak timeout from 3h20 to 45min #543

Closed vstinner closed 3 weeks ago

vstinner commented 3 weeks ago

On AMD64 Fedora Stable Refleaks 3.13, the slowest test took 9 minutes, whereas a hung test was blocked for 3 hours 20 minutes.

cc @zware

encukou commented 2 weeks ago

There are slower builders on 3.10, which has a monolithic test_asyncio. AMD64 Fedora Stable Refleaks 3.10 and aarch64 RHEL8 Refleaks 3.10 used to take closer to an hour, and are failing now. I expect the 3.9 builders to start failing when they first build with this config.

Is there any extra background info for this change? Would it be better to revert it, or special-case the branches, like this for example?

diff --git a/master/custom/factories.py b/master/custom/factories.py
index d4e703e..23dd122 100644
--- a/master/custom/factories.py
+++ b/master/custom/factories.py
@@ -72,6 +72,11 @@ class UnixBuild(BaseBuild):
         # Adjust the timeout for this worker
         self.test_timeout *= kwargs.get("timeout_factor", 1)

+        # Before 3.11, test_asyncio wasn't split up, so refleaks tests
+        # need more time.
+        if branch in ("3.9", "3.10") and has_option("-R", self.testFlags):
+            self.test_timeout *= 2
+
         if self.build_out_of_tree:
             self.addStep(
                 ShellCommand(
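
To illustrate the effect of the special case in the patch above, here is a minimal standalone sketch. The `has_option` helper and the `refleak_timeout` function are hypothetical stand-ins written for this example, not the actual code in master/custom/factories.py. The 45-minute base value is taken from this issue's title, and the example flags are assumed.

```python
def has_option(option, test_flags):
    """Return True if any flag starts with `option` (hypothetical helper).

    For example, "-R" matches both "-R" and "-R3:3". The match is
    case-sensitive, so "-r" does not count.
    """
    return any(flag.startswith(option) for flag in test_flags)


def refleak_timeout(branch, test_flags, base_timeout, timeout_factor=1):
    """Compute a per-test timeout in seconds, mirroring the patch's logic."""
    timeout = base_timeout * timeout_factor
    # Before 3.11, test_asyncio wasn't split up, so refleak runs need more time.
    if branch in ("3.9", "3.10") and has_option("-R", test_flags):
        timeout *= 2
    return timeout


BASE = 45 * 60  # 45 minutes in seconds (from the issue title)
flags = ["-R", "3:3", "-u-cpu"]  # assumed example of refleak test flags

print(refleak_timeout("3.10", flags, BASE) // 60)    # 90
print(refleak_timeout("3.13", flags, BASE) // 60)    # 45
print(refleak_timeout("3.10", ["-j2"], BASE) // 60)  # 45
```

Under these assumptions, only refleak runs on 3.9 and 3.10 get the doubled timeout. Newer branches and non-refleak builds keep the shorter value.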

vstinner commented 2 weeks ago

> Is there any extra background info for this change?

When multiprocessing, concurrent.futures, or another test hangs, we waste 3 hours waiting, whereas the slowest test takes 18 minutes. Then the overall buildbot timeout of 4 hours kills the re-run job.
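
As a rough illustration of that arithmetic, the sketch below computes how much of the overall budget is left for a re-run once a hung test has consumed the full per-test timeout. It is a simplification: it ignores the time spent before the hang, and `time_left_after_hang` is a hypothetical name used only for this example. The 3h20 and 45 min values come from the issue title, and the 4 h overall timeout from the comment above.

```python
from datetime import timedelta


def time_left_after_hang(test_timeout, overall_timeout):
    """Time remaining in the overall budget after one test hangs until its timeout."""
    return overall_timeout - test_timeout


OVERALL = timedelta(hours=4)  # overall buildbot timeout mentioned above

old = time_left_after_hang(timedelta(hours=3, minutes=20), OVERALL)
new = time_left_after_hang(timedelta(minutes=45), OVERALL)

print(old)  # 0:40:00
print(new)  # 3:15:00
```

With the old timeout, at most 40 minutes remain for the re-run. With the 45-minute timeout, more than three hours remain.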

zware commented 2 weeks ago

I'm in favor of either that kind of special case, or just dealing with it on the rare occasion that the 3.10 builders run. I definitely do not want to revert :)

encukou commented 2 weeks ago

I'd like to schedule some extra runs on those builders from time to time. Normal runs are meant for security fixes, and that's a bad time to discover latent issues.

I filed the patch as https://github.com/python/buildmaster-config/pull/545