official-stockfish / fishtest

The Stockfish testing framework
https://tests.stockfishchess.org/tests

Dynamic updates of connections (to respect connection limit). #2050

Closed vdbergh closed 5 months ago

vdbergh commented 5 months ago

This is a PR on top of #2049.

vdbergh commented 5 months ago

Tracking a changing IP address would be too cumbersome code-wise (for a feature that is only marginally used). As the bug in the previous version of this PR has likely been fixed now, I would prefer to keep this PR as it is for now.

Afterwards we should change to restricting the number of active tasks per user, but this probably requires upping some limits, so it requires some thought.
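To illustrate what a dynamically updated connection count could look like, here is a minimal sketch. It is not the code from this PR: `ConnectionTracker`, its methods, and the `limit` parameter are hypothetical names chosen for the example. The key can be an IP address today, or a username if the limit later moves to active tasks per user.

```python
import threading
from collections import Counter


class ConnectionTracker:
    """Hypothetical counter of active connections per key (IP address or username)."""

    def __init__(self, limit):
        self.limit = limit
        self._counts = Counter()
        self._lock = threading.Lock()

    def try_acquire(self, key):
        """Register one more connection for key; refuse if the limit is reached."""
        with self._lock:
            if self._counts[key] >= self.limit:
                return False
            self._counts[key] += 1
            return True

    def release(self, key):
        """Unregister one connection for key, e.g. when a task finishes or fails."""
        with self._lock:
            if self._counts[key] > 1:
                self._counts[key] -= 1
            else:
                # Drop the entry so idle keys do not accumulate.
                self._counts.pop(key, None)

    def count(self, key):
        """Return the current number of connections for key (0 if unknown)."""
        with self._lock:
            return self._counts[key]
```

With this shape, checking the limit is a dictionary lookup, at the price of having to call `release` on every path where a task stops being active.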

vondele commented 5 months ago

Yeah, no need to track changing IP addresses. In fact, we should probably become oblivious to IP addresses eventually because of this. But I think this PR is now working as advertised.

vdbergh commented 5 months ago

I have rebased the follow-up PR #2052 on top of this PR. If I am not mistaken, #2052 eliminates the last double loop in request_task().
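For readers unfamiliar with the pattern, the sketch below contrasts a double loop over runs and tasks with a count that is seeded once and then kept up to date. It is only an illustration: the dictionary layout (`tasks`, `active`, `worker_info`, `remote_addr`) and both function names are assumptions made for the example, not a copy of the fishtest code.

```python
from collections import Counter


def count_by_scan(runs, remote_addr):
    """Count active tasks for one address by visiting every task of every run."""
    total = 0
    for run in runs:
        for task in run["tasks"]:
            if task["active"] and task["worker_info"]["remote_addr"] == remote_addr:
                total += 1
    return total


def build_counts(runs):
    """Single pass that seeds a per-address counter of active tasks."""
    return Counter(
        task["worker_info"]["remote_addr"]
        for run in runs
        for task in run["tasks"]
        if task["active"]
    )


def make_task(addr, active):
    """Build a made-up task record for the example."""
    return {"active": active, "worker_info": {"remote_addr": addr}}


# Made-up sample data: two runs, three tasks.
runs = [
    {"tasks": [make_task("10.0.0.1", True), make_task("10.0.0.2", True)]},
    {"tasks": [make_task("10.0.0.1", True)]},
]
counts = build_counts(runs)

# When a task finishes, flip its flag and adjust the counter in the same step,
# so later requests can read counts[addr] instead of rescanning everything.
finished = runs[1]["tasks"][0]
finished["active"] = False
counts[finished["worker_info"]["remote_addr"]] -= 1
```

The scan costs time proportional to the total number of tasks on every request, while the maintained counter is a constant-time lookup, which matters when many workers call the API at once.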

vondele commented 5 months ago

The current PR already seems to reduce load measurably. Measurement at 65k cores:

failed_task 6.32
request_spsa 10.8759
request_task 80.462
request_version 0.238281
update_task 3.48939
upload_pgn 95.4392

ppigazzini commented 5 months ago

PROD running with #2050 and #2052

vondele commented 5 months ago

prod timings at 100k+ cores

failed_task
request_spsa 6.16
request_task 11.8161
request_version 0.187047
update_task 3.66966
upload_pgn 91.7151

congrats @vdbergh

ppigazzini commented 5 months ago

Unfortunately, we still lack a fleet able to test the limit of the VPS...

top - 19:01:27 up 175 days,  2:41,  3 users,  load average: 3.92, 3.80, 2.90
Tasks:  41 total,   3 running,  38 sleeping,   0 stopped,   0 zombie
%Cpu0  :  80.1/8.0    88[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||            ]
%Cpu1  :  76.4/6.3    83[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                  ]
%Cpu2  :  59.9/6.6    67[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                                 ]
%Cpu3  :  78.4/5.6    84[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                ]
KiB Mem : 78.2/5242880  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                      ]
KiB Swap:  0.0/0        [                                                                                                    ]

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20      225000   5044   2904 S        0.1   2:08.20 init -z
    2 root      20                           S                       `- [kthreadd/1988]
    3 root      20                           S              0:00.18      `- [khelper]
   74 root      20      577344  57064  45488 S   1.0  1.1  39:52.36  `- /lib/systemd/systemd-journald
  195 root      20       42104    548        S        0.0   0:20.95  `- /lib/systemd/systemd-udevd
  198 systemd+  20       71716    592        S        0.0   0:24.79  `- /lib/systemd/systemd-networkd
  206 syslog    20      189028   1888    124 S        0.0   6:00.97  `- /usr/sbin/rsyslogd -n
  213 root      20       70956   2024    896 S        0.0   0:28.72  `- /lib/systemd/systemd-logind
  215 message+  20       47748    628        S        0.0   0:05.88  `- /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
  327 root      20      186720   7980        S        0.2   0:00.03  `- /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
  334 root      20       14660    152        S        0.0   0:00.01  `- /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 linux
  337 root      20      100980    676        S        0.0   0:00.04  `- /usr/sbin/saslauthd -a pam -c -m /var/run/saslauthd -n 2
  341 root      20      100980    684        S        0.0   0:00.03      `- /usr/sbin/saslauthd -a pam -c -m /var/run/saslauthd -n 2
  340 root      20       13016    144        S        0.0   0:00.01  `- /sbin/agetty -o -p -- \u --noclear tty2 linux
  346 root      20       72292    836     76 S        0.0   2:46.66  `- /usr/sbin/sshd -D
12573 root      20      101548   2360   1392 S        0.0   0:00.05      `- sshd: fishtest [priv]
12597 fishtest  20      101548   1176    216 S        0.0   0:00.35          `- sshd: fishtest@pts/0
12598 fishtest  20       23152   4216   2500 S        0.1   0:00.28              `- -bash
18434 root      20      101548   3664   2700 S        0.1   0:00.01      `- sshd: fishtest [priv]
18445 fishtest  20      101548   1768    804 S        0.0   0:00.71          `- sshd: fishtest@pts/1
18446 fishtest  20       21276   3188   1684 S        0.1   0:00.04              `- -bash
18571 fishtest  20       38432   2684   2044 R        0.1   0:04.01                  `- top
18577 root      20      101548   3764   2804 S        0.1   0:00.01      `- sshd: fishtest [priv]
18588 fishtest  20      101548   1892    932 S        0.0   0:00.01          `- sshd: fishtest@pts/2
18589 fishtest  20       21408   3496   1820 S        0.1   0:00.04              `- -bash
  363 root      20       24180    252        S        0.0   0:00.01  `- /usr/sbin/xinetd -pidfile /run/xinetd.pid -stayalive -inetd_compat -inetd_ipv6
  586 Debian-+  20       59456   1128    380 S        0.0   0:46.54  `- /usr/sbin/exim4 -bd -q30m
 6491 root      20      418872 321040   2308 S        6.1   0:37.22  `- nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
16944 www-data  20      418876 322808   4072 R  79.7  6.2  37:29.95      `- nginx: worker process
16945 www-data  20      418876 319724    992 S        6.1   0:00.09      `- nginx: cache manager process
20990 root      20       30020    844    556 S        0.0   0:01.09  `- /usr/sbin/cron -f
20710 root      20       53260   3168   2740 S        0.1                `- /usr/sbin/CRON -f
20711 fishtest  20        4624    816    748 S        0.0                    `- /bin/sh -c /usr/bin/nice -n 10 /usr/bin/cpulimit -l 50 -f -m -- ${VENV}/bin/python3 ${UPATH}/delta_update_users.py
20712 fishtest  30  10   82516    940    856 S   0.3  0.0   0:00.08              `- /usr/bin/cpulimit -l 50 -f -m -- /home/fishtest/fishtest/server/env/bin/python3 /home/fishtest/fishtest/server/utils/delta_update_users.py
20713 fishtest  30  10  916432 238660  36688 R  47.2  4.6   0:12.12                  `- /home/fishtest/fishtest/server/env/bin/python3 /home/fishtest/fishtest/server/utils/delta_update_users.py
12575 fishtest  20       76392   2440   1584 S        0.0   0:00.02  `- /lib/systemd/systemd --user
12576 fishtest  20      254624   2272        S        0.0                `- (sd-pam)
16566 mongodb   20     5448836 1.817g  14424 S  27.6 36.3  14:03.60  `- /usr/bin/mongod --config /etc/mongod.conf
16914 fishtest  20     1606908 579144   7348 S  85.7 11.0  36:24.92  `- /home/fishtest/fishtest/server/env/bin/python3 /home/fishtest/fishtest/server/env/bin/pserve production.ini http_port=6543
16915 fishtest  20     1495764 591460   5860 S  79.7 11.3  12:21.65  `- /home/fishtest/fishtest/server/env/bin/python3 /home/fishtest/fishtest/server/env/bin/pserve production.ini http_port=6544
16916 fishtest  20     1270172 328864   7396 S   0.7  6.3  22:04.11  `- /home/fishtest/fishtest/server/env/bin/python3 /home/fishtest/fishtest/server/env/bin/pserve production.ini http_port=6545

vdbergh commented 5 months ago

Since #2052 is already being tested, we can close this.