Closed by vdbergh 5 months ago
Tracking a changing IP address would be too cumbersome code-wise (for a feature that is only marginally used). As the bug in the previous version of this PR has likely been fixed now, I would prefer to keep this PR as it is for now.
Afterwards we should switch to restricting the number of active tasks per user, but this probably requires raising some limits, so it needs some thought.
Yeah, no need to track changing IP addresses. In fact, we should probably eventually be oblivious to IP addresses because of this. But I think this PR is now working as advertised.
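For illustration only, a per-user cap on active tasks could be tracked with a counter that is updated whenever a task is handed out or finished, so nothing has to be recomputed by looping over all runs and tasks. This is a minimal sketch, not code from this PR or from fishtest: the class `ActiveTaskLimiter`, its methods, and the `max_tasks_per_user` parameter are all hypothetical names.

```python
from collections import Counter


class ActiveTaskLimiter:
    """Hypothetical helper (not fishtest code): caps active tasks per username."""

    def __init__(self, max_tasks_per_user):
        # Assumed limit; the real value would need the "some thought" mentioned above.
        self.max_tasks_per_user = max_tasks_per_user
        self._active = Counter()  # username -> number of currently active tasks

    def try_acquire(self, username):
        """Reserve a slot for username; return False if the cap is already reached."""
        if self._active[username] >= self.max_tasks_per_user:
            return False
        self._active[username] += 1
        return True

    def release(self, username):
        """Free one slot when a task finishes or fails; unknown users are ignored."""
        if self._active[username] > 0:
            self._active[username] -= 1
        if self._active[username] == 0:
            # Drop empty entries so the counter does not grow without bound.
            del self._active[username]

    def active(self, username):
        """Return the number of tasks currently counted for username."""
        return self._active[username]
```

Keyed on the username rather than the IP address, such a counter would be unaffected by a worker whose address changes.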
I have rebased the follow-up PR #2052 on top of this PR. If I am not mistaken, #2052 eliminates the last double loop in request_task().
The current PR already seems to reduce load measurably. Measurements at 65k cores:
failed_task 6.32
request_spsa 10.8759
request_task 80.462
request_version 0.238281
update_task 3.48939
upload_pgn 95.4392
PROD running with #2050 and #2052
prod timings at 100k+ cores
failed_task
request_spsa 6.16
request_task 11.8161
request_version 0.187047
update_task 3.66966
upload_pgn 91.7151
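To put the two sets of timings side by side, the per-endpoint ratios can be computed as below. This is only a rough sketch: the two measurements were taken at different loads (65k vs. 100k+ cores), the units are not stated in the thread, and `failed_task` has no value in the second set, so it is skipped. The names `before`, `after` and `ratios` are just for this example.

```python
# Timings copied from the two listings in this thread (units as reported, not stated).
before = {  # current PR, 65k cores
    "failed_task": 6.32,
    "request_spsa": 10.8759,
    "request_task": 80.462,
    "request_version": 0.238281,
    "update_task": 3.48939,
    "upload_pgn": 95.4392,
}
after = {  # PROD with #2050 and #2052, 100k+ cores (failed_task value not given)
    "request_spsa": 6.16,
    "request_task": 11.8161,
    "request_version": 0.187047,
    "update_task": 3.66966,
    "upload_pgn": 91.7151,
}


def ratios(before, after):
    """Return before/after for every endpoint present in both measurements."""
    return {name: before[name] / after[name] for name in before if name in after}


if __name__ == "__main__":
    for name, ratio in sorted(ratios(before, after).items()):
        print(f"{name:16s} {ratio:6.2f}x")
```

A ratio above 1 means the second measurement is lower than the first, despite the higher core count.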
congrats @vdbergh
Unfortunately, we still lack a fleet large enough to test the limits of the VPS...
top - 19:01:27 up 175 days, 2:41, 3 users, load average: 3.92, 3.80, 2.90
Tasks: 41 total, 3 running, 38 sleeping, 0 stopped, 0 zombie
%Cpu0 : 80.1/8.0 88[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
%Cpu1 : 76.4/6.3 83[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
%Cpu2 : 59.9/6.6 67[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
%Cpu3 : 78.4/5.6 84[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
KiB Mem : 78.2/5242880 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
KiB Swap: 0.0/0 [ ]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 225000 5044 2904 S 0.1 2:08.20 init -z
2 root 20 S `- [kthreadd/1988]
3 root 20 S 0:00.18 `- [khelper]
74 root 20 577344 57064 45488 S 1.0 1.1 39:52.36 `- /lib/systemd/systemd-journald
195 root 20 42104 548 S 0.0 0:20.95 `- /lib/systemd/systemd-udevd
198 systemd+ 20 71716 592 S 0.0 0:24.79 `- /lib/systemd/systemd-networkd
206 syslog 20 189028 1888 124 S 0.0 6:00.97 `- /usr/sbin/rsyslogd -n
213 root 20 70956 2024 896 S 0.0 0:28.72 `- /lib/systemd/systemd-logind
215 message+ 20 47748 628 S 0.0 0:05.88 `- /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
327 root 20 186720 7980 S 0.2 0:00.03 `- /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
334 root 20 14660 152 S 0.0 0:00.01 `- /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 linux
337 root 20 100980 676 S 0.0 0:00.04 `- /usr/sbin/saslauthd -a pam -c -m /var/run/saslauthd -n 2
341 root 20 100980 684 S 0.0 0:00.03 `- /usr/sbin/saslauthd -a pam -c -m /var/run/saslauthd -n 2
340 root 20 13016 144 S 0.0 0:00.01 `- /sbin/agetty -o -p -- \u --noclear tty2 linux
346 root 20 72292 836 76 S 0.0 2:46.66 `- /usr/sbin/sshd -D
12573 root 20 101548 2360 1392 S 0.0 0:00.05 `- sshd: fishtest [priv]
12597 fishtest 20 101548 1176 216 S 0.0 0:00.35 `- sshd: fishtest@pts/0
12598 fishtest 20 23152 4216 2500 S 0.1 0:00.28 `- -bash
18434 root 20 101548 3664 2700 S 0.1 0:00.01 `- sshd: fishtest [priv]
18445 fishtest 20 101548 1768 804 S 0.0 0:00.71 `- sshd: fishtest@pts/1
18446 fishtest 20 21276 3188 1684 S 0.1 0:00.04 `- -bash
18571 fishtest 20 38432 2684 2044 R 0.1 0:04.01 `- top
18577 root 20 101548 3764 2804 S 0.1 0:00.01 `- sshd: fishtest [priv]
18588 fishtest 20 101548 1892 932 S 0.0 0:00.01 `- sshd: fishtest@pts/2
18589 fishtest 20 21408 3496 1820 S 0.1 0:00.04 `- -bash
363 root 20 24180 252 S 0.0 0:00.01 `- /usr/sbin/xinetd -pidfile /run/xinetd.pid -stayalive -inetd_compat -inetd_ipv6
586 Debian-+ 20 59456 1128 380 S 0.0 0:46.54 `- /usr/sbin/exim4 -bd -q30m
6491 root 20 418872 321040 2308 S 6.1 0:37.22 `- nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
16944 www-data 20 418876 322808 4072 R 79.7 6.2 37:29.95 `- nginx: worker process
16945 www-data 20 418876 319724 992 S 6.1 0:00.09 `- nginx: cache manager process
20990 root 20 30020 844 556 S 0.0 0:01.09 `- /usr/sbin/cron -f
20710 root 20 53260 3168 2740 S 0.1 `- /usr/sbin/CRON -f
20711 fishtest 20 4624 816 748 S 0.0 `- /bin/sh -c /usr/bin/nice -n 10 /usr/bin/cpulimit -l 50 -f -m -- ${VENV}/bin/python3 ${UPATH}/delta_update_users.py
20712 fishtest 30 10 82516 940 856 S 0.3 0.0 0:00.08 `- /usr/bin/cpulimit -l 50 -f -m -- /home/fishtest/fishtest/server/env/bin/python3 /home/fishtest/fishtest/server/utils/delta_update_users.py
20713 fishtest 30 10 916432 238660 36688 R 47.2 4.6 0:12.12 `- /home/fishtest/fishtest/server/env/bin/python3 /home/fishtest/fishtest/server/utils/delta_update_users.py
12575 fishtest 20 76392 2440 1584 S 0.0 0:00.02 `- /lib/systemd/systemd --user
12576 fishtest 20 254624 2272 S 0.0 `- (sd-pam)
16566 mongodb 20 5448836 1.817g 14424 S 27.6 36.3 14:03.60 `- /usr/bin/mongod --config /etc/mongod.conf
16914 fishtest 20 1606908 579144 7348 S 85.7 11.0 36:24.92 `- /home/fishtest/fishtest/server/env/bin/python3 /home/fishtest/fishtest/server/env/bin/pserve production.ini http_port=6543
16915 fishtest 20 1495764 591460 5860 S 79.7 11.3 12:21.65 `- /home/fishtest/fishtest/server/env/bin/python3 /home/fishtest/fishtest/server/env/bin/pserve production.ini http_port=6544
16916 fishtest 20 1270172 328864 7396 S 0.7 6.3 22:04.11 `- /home/fishtest/fishtest/server/env/bin/python3 /home/fishtest/fishtest/server/env/bin/pserve production.ini http_port=6545
Since #2052 is already being tested, we can close this.
This is a PR on top of #2049