sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

python daemons in bookworm are consuming more memory than in bullseye #19828

Open vivekrnv opened 1 month ago

vivekrnv commented 1 month ago

Description

Python daemons in bookworm are consuming more memory than in bullseye.

Some daemons, such as xcvrd, thermalctld, featured, hostcfgd and healthd, have an RSS increase of around 20 MB. Even smaller daemons such as "container wait" and supervisor-proc-exit-listener have grown, by up to 7 MB.

Combined with new daemons such as stormond and dhcprelayd, the overall memory usage will be higher by at least ~500 MB. On a system with 8 GB of RAM this amounts to almost 5-6% more usage, so the test_cpu_memory_usage.py test should be updated to handle the increase.
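As a quick sanity check on that estimate, the arithmetic can be sketched as follows. The function name is made up for illustration, and the inputs are just the ~500 MB and 8 GB figures quoted above.

```python
def extra_usage_percent(extra_mb: float, total_ram_gb: float) -> float:
    """Return extra_mb as a percentage of total RAM (1 GB = 1024 MB)."""
    return 100.0 * extra_mb / (total_ram_gb * 1024)


# Figures from the report: ~500 MB of extra usage on an 8 GB device.
print(f"{extra_usage_percent(500, 8):.1f}%")
```

With these inputs the result comes out at roughly 6% of total RAM.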

This is happening because of the speed optimizations made in Python 3.11 (the version shipped in bookworm): https://devclass.com/2022/05/31/how-python-3-11-is-gaining-performance-at-the-cost-of-a-bit-more-memory/

These optimizations seem to have some memory implications: https://peps.python.org/pep-0659/#memory-use
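One rough way to separate the interpreter's own baseline from the daemons' code is to run the same tiny script under each interpreter (Python 3.9 on bullseye, 3.11 on bookworm) and compare the numbers. This is only a sketch: it measures a bare interpreter, not the import footprint of any SONiC daemon, and it assumes Linux, where `ru_maxrss` is reported in kilobytes.

```python
import resource
import sys


def peak_rss_kb() -> int:
    """Peak resident set size of this process (kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


print(
    f"Python {sys.version_info.major}.{sys.version_info.minor}: "
    f"peak RSS {peak_rss_kb()} kB"
)
```

Any difference seen this way is a lower bound at best, since the daemons import many more modules than an empty script does.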

Steps to reproduce the issue:

  1. Run test_cpu_memory_usage.py on a 202405/master image on a device with RAM <= 8 GB; it will fail: https://github.com/sonic-net/sonic-mgmt/blob/master/tests/platform_tests/test_cpu_memory_usage.py#L54

  2. Or just load the image and check memory usage after some time
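For step 2, a minimal sketch of how the check could be made repeatable: save the `ps aux` output to a string and sum the RSS column for Python processes. The helper name is hypothetical; it assumes the standard `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND) and skips the `grep` process itself.

```python
def total_python_rss_kb(ps_text: str) -> int:
    """Sum the RSS column (kB) of every Python process in `ps aux` output."""
    total = 0
    for line in ps_text.splitlines():
        # COMMAND is the 11th column and may itself contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[5].isdigit():
            continue  # header, shell prompt, or malformed line
        command = fields[10]
        if "python" in command and not command.startswith("grep"):
            total += int(fields[5])
    return total


# Three lines in the same format as the captures in this report.
sample = """\
root        9357  0.6  2.2 369652 179912 pts/0   Sl   00:28   0:13 python3 /usr/local/bin/xcvrd --skip_cmis_mgr
root        9557  0.2  2.1 235844 171168 pts/0   S    00:28   0:04 python3 /usr/local/bin/thermalctld
root       26816  0.0  0.0   6868   648 pts/0    S+   01:01   0:00 grep python
"""
print(total_python_rss_kb(sample), "kB")
```

Note that summing RSS counts shared pages once per process, so the total overstates real memory pressure; it is still useful for comparing two images captured the same way.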

Describe the results you received:

On 202311:

root@msn3420-r0:/home/admin# ps aux --sort=-rss | grep python
root        9357  0.6  2.2 369652 179912 pts/0   Sl   00:28   0:13 python3 /usr/local/bin/xcvrd --skip_cmis_mgr
root        9557  0.2  2.1 235844 171168 pts/0   S    00:28   0:04 python3 /usr/local/bin/thermalctld
root        7157  0.1  0.6 224944 51680 ?        Ss   00:28   0:03 /usr/bin/python3 /usr/local/bin/healthd
root        9007  0.0  0.5 292792 45248 ?        Sl   00:28   0:00 python3 /usr/bin/docker-wait-any -s swss -d syncd teamd
admin      11360  0.0  0.5  68556 43408 ?        S    00:29   0:00 python3 /usr/local/bin/container wait gnmi
root        8988  0.0  0.5  68640 43320 ?        S    00:28   0:00 python3 /usr/local/bin/container wait dhcp_relay
admin       8834  0.0  0.5  68556 43316 ?        S    00:28   0:00 python3 /usr/local/bin/container wait radv
root       12061  0.0  0.5  68640 43316 ?        S    00:30   0:00 python3 /usr/local/bin/container wait snmp
root        4927  0.0  0.5  68636 43312 ?        S    00:28   0:00 python3 /usr/local/bin/container wait eventd
admin       8937  0.0  0.5  68556 43240 ?        S    00:28   0:00 python3 /usr/local/bin/container wait pmon
admin      11619  0.0  0.5  68556 43220 ?        S    00:29   0:00 python3 /usr/local/bin/container wait mgmt-framework
admin       6076  0.0  0.5  68556 43156 ?        S    00:28   0:00 python3 /usr/local/bin/container wait teamd
root        8942  0.0  0.5  68556 43140 ?        S    00:28   0:00 python3 /usr/local/bin/container wait syncd
admin       6857  0.0  0.5  68556 42968 ?        S    00:28   0:00 python3 /usr/local/bin/container wait bgp
admin      11490  0.0  0.5  68556 42908 ?        S    00:29   0:00 python3 /usr/local/bin/container wait lldp
root        7210  0.0  0.5 168928 42000 ?        S    00:28   0:00 /usr/bin/python3 /usr/local/bin/healthd
root        7207  0.0  0.5 162476 41480 ?        S    00:28   0:01 /usr/bin/python3 /usr/local/bin/healthd
root       12360  9.5  0.4 132148 38860 pts/0    Sl   00:30   3:00 python3 -m sonic_ax_impl
root        7202  0.0  0.4 374760 37920 ?        Sl   00:28   0:00 /usr/bin/python3 /usr/local/bin/healthd
root       10181  0.0  0.4  57312 37768 ?        Ss   00:29   0:00 /usr/bin/python3 /usr/local/bin/hostcfgd
root        7213  0.0  0.4 157992 35996 ?        S    00:28   0:00 /usr/bin/python3 /usr/local/bin/healthd
root        9365  0.0  0.4  51404 34096 pts/0    S    00:28   0:00 python3 /usr/local/bin/syseepromd
root       10180  0.0  0.3  49616 30444 ?        Ss   00:29   0:00 /usr/bin/python3 /usr/local/bin/featured
root        7506  0.0  0.3 120208 29380 pts/0    Sl   00:28   0:00 /usr/bin/python3 /usr/local/bin/bgpcfgd
root        9370  0.0  0.3  46944 29136 pts/0    S    00:28   0:00 python3 /usr/local/bin/thermalctld
root        4146  0.0  0.3  47052 27412 ?        Ss   00:28   0:00 /usr/bin/python3 /usr/local/bin/caclmgrd
root       12202  0.0  0.3 127304 27120 pts/0    Sl   00:30   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name snmp
root        9371  0.1  0.3  44204 26984 pts/0    S    00:28   0:03 /usr/bin/python3 /usr/local/bin/pcied
root        9278  0.0  0.3 124044 26812 pts/0    Sl   00:28   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name pmon
root        8925  0.0  0.3 124048 26804 pts/0    Sl   00:28   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name syncd
root        9144  0.0  0.3 124020 26752 pts/0    Sl   00:28   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name radv
root        9360  0.3  0.3  44372 26752 pts/0    S    00:28   0:07 python3 /usr/local/bin/psud
root        9148  0.0  0.3 124048 26684 pts/0    Sl   00:28   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name dhcp_relay
root        6735  0.0  0.3 124048 26668 pts/0    Sl   00:28   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name teamd
root       11474  0.0  0.3 124016 26600 pts/0    Sl   00:29   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name gnmi
root        5527  0.0  0.3 124016 26524 pts/0    Sl   00:28   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name eventd
root        7398  0.0  0.3 124044 26492 pts/0    Sl   00:28   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name bgp
root        6162  0.0  0.3 124044 26428 pts/0    Sl   00:28   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name swss
root        1654  0.0  0.3 124020 26392 pts/0    Sl   00:28   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name database
root       11674  0.0  0.3 124044 26372 pts/0    Sl   00:29   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name lldp
root        4963  0.1  0.3  30120 25672 pts/0    Ss+  00:28   0:03 /usr/bin/python3 /usr/local/bin/supervisord
root       12043  0.0  0.3  32860 25564 pts/0    Ss+  00:30   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        8797  0.0  0.3  29756 25424 pts/0    Ss+  00:28   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root       11467  0.0  0.3  29600 25392 pts/0    Ss+  00:29   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        6714  0.1  0.3  29632 25328 pts/0    Ss+  00:28   0:02 /usr/bin/python3 /usr/local/bin/supervisord
root        5966  0.0  0.3  29596 25288 pts/0    Ss+  00:28   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        4160  0.1  0.3  45048 25268 ?        Ss   00:28   0:02 /usr/bin/python3 /usr/local/bin/procdockerstatsd
root       11343  0.0  0.3  29596 25244 pts/0    Ss+  00:29   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        8670  0.0  0.3  29596 25236 pts/0    Ss+  00:28   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        8808  0.0  0.3  29596 25212 pts/0    Ss+  00:28   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        4824  0.0  0.3  29596 25184 pts/0    Ss+  00:28   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        8157  0.0  0.3  29596 25168 pts/0    Ss+  00:28   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        1397  0.0  0.3  29596 25140 pts/0    Ss+  00:28   0:00 /usr/bin/python3 /usr/local/bin/supervisord
root       11750  0.0  0.3  29224 24316 pts/0    Ss+  00:29   0:00 /usr/bin/python3 /usr/local/bin/supervisord
root       12189  0.0  0.3  29216 24312 pts/0    Ss+  00:30   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root       12286  0.0  0.2  41204 23976 pts/0    S    00:30   0:00 python3 /usr/bin/lldpmgrd
root       12251  0.6  0.2 112716 23856 pts/0    Sl   00:30   0:11 python3 -m lldp_syncd
root        9276  0.0  0.2  37848 22428 pts/0    S    00:28   0:00 /usr/bin/python3 /usr/local/bin/dhcprelayd
root       11601  0.0  0.2  28980 22332 pts/0    Ss+  00:29   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        7156  0.8  0.2 248276 20964 ?        Sl   00:28   0:17 /usr/bin/python /usr/bin/hw_management_thermal_control.py
root        7507  0.0  0.2  36056 20652 pts/0    S    00:28   0:00 /usr/bin/python3 /usr/local/bin/bgpmon
root        7512  0.0  0.2  35924 20052 pts/0    S    00:28   0:00 /usr/bin/python3 /usr/local/bin/staticroutebfd
root       11284  0.0  0.2  31084 18552 ?        Ss   00:29   0:00 /usr/bin/python3 -u /usr/local/bin/sonic-host-server
root       26816  0.0  0.0   6868   648 pts/0    S+   01:01   0:00 grep python
root       26394  2.5  0.0      0     0 ?        Zs   01:01   0:00 [python3] <defunct>
root       26396  2.5  0.0      0     0 ?        Zs   01:01   0:00 [python3] <defunct>
root       26397  2.6  0.0      0     0 ?        Zs   01:01   0:00 [python3] <defunct>

On 202405:

root@msn3420-r0:/home/admin# ps aux --sort=-rss | grep python
root        9790  0.5  2.4 390776 199244 pts/0   Sl   02:14   0:20 python3 /usr/local/bin/xcvrd --skip_cmis_mgr
root       10094  0.2  2.3 254196 188224 pts/0   S    02:14   0:08 python3 /usr/local/bin/thermalctld
root        6171  0.0  0.8 238136 69364 ?        Ss   02:13   0:03 /usr/bin/python3 /usr/local/bin/healthd
root        9276  0.0  0.7 305904 61424 ?        Sl   02:14   0:00 python3 /usr/bin/docker-wait-any -s swss -d syncd teamd
root       10523  0.0  0.7  80036 60060 ?        Ss   02:14   0:00 /usr/bin/python3 /usr/local/bin/hostcfgd
root        6703  0.0  0.7 182732 58260 ?        S    02:13   0:00 /usr/bin/python3 /usr/local/bin/healthd
root        6701  0.1  0.7 176544 58176 ?        S    02:13   0:04 /usr/bin/python3 /usr/local/bin/healthd
root       12596  5.3  0.7 150188 57724 pts/0    Sl   02:15   3:15 python3 -m sonic_ax_impl
root        6679  0.0  0.6 388116 54748 ?        Sl   02:13   0:00 /usr/bin/python3 /usr/local/bin/healthd
root        9798  0.0  0.6  71988 54112 pts/0    S    02:14   0:00 python3 /usr/local/bin/syseepromd
root        6705  0.0  0.6 171332 53000 ?        S    02:13   0:00 /usr/bin/python3 /usr/local/bin/healthd
root        9802  0.0  0.6  68092 50808 pts/0    S    02:14   0:00 python3 /usr/local/bin/thermalctld
root       10522  0.0  0.6  68788 50356 ?        Ss   02:14   0:01 /usr/bin/python3 /usr/local/bin/featured
root        5183  0.0  0.6  68360 49376 ?        Ss   02:13   0:00 /usr/bin/python3 /usr/local/bin/caclmgrd
root        9795  0.3  0.6  66164 49048 pts/0    S    02:14   0:13 python3 /usr/local/bin/psud
root        9799  0.0  0.6  66644 48968 pts/0    S    02:14   0:00 /usr/bin/python3 /usr/local/bin/stormond
root        9806  0.1  0.6  65956 48732 pts/0    S    02:14   0:06 /usr/bin/python3 /usr/local/bin/pcied
admin      11993  0.0  0.5  70028 47940 ?        S    02:15   0:00 python3 /usr/local/bin/container wait mgmt-framework
root        9230  0.0  0.5  70032 47928 ?        S    02:13   0:00 python3 /usr/local/bin/container wait syncd
admin       7339  0.0  0.5  70032 47824 ?        S    02:13   0:00 python3 /usr/local/bin/container wait bgp
root        5194  0.2  0.5  68028 47816 ?        Ss   02:13   0:10 /usr/bin/python3 /usr/local/bin/procdockerstatsd
admin       6613  0.0  0.5  70032 47784 ?        S    02:13   0:00 python3 /usr/local/bin/container wait teamd
admin      11724  0.0  0.5  70032 47772 ?        S    02:15   0:00 python3 /usr/local/bin/container wait gnmi
admin       9234  0.0  0.5  70032 47744 ?        S    02:13   0:00 python3 /usr/local/bin/container wait pmon
admin      11859  0.0  0.5  70032 47744 ?        S    02:15   0:00 python3 /usr/local/bin/container wait lldp
root        5735  0.0  0.5  70032 47704 ?        S    02:13   0:00 python3 /usr/local/bin/container wait eventd
root       12323  0.0  0.5  70032 47676 ?        S    02:15   0:00 python3 /usr/local/bin/container wait snmp
root        9264  0.0  0.5  70032 47660 ?        S    02:13   0:00 python3 /usr/local/bin/container wait dhcp_relay
admin       9242  0.0  0.5  70032 47360 ?        S    02:13   0:00 python3 /usr/local/bin/container wait radv
root       12475  0.0  0.5  63412 45752 pts/0    S    02:15   0:00 python3 /usr/bin/lldpmgrd
root        7734  0.0  0.4 126004 34592 pts/0    Sl   02:13   0:00 /usr/bin/python3 /usr/local/bin/bgpcfgd
root        6110  0.0  0.4 130596 33456 pts/0    Sl   02:13   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name eventd
root       12383  0.0  0.4 133872 33040 pts/0    Sl   02:15   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name snmp
root       12001  0.0  0.4 130596 32540 pts/0    Sl   02:15   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name lldp
root        9371  0.0  0.4 130600 32444 pts/0    Sl   02:14   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name dhcp_relay
root        6653  0.0  0.4 130596 32396 pts/0    Sl   02:13   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name swss
root       11776  0.0  0.4 130596 32224 pts/0    Sl   02:15   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name gnmi
root        9376  0.0  0.4 130596 32208 pts/0    Sl   02:14   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name radv
root        1462  0.0  0.4 130596 32204 pts/0    Sl   02:13   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name database
root        6517  0.0  0.4  37020 32184 pts/0    Ss+  02:13   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        9556  0.0  0.4 130600 32180 pts/0    Sl   02:14   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name pmon
root        7054  0.0  0.4 130596 32156 pts/0    Sl   02:13   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name teamd
root       11838  0.0  0.4  37024 32120 pts/0    Ss+  02:15   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        7676  0.0  0.3 130596 32032 pts/0    Sl   02:13   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name bgp
root        9177  0.0  0.3  36892 31800 pts/0    Ss+  02:13   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        7244  0.0  0.3  37032 31768 pts/0    Ss+  02:13   0:02 /usr/bin/python3 /usr/local/bin/supervisord
root        8470  0.0  0.3  36888 31756 pts/0    Ss+  02:13   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        9175  0.0  0.3  37040 31732 pts/0    Ss+  02:13   0:02 /usr/bin/python3 /usr/local/bin/supervisord
root       11705  0.0  0.3  36888 31728 pts/0    Ss+  02:15   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        5983  0.1  0.3  37188 31688 pts/0    Ss+  02:13   0:04 /usr/bin/python3 /usr/local/bin/supervisord
root        1333  0.0  0.3  36888 31636 pts/0    Ss+  02:13   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root       12302  0.0  0.3  40164 31532 pts/0    Ss+  02:15   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root       12459  0.0  0.3  36872 31396 pts/0    Ss+  02:15   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        9259  0.0  0.3 130592 31392 pts/0    Sl   02:13   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name syncd
root        5600  0.0  0.3  36888 31364 pts/0    Ss+  02:13   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root        9176  0.0  0.3  36888 31324 pts/0    Ss+  02:13   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root       11975  0.0  0.3  36884 31284 pts/0    Ss+  02:15   0:01 /usr/bin/python3 /usr/local/bin/supervisord
root       12333  0.1  0.3 114032 24660 pts/0    Sl   02:15   0:04 python3 -m lldp_syncd
root        6170  0.8  0.2 251288 23584 ?        Sl   02:13   0:30 /usr/bin/python /usr/bin/hw_management_thermal_control.py
root        9550  0.0  0.2  39456 23448 pts/0    S    02:14   0:00 /usr/bin/python3 /usr/local/bin/dhcprelayd
root        7740  0.0  0.2  37552 21248 pts/0    S    02:13   0:00 /usr/bin/python3 /usr/local/bin/staticroutebfd
root        7735  0.0  0.2  37660 20484 pts/0    S    02:13   0:00 /usr/bin/python3 /usr/local/bin/bgpmon
root       11645  0.0  0.2  33388 20024 ?        Ss   02:15   0:00 /usr/bin/python3 -u /usr/local/bin/sonic-host-server
root       39691  0.0  0.0   6972  2108 pts/0    S+   03:16   0:00 grep python
root       39264  1.7  0.0      0     0 ?        Zs   03:15   0:00 [python3] <defunct>
root       39266  0.8  0.0      0     0 ?        Zs   03:15   0:00 [python3] <defunct>
root       39267  1.1  0.0      0     0 ?        Zs   03:15   0:00 [python3] <defunct>
root@msn3420-r0:/home/admin# show ver

SONiC Software Version: SONiC.202405.3-658f752aa_Internal
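To make the two captures easier to compare, here is a small sketch that computes per-daemon RSS growth. The RSS values are copied from the 202311 and 202405 outputs above for a few single-process daemons; the function and variable names are illustrative only.

```python
def rss_growth_kb(before: dict, after: dict) -> dict:
    """Per-daemon RSS growth in kB for daemons present in both snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}


# RSS (kB) taken from the ps output in this report.
rss_202311 = {"xcvrd": 179912, "hostcfgd": 37768, "featured": 30444, "caclmgrd": 27412}
rss_202405 = {"xcvrd": 199244, "hostcfgd": 60060, "featured": 50356, "caclmgrd": 49376}

growth = rss_growth_kb(rss_202311, rss_202405)
for name, delta in sorted(growth.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:10s} +{delta} kB (+{delta / 1024:.1f} MB)")
```

For these four daemons the growth lands in the 19-22 MB range, which lines up with the ~20 MB figure in the description.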

Describe the results you expected:

Ideally no significant difference; if that is not possible, the test should accommodate the difference.

zjswhhh commented 1 month ago

@saiarcot895 - since there is no option to change the speed improvement setting, there is a task to change the threshold. Can you take the action item?