sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[GCU] File Descriptor Not Closed Results in "Too Many Open Files" Error #20508

Open okaravasi opened 5 days ago

okaravasi commented 5 days ago

Description

When a GCU apply-patch command results in many changes, the process hits the open-file limit.

Extra debug logging traced the issue to the function _get_running_config in the file "sonic-buildimage/src/sonic-utilities/generic_config_updater/change_applier.py".

    def _get_running_config(self):
        _, fname = tempfile.mkstemp(suffix="_changeApplier")

By definition, mkstemp creates a temporary file and returns an open file descriptor referring to it; the caller is responsible for closing that descriptor after use.

In the faulty code above, the file descriptor is assigned to the unused variable "_" and is never closed within the function, so every call leaves one more descriptor open.
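The leak can be reproduced outside of GCU with a few lines of Python. The following is a minimal sketch, not the change_applier.py code: the helper name is_open and the suffix "_leakDemo" are made up for illustration, and the descriptor numbers are recorded only so the demo can inspect them.

```python
import os
import tempfile


def is_open(fd):
    """Return True if fd still refers to an open file in this process."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


# Pattern from the issue: the descriptor returned by mkstemp is never closed.
leaked = []
for _ in range(50):
    fd, fname = tempfile.mkstemp(suffix="_leakDemo")
    leaked.append((fd, fname))  # fd kept only so the demo can check it later

still_open = sum(is_open(fd) for fd, _ in leaked)
print("descriptors still open after 50 calls:", still_open)

# Clean up the demo: close the leaked descriptors and delete the files.
for fd, fname in leaked:
    os.close(fd)
    os.remove(fname)

# Same call, but the descriptor is closed right after mkstemp returns.
open_after_close = 0
for _ in range(50):
    fd, fname = tempfile.mkstemp(suffix="_leakDemo")
    os.close(fd)
    open_after_close += is_open(fd)
    os.remove(fname)

print("descriptors still open with os.close:", open_after_close)
```

With a limit of 1024 open files, a long-running loop of the first kind eventually fails with [Errno 24].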

The following fix was tested and verified to solve the issue:

    def _get_running_config(self):
        fd, fname = tempfile.mkstemp(suffix="_changeApplier")
        os.close(fd)
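One possible way to make this pattern harder to get wrong is to wrap it in a small context manager that closes the descriptor immediately and removes the file on exit. This is only a sketch under assumptions: closed_temp_path is a hypothetical helper, not something that exists in sonic-utilities, and the JSON written here is a stand-in for whatever actually fills the file.

```python
import contextlib
import json
import os
import tempfile


@contextlib.contextmanager
def closed_temp_path(suffix=""):
    """Yield the path of a fresh temp file whose descriptor is already closed.

    The file is removed when the block exits, even if an exception is raised.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed, so release the descriptor now
    try:
        yield path
    finally:
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)


with closed_temp_path(suffix="_changeApplier") as fname:
    # Stand-in for the step that dumps the running config into fname.
    with open(fname, "w") as f:
        json.dump({"PORT": {}}, f)
    with open(fname) as f:
        config = json.load(f)

file_removed = not os.path.exists(fname)
print(config, file_removed)
```

The two-line fix above is sufficient for the reported problem; the wrapper just keeps the close and the cleanup in one place.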

Steps to reproduce the issue:

  1. Create a JSON file that applies multiple changes, so that _get_running_config() is called many times. For example, the file below resulted in 730 changes.
    [
    {
    "op": "remove",
    "path": "/asic1/BUFFER_PG"
    },
    {
    "op": "remove",
    "path": "/asic1/BUFFER_QUEUE"
    },
    {
    "op": "remove",
    "path": "/asic1/PORT_QOS_MAP"
    },
    {
    "op": "remove",
    "path": "/asic1/QUEUE"
    }
    ]
  2. sudo config apply-patch

Describe the results you received:

The CLI command failed with the following output:

Failed to apply patch due to: Failed to apply patch on the following scopes: -asic1: [Errno 24] Too many open files
Usage: config apply-patch [OPTIONS] PATCH_FILE_PATH
Try "config apply-patch -h" for help.

Error: Failed to apply patch on the following scopes: -asic1: [Errno 24] Too many open files

Describe the results you expected:

The CLI command should not hit the open-file limit and should complete successfully:

Patch Applier: asic1: verifying patch updates are reflected on ConfigDB.
Patch Applier: asic1 patch application completed.
Patch applied successfully.

Output of show version:

admin@ixre-egl-board41:~$ show version

SONiC Software Version: SONiC.HEAD.847920-nokia-master-7443242ec
SONiC OS Version: 12
Distribution: Debian 12.7
Kernel: 6.1.0-22-2-amd64
Build commit: 7443242ec
Build date: Fri Oct 11 19:46:16 UTC 2024
Built by: gitlab-runner@sonic-bld2

Platform: x86_64-nokia_ixr7250e_36x400g-r0
HwSKU: Nokia-IXR7250E-36x400G
ASIC: broadcom
ASIC Count: 2
Serial Number: EAG2-02-143
Model Number: N/A
Hardware Revision: 56
Uptime: 13:54:42 up 1 day,  5:54,  1 user,  load average: 1.29, 1.66, 1.94
Date: Tue 15 Oct 2024 13:54:42

Docker images:
REPOSITORY                    TAG                                  IMAGE ID       SIZE
docker-orchagent              HEAD.847920-nokia-master-7443242ec   64c8b2359659   416MB
docker-orchagent              latest                               64c8b2359659   416MB
docker-fpm-frr                HEAD.847920-nokia-master-7443242ec   3a1712d534d2   435MB
docker-fpm-frr                latest                               3a1712d534d2   435MB
docker-nat                    HEAD.847920-nokia-master-7443242ec   11053a716c54   406MB
docker-nat                    latest                               11053a716c54   406MB
docker-dhcp-relay             latest                               85ec1b2bb19d   384MB
docker-macsec                 latest                               95cd8f85f105   406MB
docker-snmp                   HEAD.847920-nokia-master-7443242ec   631396a2652d   418MB
docker-snmp                   latest                               631396a2652d   418MB
docker-teamd                  HEAD.847920-nokia-master-7443242ec   395e63cb240e   403MB
docker-teamd                  latest                               395e63cb240e   403MB
docker-platform-monitor       HEAD.847920-nokia-master-7443242ec   46d6839ed7e9   459MB
docker-platform-monitor       latest                               46d6839ed7e9   459MB
docker-sflow                  HEAD.847920-nokia-master-7443242ec   fe1aa34216ab   404MB
docker-sflow                  latest                               fe1aa34216ab   404MB
docker-router-advertiser      HEAD.847920-nokia-master-7443242ec   a558cb579222   374MB
docker-router-advertiser      latest                               a558cb579222   374MB
docker-mux                    HEAD.847920-nokia-master-7443242ec   feac3ae1e2a1   386MB
docker-mux                    latest                               feac3ae1e2a1   386MB
docker-lldp                   HEAD.847920-nokia-master-7443242ec   a81dcc9f1e83   383MB
docker-lldp                   latest                               a81dcc9f1e83   383MB
docker-sonic-gnmi             HEAD.847920-nokia-master-7443242ec   efde45f0c67b   459MB
docker-sonic-gnmi             latest                               efde45f0c67b   459MB
docker-eventd                 HEAD.847920-nokia-master-7443242ec   253d220dc7a3   374MB
docker-eventd                 latest                               253d220dc7a3   374MB
docker-sonic-mgmt-framework   HEAD.847920-nokia-master-7443242ec   72625996ddcb   424MB
docker-sonic-mgmt-framework   latest                               72625996ddcb   424MB
docker-database               HEAD.847920-nokia-master-7443242ec   db163ca7d2e4   383MB
docker-database               latest                               db163ca7d2e4   383MB
docker-syncd-brcm-dnx         HEAD.847920-nokia-master-7443242ec   7d42f34119a8   759MB
docker-syncd-brcm-dnx         latest                               7d42f34119a8   759MB
docker-gbsyncd-broncos        HEAD.847920-nokia-master-7443242ec   c21d0762cb29   410MB
docker-gbsyncd-broncos        latest                               c21d0762cb29   410MB
docker-gbsyncd-credo          HEAD.847920-nokia-master-7443242ec   30b4f7bcb5db   383MB
docker-gbsyncd-credo          latest                               30b4f7bcb5db   383MB

Additional information you deem important (e.g. issue happens only occasionally):

admin@ixre-egl-board40:~$ ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 63363
max locked memory           (kbytes, -l) 2032628
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 63363
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited
admin@ixre-egl-board40:~$
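
The "open files" value above is the limit seen by the interactive shell. A process can read its own limit with Python's standard resource module, which may help confirm the value that applies where the patch is actually applied. This is a small sketch and assumes a Unix-like system:

```python
import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft limit {soft}, hard limit {hard}")
```

The soft limit is the one enforced; once a process holds that many descriptors, further opens fail with [Errno 24].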

okaravasi commented 5 days ago

@judyjoseph @xincunli-sonic Could you please review this issue and assist with assigning it to the appropriate person? Thank you.