sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[Dhcprelay]dhcprelayd crashes with a traceback #19507

Closed dgsudharsan closed 2 days ago

dgsudharsan commented 1 week ago

Description

The issue below occurred twice on 202311 without any specific trigger. dhcprelayd crashes with a traceback and produces the following error logs:

Jul  3 05:29:17.760127 r-panther-40 INFO dhcp_relay#supervisord 2024-07-03 05:29:17,758 INFO waiting for supervisor-proc-exit-listener, rsyslogd, isc-dhcpv4-relay-Vlan1000, dhcprelayd, dhcp6relay, dhcpmon-Vlan1000 to die
Jul  3 05:29:17.796958 r-panther-40 INFO dhcp_relay#supervisord 2024-07-03 05:29:17,796 INFO stopped: dhcprelayd (terminated by SIGTERM)
Jul  3 05:30:17.444896 r-panther-40 INFO dhcp_relay#supervisord 2024-07-03 05:30:17,443 INFO spawned: 'dhcprelayd' with pid 59
Jul  3 05:30:18.577729 r-panther-40 INFO dhcp_relay#supervisord 2024-07-03 05:30:18,576 INFO success: dhcprelayd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Jul  3 05:47:36.025691 r-panther-40 INFO dhcp_relay#supervisord 2024-07-03 05:47:36,016 INFO waiting for supervisor-proc-exit-listener, rsyslogd, isc-dhcpv4-relay-Vlan1000, dhcprelayd, dhcp6relay, dhcpmon-Vlan1000 to die
Jul  3 05:47:36.061261 r-panther-40 INFO dhcp_relay#supervisord 2024-07-03 05:47:36,060 INFO stopped: dhcprelayd (terminated by SIGTERM)
Jul  3 05:48:35.614285 r-panther-40 INFO dhcp_relay#supervisord 2024-07-03 05:48:35,613 INFO spawned: 'dhcprelayd' with pid 52
Jul  3 05:48:36.950710 r-panther-40 INFO dhcp_relay#supervisord 2024-07-03 05:48:36,950 INFO success: dhcprelayd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Jul  3 06:21:02.966335 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd Traceback (most recent call last):
Jul  3 06:21:02.966552 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_pslinux.py", line 1717, in wrapper
Jul  3 06:21:02.967743 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     return fun(self, *args, **kwargs)
Jul  3 06:21:02.967870 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_common.py", line 508, in wrapper
Jul  3 06:21:02.968370 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     raise raise_from(err, None)
Jul  3 06:21:02.968450 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "<string>", line 3, in raise_from
Jul  3 06:21:02.968506 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_common.py", line 506, in wrapper
Jul  3 06:21:02.968831 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     return fun(self)
Jul  3 06:21:02.968852 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_pslinux.py", line 1780, in _parse_stat_file
Jul  3 06:21:02.969127 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     data = bcat("%s/%s/stat" % (self._procfs_path, self.pid))
Jul  3 06:21:02.969149 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_common.py", line 851, in bcat
Jul  3 06:21:02.969418 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     return cat(fname, fallback=fallback, _open=open_binary)
Jul  3 06:21:02.969438 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_common.py", line 839, in cat
Jul  3 06:21:02.969715 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     with _open(fname) as f:
Jul  3 06:21:02.969715 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_common.py", line 799, in open_binary
Jul  3 06:21:02.970059 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     return open(fname, "rb", buffering=FILE_READ_BUFFER_SIZE)
Jul  3 06:21:02.970223 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd FileNotFoundError: [Errno 2] No such file or directory: '/proc/262/stat'
Jul  3 06:21:02.970451 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd
Jul  3 06:21:02.970451 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd During handling of the above exception, another exception occurred:
Jul  3 06:21:02.970451 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd
Jul  3 06:21:02.970451 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd Traceback (most recent call last):
Jul  3 06:21:02.970451 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/bin/dhcprelayd", line 8, in <module>
Jul  3 06:21:02.970451 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     sys.exit(main())
Jul  3 06:21:02.970702 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/dhcp_utilities/dhcprelayd/dhcprelayd.py", line 308, in main
Jul  3 06:21:02.971789 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     dhcprelayd.wait()
Jul  3 06:21:02.971946 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/dhcp_utilities/dhcprelayd/dhcprelayd.py", line 133, in wait
Jul  3 06:21:02.972056 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     self._check_dhcp_relay_processes()
Jul  3 06:21:02.972200 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/dhcp_utilities/dhcprelayd/dhcprelayd.py", line 166, in _check_dhcp_relay_processes
Jul  3 06:21:02.972278 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     running_cmds = get_target_process_cmds("dhcrelay")
Jul  3 06:21:02.972449 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/dhcp_utilities/common/utils.py", line 163, in get_target_process_cmds
Jul  3 06:21:02.972886 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     if proc.name() == process_name:
Jul  3 06:21:02.972886 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/__init__.py", line 656, in name
Jul  3 06:21:02.972886 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     name = self._proc.name()
Jul  3 06:21:02.972886 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_pslinux.py", line 1717, in wrapper
Jul  3 06:21:02.973328 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     return fun(self, *args, **kwargs)
Jul  3 06:21:02.973328 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_pslinux.py", line 1831, in name
Jul  3 06:21:02.974009 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     name = self._parse_stat_file()['name']
Jul  3 06:21:02.974009 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_pslinux.py", line 1726, in wrapper
Jul  3 06:21:02.974621 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     raise NoSuchProcess(self.pid, self._name)
Jul  3 06:21:02.974878 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd psutil.NoSuchProcess: process no longer exists (pid=262)
Jul  3 06:21:02.989391 r-panther-40 INFO dhcp_relay#supervisord 2024-07-03 06:21:02,988 INFO exited: dhcprelayd (exit status 1; not expected)
Jul  3 06:22:04.045006 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (1.0 minutes).
Jul  3 06:23:04.101455 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (2.0 minutes).
Jul  3 06:24:04.158214 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (3.0 minutes).
Jul  3 06:25:04.216673 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (4.0 minutes).
Jul  3 06:26:04.268193 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (5.0 minutes).
Jul  3 06:27:04.317562 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (6.0 minutes).
Jul  3 06:28:04.370594 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (7.0 minutes).
Jul  3 06:29:04.418379 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (8.0 minutes).
Jul  3 06:30:04.468323 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (9.0 minutes).
Jul  3 06:31:04.524763 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (10.0 minutes).
Jul  3 06:32:04.576270 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (11.0 minutes).
Jul  3 06:33:04.625357 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (12.0 minutes).
Jul  3 06:34:04.680079 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (13.0 minutes).
Jul  3 06:35:04.731581 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (14.0 minutes).
Jul  3 06:36:04.780928 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (15.0 minutes).
Jul  3 06:37:04.837283 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (16.0 minutes).
Jul  3 06:38:04.891576 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (17.0 minutes).
Jul  3 06:39:04.944373 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (18.0 minutes).
Jul  3 06:40:05.002239 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (19.0 minutes).
Jul  3 06:41:05.050822 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (20.0 minutes).
Jul  3 06:42:05.103436 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (21.0 minutes).
Jul  3 06:43:05.154865 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (22.0 minutes).
Jul  3 06:44:05.209674 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (23.0 minutes).
Jul  3 06:45:05.262286 r-panther-40 ERR dhcp_relay#supervisor-proc-exit-listener: Process 'dhcprelayd' is not running in namespace 'host' (24.0 minutes).

There are no logs from dhcrelay during this time, so there is no information on why the process is unavailable.

syslog.1.gz:Jul  3 05:48:35.699273 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Loaded 57 interface name-alias mappings
syslog.1.gz:Jul  3 05:48:35.699588 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Internet Systems Consortium DHCP Relay Agent 4.4.1
syslog.1.gz:Jul  3 05:48:35.699821 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Copyright 2004-2018 Internet Systems Consortium.
syslog.1.gz:Jul  3 05:48:35.700136 r-panther-40 INFO dhcp_relay#dhcrelay[53]: All rights reserved.
syslog.1.gz:Jul  3 05:48:35.700325 r-panther-40 INFO dhcp_relay#dhcrelay[53]: For info, please visit https://www.isc.org/software/dhcp/
syslog.1.gz:Jul  3 05:48:35.713992 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet126
syslog.1.gz:Jul  3 05:48:35.714319 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet126
syslog.1.gz:Jul  3 05:48:35.714792 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet124
syslog.1.gz:Jul  3 05:48:35.715050 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet124
syslog.1.gz:Jul  3 05:48:35.715357 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet122
syslog.1.gz:Jul  3 05:48:35.715581 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet122
syslog.1.gz:Jul  3 05:48:35.715871 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet120
syslog.1.gz:Jul  3 05:48:35.716160 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet120
syslog.1.gz:Jul  3 05:48:35.717449 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet118
syslog.1.gz:Jul  3 05:48:35.717697 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet118
syslog.1.gz:Jul  3 05:48:35.718017 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet116
syslog.1.gz:Jul  3 05:48:35.718225 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet116
syslog.1.gz:Jul  3 05:48:35.718514 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet114
syslog.1.gz:Jul  3 05:48:35.718718 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet114
syslog.1.gz:Jul  3 05:48:35.719096 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet112
syslog.1.gz:Jul  3 05:48:35.719294 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet112
syslog.1.gz:Jul  3 05:48:35.719588 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet110
syslog.1.gz:Jul  3 05:48:35.719844 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet110
syslog.1.gz:Jul  3 05:48:35.720119 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet108
syslog.1.gz:Jul  3 05:48:35.720321 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet108
syslog.1.gz:Jul  3 05:48:35.720757 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet106
syslog.1.gz:Jul  3 05:48:35.721666 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet106
syslog.1.gz:Jul  3 05:48:35.722002 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet104
syslog.1.gz:Jul  3 05:48:35.722354 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet104
syslog.1.gz:Jul  3 05:48:35.722655 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet100
syslog.1.gz:Jul  3 05:48:35.722857 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet100
syslog.1.gz:Jul  3 05:48:35.723124 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet96
syslog.1.gz:Jul  3 05:48:35.723318 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet96
syslog.1.gz:Jul  3 05:48:35.723887 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet92
syslog.1.gz:Jul  3 05:48:35.724087 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet92
syslog.1.gz:Jul  3 05:48:35.724367 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet88
syslog.1.gz:Jul  3 05:48:35.724684 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet88
syslog.1.gz:Jul  3 05:48:35.725034 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet86
syslog.1.gz:Jul  3 05:48:35.725660 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet86
syslog.1.gz:Jul  3 05:48:35.725981 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet84
syslog.1.gz:Jul  3 05:48:35.726178 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet84
syslog.1.gz:Jul  3 05:48:35.726425 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet82
syslog.1.gz:Jul  3 05:48:35.727185 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet82
syslog.1.gz:Jul  3 05:48:35.727454 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet80
syslog.1.gz:Jul  3 05:48:35.727660 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet80
syslog.1.gz:Jul  3 05:48:35.727915 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet78
syslog.1.gz:Jul  3 05:48:35.728105 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet78
syslog.1.gz:Jul  3 05:48:35.728387 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet76
syslog.1.gz:Jul  3 05:48:35.728705 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet76
syslog.1.gz:Jul  3 05:48:35.728949 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet74
syslog.1.gz:Jul  3 05:48:35.732925 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet74
syslog.1.gz:Jul  3 05:48:35.735155 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet72
syslog.1.gz:Jul  3 05:48:35.735896 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet72
syslog.1.gz:Jul  3 05:48:35.736202 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet70
syslog.1.gz:Jul  3 05:48:35.736408 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet70
syslog.1.gz:Jul  3 05:48:35.736704 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet68
syslog.1.gz:Jul  3 05:48:35.737072 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet68
syslog.1.gz:Jul  3 05:48:35.737365 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet66
syslog.1.gz:Jul  3 05:48:35.738030 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet66
syslog.1.gz:Jul  3 05:48:35.739413 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet64
syslog.1.gz:Jul  3 05:48:35.739686 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet64
syslog.1.gz:Jul  3 05:48:35.740177 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet62
syslog.1.gz:Jul  3 05:48:35.740391 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet62
syslog.1.gz:Jul  3 05:48:35.740674 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet60
syslog.1.gz:Jul  3 05:48:35.741351 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet60
syslog.1.gz:Jul  3 05:48:35.741633 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet58
syslog.1.gz:Jul  3 05:48:35.741864 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet58
syslog.1.gz:Jul  3 05:48:35.742157 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet56
syslog.1.gz:Jul  3 05:48:35.742404 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet56
syslog.1.gz:Jul  3 05:48:35.743500 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet54
syslog.1.gz:Jul  3 05:48:35.744124 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet54
syslog.1.gz:Jul  3 05:48:35.744431 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet52
syslog.1.gz:Jul  3 05:48:35.744651 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet52
syslog.1.gz:Jul  3 05:48:35.744903 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet50
syslog.1.gz:Jul  3 05:48:35.745086 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet50
syslog.1.gz:Jul  3 05:48:35.745333 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet48
syslog.1.gz:Jul  3 05:48:35.745674 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet48
syslog.1.gz:Jul  3 05:48:35.745967 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet46
syslog.1.gz:Jul  3 05:48:35.746777 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet46
syslog.1.gz:Jul  3 05:48:35.747040 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet44
syslog.1.gz:Jul  3 05:48:35.747358 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet44
syslog.1.gz:Jul  3 05:48:35.747622 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet42
syslog.1.gz:Jul  3 05:48:35.747832 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet42
syslog.1.gz:Jul  3 05:48:35.748077 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet40
syslog.1.gz:Jul  3 05:48:35.748273 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet40
syslog.1.gz:Jul  3 05:48:35.748673 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet36
syslog.1.gz:Jul  3 05:48:35.748894 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet36
syslog.1.gz:Jul  3 05:48:35.749154 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet32
syslog.1.gz:Jul  3 05:48:35.750130 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet32
syslog.1.gz:Jul  3 05:48:35.750427 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet28
syslog.1.gz:Jul  3 05:48:35.750618 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet28
syslog.1.gz:Jul  3 05:48:35.750995 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet24
syslog.1.gz:Jul  3 05:48:35.751191 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet24
syslog.1.gz:Jul  3 05:48:35.751440 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet22
syslog.1.gz:Jul  3 05:48:35.751625 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet22
syslog.1.gz:Jul  3 05:48:35.752254 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet20
syslog.1.gz:Jul  3 05:48:35.752507 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet20
syslog.1.gz:Jul  3 05:48:35.752780 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet18
syslog.1.gz:Jul  3 05:48:35.753336 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet18
syslog.1.gz:Jul  3 05:48:35.753753 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet16
syslog.1.gz:Jul  3 05:48:35.754190 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet16
syslog.1.gz:Jul  3 05:48:35.754642 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet14
syslog.1.gz:Jul  3 05:48:35.755190 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet14
syslog.1.gz:Jul  3 05:48:35.760301 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet12
syslog.1.gz:Jul  3 05:48:35.761515 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet12
syslog.1.gz:Jul  3 05:48:35.761924 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet10
syslog.1.gz:Jul  3 05:48:35.762145 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet10
syslog.1.gz:Jul  3 05:48:35.762423 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet8
syslog.1.gz:Jul  3 05:48:35.762990 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet8
syslog.1.gz:Jul  3 05:48:35.763264 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet6
syslog.1.gz:Jul  3 05:48:35.763582 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet6
syslog.1.gz:Jul  3 05:48:35.763850 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet4
syslog.1.gz:Jul  3 05:48:35.764228 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet4
syslog.1.gz:Jul  3 05:48:35.764488 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet2
syslog.1.gz:Jul  3 05:48:35.765239 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet2
syslog.1.gz:Jul  3 05:48:35.765520 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Ethernet0
syslog.1.gz:Jul  3 05:48:35.765717 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Ethernet0
syslog.1.gz:Jul  3 05:48:35.766021 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Loopback0
syslog.1.gz:Jul  3 05:48:35.766240 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Loopback0
syslog.1.gz:Jul  3 05:48:35.795459 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/dummy
syslog.1.gz:Jul  3 05:48:35.796484 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/dummy
syslog.1.gz:Jul  3 05:48:35.798946 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Bridge
syslog.1.gz:Jul  3 05:48:35.799223 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Bridge
syslog.1.gz:Jul  3 05:48:35.799320 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/swid0_eth
syslog.1.gz:Jul  3 05:48:35.799412 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/swid0_eth
syslog.1.gz:Jul  3 05:48:35.799502 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/docker0
syslog.1.gz:Jul  3 05:48:35.799598 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/docker0
syslog.1.gz:Jul  3 05:48:35.799706 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/eth1
syslog.1.gz:Jul  3 05:48:35.803467 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/eth1
syslog.1.gz:Jul  3 05:48:35.804801 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/eth0
syslog.1.gz:Jul  3 05:48:35.804946 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/eth0
syslog.1.gz:Jul  3 05:48:35.805584 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/PortChannel104
syslog.1.gz:Jul  3 05:48:35.805698 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/PortChannel104
syslog.1.gz:Jul  3 05:48:35.805796 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/PortChannel103
syslog.1.gz:Jul  3 05:48:35.805900 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/PortChannel103
syslog.1.gz:Jul  3 05:48:35.805999 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/PortChannel102
syslog.1.gz:Jul  3 05:48:35.806303 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/PortChannel102
syslog.1.gz:Jul  3 05:48:35.806405 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/PortChannel101
syslog.1.gz:Jul  3 05:48:35.807120 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/PortChannel101
syslog.1.gz:Jul  3 05:48:35.807260 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Listening on Socket/Vlan1000
syslog.1.gz:Jul  3 05:48:35.807385 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/Vlan1000
syslog.1.gz:Jul  3 05:48:35.807489 r-panther-40 INFO dhcp_relay#dhcrelay[53]: Sending on   Socket/fallback
syslog.1.gz:Jul  3 06:21:02.972278 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd     running_cmds = get_target_process_cmds("dhcrelay"
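
For illustration only: the second traceback above ends in `get_target_process_cmds` (utils.py, line 163), where `proc.name()` raised `psutil.NoSuchProcess` for pid 262. That suggests a process exited between being enumerated and being queried. Below is a minimal sketch of a scan that tolerates this race, assuming the function simply collects the command lines of matching processes. `collect_cmds`, `ProcessGone`, and `FakeProc` are hypothetical names, not the project's code, and `ProcessGone` stands in for `psutil.NoSuchProcess` so the sketch runs without psutil installed.

```python
class ProcessGone(Exception):
    """Stand-in for psutil.NoSuchProcess (assumption: keeps the sketch stdlib-only)."""


class FakeProc:
    """Hypothetical process object exposing name() and cmdline() like psutil.Process."""

    def __init__(self, name, cmdline, gone=False):
        self._name, self._cmdline, self._gone = name, cmdline, gone

    def name(self):
        # Simulate a process that exited after enumeration but before the query.
        if self._gone:
            raise ProcessGone(self._name)
        return self._name

    def cmdline(self):
        return self._cmdline


def collect_cmds(procs, process_name, gone_errors=(ProcessGone,)):
    """Return command lines of processes named process_name, skipping vanished ones."""
    cmds = []
    for proc in procs:
        try:
            if proc.name() == process_name:
                cmds.append(proc.cmdline())
        except gone_errors:
            # The process disappeared mid-scan; skip it instead of crashing.
            continue
    return cmds


# Illustrative values only, not taken from the logs above.
procs = [
    FakeProc("short-lived", ["short-lived"], gone=True),
    FakeProc("dhcrelay", ["dhcrelay", "-d"]),
]
print(collect_cmds(procs, "dhcrelay"))
```

With psutil, the equivalent call would presumably be `collect_cmds(psutil.process_iter(), "dhcrelay", (psutil.NoSuchProcess, psutil.ZombieProcess))`. Whether this matches the fix that closed the issue is not stated in this thread.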

dgsudharsan commented 1 week ago

Log from the other instance:

Jun 21 01:10:56.212929 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd Traceback (most recent call last):
Jun 21 01:10:56.212929 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_pslinux.py", line 1717, in wrapper
Jun 21 01:10:56.214568 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     return fun(self, *args, **kwargs)
Jun 21 01:10:56.214568 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_common.py", line 508, in wrapper
Jun 21 01:10:56.214958 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     raise raise_from(err, None)
Jun 21 01:10:56.214980 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "<string>", line 3, in raise_from
Jun 21 01:10:56.215272 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_common.py", line 506, in wrapper
Jun 21 01:10:56.215572 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     return fun(self)
Jun 21 01:10:56.215572 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_pslinux.py", line 1780, in _parse_stat_file
Jun 21 01:10:56.216045 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     data = bcat("%s/%s/stat" % (self._procfs_path, self.pid))
Jun 21 01:10:56.216045 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_common.py", line 851, in bcat
Jun 21 01:10:56.216358 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     return cat(fname, fallback=fallback, _open=open_binary)
Jun 21 01:10:56.216385 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_common.py", line 839, in cat
Jun 21 01:10:56.216837 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     with _open(fname) as f:
Jun 21 01:10:56.216837 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_common.py", line 799, in open_binary
Jun 21 01:10:56.217223 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     return open(fname, "rb", buffering=FILE_READ_BUFFER_SIZE)
Jun 21 01:10:56.217223 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd FileNotFoundError: [Errno 2] No such file or directory: '/proc/537/stat'
Jun 21 01:10:56.217223 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd
Jun 21 01:10:56.217223 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd During handling of the above exception, another exception occurred:
Jun 21 01:10:56.217223 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd
Jun 21 01:10:56.217223 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd Traceback (most recent call last):
Jun 21 01:10:56.217223 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/bin/dhcprelayd", line 8, in <module>
Jun 21 01:10:56.217223 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     sys.exit(main())
Jun 21 01:10:56.217305 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/dhcp_utilities/dhcprelayd/dhcprelayd.py", line 308, in main
Jun 21 01:10:56.217646 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     dhcprelayd.wait()
Jun 21 01:10:56.217646 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/dhcp_utilities/dhcprelayd/dhcprelayd.py", line 133, in wait
Jun 21 01:10:56.217951 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     self._check_dhcp_relay_processes()
Jun 21 01:10:56.217951 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/dhcp_utilities/dhcprelayd/dhcprelayd.py", line 166, in _check_dhcp_relay_processes
Jun 21 01:10:56.217951 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     running_cmds = get_target_process_cmds("dhcrelay")
Jun 21 01:10:56.217951 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/dhcp_utilities/common/utils.py", line 163, in get_target_process_cmds
Jun 21 01:10:56.218300 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     if proc.name() == process_name:
Jun 21 01:10:56.218300 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/__init__.py", line 656, in name
Jun 21 01:10:56.218750 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     name = self._proc.name()
Jun 21 01:10:56.218750 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_pslinux.py", line 1717, in wrapper
Jun 21 01:10:56.219165 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     return fun(self, *args, **kwargs)
Jun 21 01:10:56.219165 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_pslinux.py", line 1831, in name
Jun 21 01:10:56.219575 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     name = self._parse_stat_file()['name']
Jun 21 01:10:56.219592 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd   File "/usr/local/lib/python3.9/dist-packages/psutil/_pslinux.py", line 1726, in wrapper
Jun 21 01:10:56.220157 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd     raise NoSuchProcess(self.pid, self._name)
Jun 21 01:10:56.220368 r-leopard-70 INFO dhcp_relay#supervisord: dhcprelayd psutil.NoSuchProcess: process no longer exists (pid=537)
Jun 21 01:10:56.231538 r-leopard-70 INFO dhcp_relay#supervisord 2024-06-21 01:10:56,231 INFO exited: dhcprelayd (exit status 1; not expected)

dgsudharsan commented 1 week ago

@yxieca @yaqiangz Can you please help prioritize this bug?

yaqiangz commented 1 week ago

Hi @dgsudharsan, could you provide the reproduction steps?

dgsudharsan commented 1 week ago

@yaqiangz As mentioned, there are no reproduction steps. The traceback appears without any external trigger.

yaqiangz commented 1 week ago

@dgsudharsan Please provide more details (HW SKU, topology, image commit ID, etc.) and try reinstalling the image.

I tried the latest 202311 build (commit: 74b81ffcd76290dd2eeff4b66e742a705f76c165) on t0/m0 and did not hit this issue; dhcprelayd works well:

admin@sonic:~$ date
Tue 09 Jul 2024 08:35:05 AM UTC
admin@sonic:~$ docker ps -a
CONTAINER ID   IMAGE                                COMMAND                  CREATED          STATUS          PORTS     NAMES
b3e1cf10590f   docker-snmp:latest                   "/usr/local/bin/supe…"   22 minutes ago   Up 22 minutes             snmp
f217dc52e560   docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   22 minutes ago   Up 22 minutes             mgmt-framework
bf36a280787a   docker-lldp:latest                   "/usr/bin/docker-lld…"   22 minutes ago   Up 22 minutes             lldp
cf05127f3511   docker-sonic-gnmi:latest             "/usr/local/bin/supe…"   22 minutes ago   Up 22 minutes             gnmi
cea36c5938bc   02e737784b02                         "/usr/bin/docker_ini…"   22 minutes ago   Up 22 minutes             dhcp_relay
ac87c236084a   docker-platform-monitor:latest       "/usr/bin/docker_ini…"   23 minutes ago   Up 23 minutes             pmon
e4e3426988c0   docker-router-advertiser:latest      "/usr/bin/docker-ini…"   23 minutes ago   Up 23 minutes             radv
8c7ae91d5787   docker-syncd-mlnx:latest             "/usr/local/bin/supe…"   23 minutes ago   Up 23 minutes             syncd
35b4592988f5   docker-teamd:latest                  "/usr/local/bin/supe…"   23 minutes ago   Up 23 minutes             teamd
59cfd960818f   docker-fpm-frr:latest                "/usr/bin/docker_ini…"   23 minutes ago   Up 23 minutes             bgp
3d1526b8df0c   docker-orchagent:latest              "/usr/bin/docker-ini…"   23 minutes ago   Up 23 minutes             swss
835d927e4793   docker-eventd:latest                 "/usr/local/bin/supe…"   23 minutes ago   Up 23 minutes             eventd
179af8c019a0   docker-database:latest               "/usr/local/bin/dock…"   24 minutes ago   Up 24 minutes             database
admin@sonic:~$ ps aux | grep dhc
root       12389  0.0  0.0  11216  5960 ?        Ss   08:12   0:00 /bin/bash /usr/local/bin/dhcp_relay.sh wait
root       12474  0.0  0.0  11348  5880 ?        S    08:12   0:00 /bin/bash /usr/bin/dhcp_relay.sh wait
root       12475  0.0  0.5  68520 42920 ?        S    08:12   0:00 python3 /usr/local/bin/container wait dhcp_relay
root       12948  0.0  0.3 124160 26724 pts/0    Sl   08:12   0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name dhcp_relay
root       14013  0.0  0.1  17892 10004 pts/0    S    08:12   0:00 /usr/sbin/dhcp6relay
root       14014  0.0  0.2  39376 23624 pts/0    S    08:12   0:00 /usr/bin/python3 /usr/local/bin/dhcprelayd
root       14015  0.0  0.0  34508  6664 pts/0    Sl   08:12   0:00 /usr/sbin/dhcrelay -d -m discard -a %h:%p %P --name-alias-map-file /tmp/port-name-alias-map.txt -id Vlan1000 -iu PortChannel101 -iu PortChannel102 -iu PortChannel103 -iu PortChannel104 192.0.0.1 192.0.0.2 192.0.0.3 192.0.0.4 192.0.0.5 192.0.0.6 192.0.0.7 192.0.0.8 192.0.0.9 192.0.0.10 192.0.0.11 192.0.0.12 192.0.0.13 192.0.0.14 192.0.0.15 192.0.0.16 192.0.0.17 192.0.0.18 192.0.0.19 192.0.0.20 192.0.0.21 192.0.0.22 192.0.0.23 192.0.0.24 192.0.0.25 192.0.0.26 192.0.0.27 192.0.0.28 192.0.0.29 192.0.0.30 192.0.0.31 192.0.0.32 192.0.0.33 192.0.0.34 192.0.0.35 192.0.0.36 192.0.0.37 192.0.0.38 192.0.0.39 192.0.0.40 192.0.0.41 192.0.0.42 192.0.0.43 192.0.0.44 192.0.0.45 192.0.0.46 192.0.0.47 192.0.0.48
root       14017  0.0  0.1  99868  9768 pts/0    Sl   08:12   0:00 /usr/bin/rsyslog_plugin -r /etc/rsyslog.d/dhcp_relay_regex.json -m sonic-events-dhcp-relay
root       14035  0.0  0.1 101380 10704 pts/0    Sl   08:12   0:00 /usr/sbin/dhcpmon -id Vlan1000 -iu PortChannel101 -iu PortChannel102 -iu PortChannel103 -iu PortChannel104 -im eth0
admin      25090  0.0  0.0   6868   648 pts/0    S+   08:35   0:00 grep dhc
admin@sonic:~$ sudo cat /var/log/syslog | grep -a dhcprelayd
Jul  9 08:12:54.466893 sonic INFO dhcp_relay#supervisord 2024-07-09 08:12:54,466 INFO spawned: 'dhcprelayd' with pid 43
Jul  9 08:12:55.541800 sonic INFO dhcp_relay#supervisord 2024-07-09 08:12:55,540 INFO success: dhcprelayd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
admin@sonic:~$

dgsudharsan commented 1 week ago

Based on commit: https://github.com/sonic-net/sonic-buildimage/tree/156b067c875967618232c02cb51b163e5e287e45
HW SKU: Mellanox-SN2700-A1-D48C8
Topology: T0

@yaqiangz Please note that this is a very rare, intermittent issue. Even we do not see it often. As mentioned in the description, it has happened only twice so far.

vivekrnv commented 1 week ago

@yaqiangz, it seems to me the issue is here: https://github.com/sonic-net/sonic-buildimage/blob/202311/src/sonic-dhcp-utilities/dhcp_utilities/common/utils.py#L162

name() is not a safe call: it can raise psutil.NoSuchProcess if the process exits after it was enumerated but before name() reads its details. For a daemon like dhcprelayd, the exception should be handled so that the process does not exit.

We don't know what pid 262 was, and it is hard to find out because the process has already ended and the issue is very hard to reproduce.

It could be any short-lived process, possibly spawned by a DHCPv4-related process or even by rsyslogd or the rsyslog plugin.

Jul  3 06:21:02.974878 r-panther-40 INFO dhcp_relay#supervisord: dhcprelayd psutil.NoSuchProcess: process no longer exists (pid=262)
Jul  3 06:21:02.989391 r-panther-40 INFO dhcp_relay#supervisord 2024-07-03 06:21:02,988 INFO exited: dhcprelayd (exit status 1; not expected)
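
For illustration, the kind of guard suggested above could look roughly like the sketch below. It is not the code from the actual fix. The function name and the proc.name() comparison are taken from the traceback; the return shape (a list of command lines) and the procs and gone_errors parameters are assumptions added here so the helper can be exercised with stand-in objects.

```python
def get_target_process_cmds(process_name, procs=None, gone_errors=None):
    """Return the command lines of all processes named process_name.

    procs and gone_errors are hypothetical parameters (not part of the
    original helper) that allow stand-in objects to be supplied in tests.
    """
    if procs is None or gone_errors is None:
        # psutil is a third-party package; import it only when the real
        # process table is needed.
        import psutil
        if procs is None:
            procs = psutil.process_iter()
        if gone_errors is None:
            # ZombieProcess subclasses NoSuchProcess, so it is covered too.
            gone_errors = (psutil.NoSuchProcess,)

    cmds = []
    for proc in procs:
        try:
            if proc.name() == process_name:
                cmds.append(proc.cmdline())
        except gone_errors:
            # The process exited between enumeration and inspection.
            # A vanished process cannot be the one we are looking for,
            # so skip it instead of letting the daemon crash.
            continue
    return cmds
```

Calling get_target_process_cmds("dhcrelay") with no extra arguments walks the live process table. Both name() and cmdline() sit inside the try block because either one can fail if the process disappears mid-iteration.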

yaqiangz commented 1 week ago

Got it, I will look into a fix.

vivekrnv commented 2 days ago

Fixed by https://github.com/sonic-net/sonic-buildimage/pull/19537