Open dellwuchuan opened 2 years ago
@venkatmahalingam can you please help to find someone in Dell to take a look? Thanks.
When port_index_mapper receives signal 15 (SIGTERM) during config reload, it immediately exits with a zero exit code: https://github.com/sonic-net/sonic-buildimage/blob/master/dockers/docker-sflow/port_index_mapper.py#L102. At this point the select loop is still running. I'm not sure how this signal is propagated to the SWIG lib/C++ sources or how it is handled there.
But other Python daemons that use swsscommon.select usually either exit with a non-zero exit code when they receive an interrupt (e.g. https://github.com/sonic-net/sonic-host-services/blob/master/scripts/hostcfgd#L78) or break the select loop using a global flag (https://github.com/sonic-net/sonic-buildimage/blob/master/src/sonic-bgpcfgd/bgpcfgd/runner.py#L53). I'm not sure which is the right way to handle an interrupt. Is there a preferred way to do so?
@venkatmahalingam any update on the ETA for when such a fix will be available?
@padmanarayana Please comment on this issue.
@jeff-yin FYI.
@Gokulnath-Raja Please update the latest status on this bug.
Description
When SFLOW is enabled as background configuration alongside other existing configuration, such as VLANs and ports, running sudo config reload -y can trigger an sflow error log: ERR sflow#port_index_mapper: returned a result with an error set
One of my colleagues investigated this issue; I hope the findings below are helpful.
The log was seen during config reload, when the port_index_mapper process received an interrupt.
Aug 10 05:08:47.850139 r-ocelot-07 NOTICE sflow#port_index_mapper: got signal 15
Aug 10 05:08:47.851442 r-ocelot-07 ERR sflow#port_index_mapper: returned a result with an error set
Aug 10 05:08:47.851766 r-ocelot-07 INFO sflow#/supervisord: port_index_mapper File "/usr/bin/port_index_mapper.py", line 116, in <module>
Aug 10 05:08:47.851766 r-ocelot-07 INFO sflow#/supervisord: port_index_mapper main()
Aug 10 05:08:47.851802 r-ocelot-07 INFO sflow#/supervisord: port_index_mapper File "/usr/bin/port_index_mapper.py", line 108, in main
Aug 10 05:08:47.851802 r-ocelot-07 INFO sflow#/supervisord: port_index_mapper port_mapper.listen()
Aug 10 05:08:47.851802 r-ocelot-07 INFO sflow#/supervisord: port_index_mapper File "/usr/bin/port_index_mapper.py", line 71, in listen
Aug 10 05:08:47.851822 r-ocelot-07 INFO sflow#/supervisord: port_index_mapper (state, c) = self.sel.select(SELECT_TIMEOUT_MS)
Aug 10 05:08:47.851822 r-ocelot-07 INFO sflow#/supervisord: port_index_mapper File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 1879, in select
Aug 10 05:08:47.851852 r-ocelot-07 INFO sflow#/supervisord: port_index_mapper return _swsscommon.Select_select(self, timeout)
But this log indicates that something is wrong with SWIG or the swsscommon lib in terms of exception handling.
This is harmless, but it would be better to have it documented for the community.
Ref regarding the error log seen: https://stackoverflow.com/questions/53796264/systemerror-class-int-returned-a-result-with-an-error-set-in-python
In general:
"[R]eturned a result with an error set" is something that can only be done at the C level, i.e. the C function sets an exception but then returns some value other than NULL.
Steps to reproduce the issue:
Describe the results you received:
In the system log of the SONiC switch, one sflow error log can be seen during config reload -y: ERR sflow#port_index_mapper: returned a result with an error set
Describe the results you expected:
There should be no sflow error log during config reload -y.
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):