Open wumiaont opened 4 months ago
@zbud-msft Please help take a look.
@wumiaont Can you please verify whether paths other than the EVENTS db work? Could it be an accessibility issue? Are you seeing heartbeats on the subscriber (ptf gnmi client) side? Are you able to see the bgp state change log in syslog?
I have verified that the bgp notification events are received when we create a rule to drop packets to/from port 179. BGP state change events are not received. Also, if I do not use a filter, I can see that heartbeat events are received.
There are other related issues, such as https://github.com/sonic-net/sonic-buildimage/issues/19603. Please look at the comments there. I can see from the log that the swss events were published, but the gnmi client does not get them. If I remove filter_event_regex from the cli, I get heartbeat events, but still no swss events when we trigger actions such as port shutdown/startup.
python /root/gnxi/gnmi_cli_py/py_gnmicli.py -g -t 10.250.6.231 -p 8080 -m subscribe -x all[heartbeat=2] -xt EVENTS -o ndastreamingservertest --subscribe_mode 0 --submode 1 --interval 0 --update_count 100
2024-08-01 01:14:59.826520 response received: update { timestamp: 1722474899815081373 prefix { target: "EVENTS" } update { path { elem { name: "all" key { key: "heartbeat" value: "2" } } } val { json_ietf_val: "{\"sonic-events-eventd:heartbeat\":{\"timestamp\":\"2024-08-01T01:14:59.815016Z\"}}" } } } ......
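The command and response above show how the `-x all[heartbeat=2]` argument ends up as a path element named `all` with the key `heartbeat` set to `2`. A minimal Python sketch of that mapping follows. The helper name `parse_path_elem` is hypothetical and is not part of py_gnmicli.py; it handles only a single `name[key=value]` element and makes no claim about how the client parses paths internally.

```python
import re

# One path element: a name followed by zero or more [key=value] selectors.
ELEM_RE = re.compile(r'^(?P<name>[^\[\]]+)(?P<keys>(\[[^=\[\]]+=[^\[\]]*\])*)$')
KEY_RE = re.compile(r'\[([^=\[\]]+)=([^\[\]]*)\]')


def parse_path_elem(elem):
    """Split 'name[key=value]...' into a name and a dict of keys (hypothetical helper)."""
    match = ELEM_RE.match(elem)
    if match is None:
        raise ValueError("unsupported path element: %r" % elem)
    keys = dict(KEY_RE.findall(match.group("keys")))
    return {"name": match.group("name"), "key": keys}


if __name__ == "__main__":
    print(parse_path_elem("all[heartbeat=2]"))
```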
@zbud-msft many events tests work, such as host events and bgp notification events. Only the bgp state events and swss events are not received. I can see from the log that the corresponding events are published. Please help check what could be wrong here.
This is for the bgp notification test, which works. I am using the client without regex filtering from the ptf server.
python /root/gnxi/gnmi_cli_py/py_gnmicli.py -g -t 10.250.6.231 -p 8080 -m subscribe -x all[heartbeat=2] -xt EVENTS -o ndastreamingservertest --subscribe_mode 0 --submode 1 --interval 0 --update_count 100
2024-08-01 15:57:32.814535 response received: update { timestamp: 1722527852781597007 prefix { target: "EVENTS" } update { path { elem { name: "all" key { key: "heartbeat" value: "2" } } } val { json_ietf_val: "{\"sonic-events-eventd:heartbeat\":{\"timestamp\":\"2024-08-01T15:57:32.781521Z\"}}" } } }
2024-08-01 15:57:33.750013 response received: update { timestamp: 1722527853743868153 prefix { target: "EVENTS" } update { path { elem { name: "all" key { key: "heartbeat" value: "2" } } } val { json_ietf_val: "{\"sonic-events-bgp:notification\":{\"ip\":\"10.0.0.1\",\"is_sent\":\"true\",\"major_code\":\"4\",\"minor_code\":\"0\",\"timestamp\":\"2024-08-01T15:57:33.743775Z\"}}" } } }
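To check quickly which event types actually reach the subscriber, the client output can be scanned for the `json_ietf_val` payloads shown above. The following is a rough sketch, assuming the output keeps the text format seen in this thread (a double-quoted string with `\"` escapes). The function name `received_event_tags` is made up for illustration, and the unescaping covers only escaped quotes.

```python
import json
import re

# Matches the quoted, backslash-escaped JSON payload printed by the client.
JSON_VAL_RE = re.compile(r'json_ietf_val: "((?:[^"\\]|\\.)*)"')


def received_event_tags(client_output):
    """Return the top-level event tags found in the client output, in order."""
    tags = []
    for match in JSON_VAL_RE.finditer(client_output):
        # Assumption: only escaped quotes need unescaping in these payloads.
        payload = match.group(1).replace('\\"', '"')
        event = json.loads(payload)
        tags.extend(event.keys())
    return tags


if __name__ == "__main__":
    sample = r'val { json_ietf_val: "{\"sonic-events-eventd:heartbeat\":{\"timestamp\":\"2024-08-01T15:57:32.781521Z\"}}" }'
    print(received_event_tags(sample))
```

Running this over the output of a failing BGP state test would be expected to list only heartbeat tags, which matches what is reported in this thread.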
For the BGP state test, I can see from syslog that the events are published, but the gnmi client received only heartbeat events during the test.
2024 Aug 1 16:28:15.406947 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::2","status":"down","timestamp":"2024-08-01T16:28:15.406799Z"}}
2024 Aug 1 16:28:15.407054 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::a","status":"down","timestamp":"2024-08-01T16:28:15.406876Z"}}
2024 Aug 1 16:28:16.692872 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.1","status":"up","timestamp":"2024-08-01T16:28:16.692602Z"}}
2024 Aug 1 16:28:16.694208 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.11","status":"up","timestamp":"2024-08-01T16:28:16.694020Z"}}
2024 Aug 1 16:28:16.7616
@zbud-msft It looks to me that if the event is published by a global service, it works. If it is published by a service under a namespace, the client does not receive it. In the log below you can see that the bgp-state events are published from bgp0 or bgp1, and those fail to reach the client. The notification events are published by rsyslog_plugin with no namespace prefix, and those work.
swss fails in a similar way, since swss runs in each namespace.
It looks like the publish code has an issue handling services under a namespace.
2024 Aug 1 16:53:04.209148 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::2","status":"down","timestamp":"2024-08-01T16:53:04.208989Z"}}
2024 Aug 1 16:53:04.209213 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::a","status":"down","timestamp":"2024-08-01T16:53:04.209080Z"}}
2024 Aug 1 16:53:04.218601 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.7","status":"down","timestamp":"2024-08-01T16:53:04.218323Z"}}
2024 Aug 1 16:53:04.218802 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::16","status":"down","timestamp":"2024-08-01T16:53:04.218654Z"}}
2024 Aug 1 16:53:04.218907 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::e","status":"down","timestamp":"2024-08-01T16:53:04.218753Z"}}
2024 Aug 1 16:53:05.290234 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.1","status":"up","timestamp":"2024-08-01T16:53:05.289958Z"}}
2024 Aug 1 16:53:05.293026 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.11","status":"up","timestamp":"2024-08-01T16:53:05.292833Z"}}
2024 Aug 1 16:53:05.406129 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.5","status":"up","timestamp":"2024-08-01T16:53:05.405738Z"}}
2024 Aug 1 16:53:05.406342 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::2","status":"up","timestamp":"2024-08-01T16:53:05.405859Z"}}
2024 Aug 1 16:53:05.406460 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::a","status":"up","timestamp":"2024-08-01T16:53:05.406214Z"}}
2024 Aug 1 16:53:05.411090 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.7","status":"up","timestamp":"2024-08-01T16:53:05.410553Z"}}
2024 Aug 1 16:53:05.411447 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::16","status":"up","timestamp":"2024-08-01T16:53:05.411204Z"}}
2024 Aug 1 16:53:05.411546 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::e","status":"up","timestamp":"2024-08-01T16:53:05.411296Z"}}
2024 Aug 1 16:59:47.205043 ixre-egl-board29 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:notification":{"ip":"3.3.3.1","is_sent":"true","major_code":"5","minor_code":"0","timestamp":"2024-08-01T16:59:47.204266Z"}}
2024 Aug 1 16:59:47.205251 ixre-egl-board29 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:notification":{"ip":"3.3.3.2","is_sent":"true","major_code":"5","minor_code":"0","timestamp":"2024-08-01T16:59:47.204888Z"}}
2024 Aug 1 16:59:49.207720 ixre-egl-board29 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:notification":{"ip":"3.3.3.2","is_sent":"true","major_code":"6","minor_code":"7","timestamp":"2024-08-01T16:59:49.207207Z"}}
2024 Aug 1 16:59:49.207871 ixre-egl-board29 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:notification":{"ip":"3.3.3.1","is_sent":"true","major_code":"6","minor_code":"7","timestamp":"2024-08-01T16:59:49.207769Z"}}
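The pattern described above (events from `bgp0#`/`bgp1#` publishers are lost, events from the unprefixed `rsyslog_plugin` arrive) can be checked by grouping the `EVENT_PUBLISHED` syslog entries by publisher. The sketch below is illustrative only. It assumes the syslog layout shown in this thread, and it treats a numeric suffix on the container name before `#` as the ASIC namespace index, which is a guess based on these logs rather than a documented rule. The names `count_published` and `asic_namespace` are hypothetical.

```python
import json
import re
from collections import Counter

# Captures the publisher tag, e.g. "bgp0#rsyslog_plugin" or "rsyslog_plugin".
PUBLISH_RE = re.compile(r'NOTICE (\S+): :- publish: EVENT_PUBLISHED:\s*')
_DECODER = json.JSONDecoder()


def count_published(syslog_text):
    """Count published events per (publisher, event tag) pair."""
    counts = Counter()
    for match in PUBLISH_RE.finditer(syslog_text):
        publisher = match.group(1)
        try:
            # Decode exactly one JSON object starting right after the marker.
            event, _ = _DECODER.raw_decode(syslog_text, match.end())
        except json.JSONDecodeError:
            continue  # truncated or missing payload
        for tag in event:
            counts[(publisher, tag)] += 1
    return counts


def asic_namespace(publisher):
    """Return the ASIC index as a string, or None for an unprefixed publisher.

    Assumption: "<container><N>#<process>" means the process runs in namespace N.
    """
    match = re.match(r'[A-Za-z_-]+?(\d+)#', publisher)
    return match.group(1) if match else None


if __name__ == "__main__":
    import sys
    for (publisher, tag), n in sorted(count_published(sys.stdin.read()).items()):
        print(publisher, asic_namespace(publisher), tag, n)
```

Feeding the syslog excerpt above to this script on stdin lists the bgp-state events only under the `bgp0#` and `bgp1#` publishers and the notification events only under the plain `rsyslog_plugin` publisher, which is the split that lines up with what the client does and does not receive.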
Hi @wumiaont seems like there is a common issue with multi-asic devices for swss events and bgp state event. I will look into this issue. As of right now, eventd/structured events does not claim to provide full support for multi-asic. In the meantime, I will disable test_events for multi-asic devices. Maybe we can keep one thread open since we have one for swss and one for bgp and they point to same issue.
Thanks for looking into the issue. Let me know if you need anything from me.
@zbud-msft , please share plan for this issue, thanks.
Description
During telemetry testing, one test case that checks for the BGP state event failed. The test issues "config bgp startup all", "config bgp shutdown all", and "config bgp startup all" to the duthost, then subscribes to the gnmi server for the BGP state change event.
The subscriber is expected to receive the BGP state change event, but it does not.
Steps to reproduce the issue:
Describe the results you received:
The test failed.
Describe the results you expected:
Starting up and shutting down BGP repeatedly should trigger BGP state change events.
Output of show version: 202405/master
For more detail on the test, see sonic-mgmt/tests/telemetry/events/bgp-events.py::test_event
One observation here: the BGP notification event works with a similar approach to the one described above, where IP rules are created to drop TCP packets to/from port 179.