sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

BGP State Change Does not Trigger BGP State Event #19591

Open wumiaont opened 2 months ago

wumiaont commented 2 months ago

Description

During telemetry testing, one test case that checks for the BGP state event failed. The test issues "config bgp startup all", "config bgp shutdown all", and "config bgp startup all" to the duthost and subscribes to the gNMI server for the BGP state change event.

The subscriber is expected to receive a BGP state change event, but it does not.

Steps to reproduce the issue:

  1. On the PTF server, issue "python /root/gnxi/gnmi_cli_py/py_gnmicli.py -g -t 10.250.6.231 -p 8080 -m subscribe -x all[heartbeat=2] -xt EVENTS -o ndastreamingservertest -n --subscribe_mode 0 --submode 1 --interval 0 --update_count 1 --filter_event_regex sonic-events-bgp:bgp-state". Here, 10.250.6.231 is the management IP of the duthost.
  2. Issue "config bgp startup all", "config bgp shutdown all", and "config bgp startup all" from the chassis.
  3. On the PTF server, there is no response; the subscriber keeps waiting for a valid response.

Describe the results you received:

The test failed.

Describe the results you expected:

Repeatedly starting up and shutting down BGP should trigger BGP state change events.

Output of show version:

202405/master


For more detail on the test, see sonic-mgmt/tests/telemetry/events/bgp-events.py::test_event

One observation: the BGP notification event test works with a similar setup, in which IP rules are created to drop TCP packets to/from port 179.

vmittal-msft commented 2 months ago

@zbud-msft Please help take a look.

zbud-msft commented 1 month ago

@wumiaont Can you please verify whether paths other than the EVENTS DB work? Could it be an accessibility issue? Are you seeing heartbeats on the subscriber (PTF gNMI client) side? In syslog, are you able to see the BGP state change log?

wumiaont commented 1 month ago

I have verified that BGP notification events are received when we create a rule to drop packets to/from port 179, but BGP state change events are not received. Also, if I do not use the filter, I can see heartbeat events being received.

wumiaont commented 1 month ago

There are other issues, such as https://github.com/sonic-net/sonic-buildimage/issues/19603; please look at the comments there. The log shows that swss events were published, but the gNMI client does not receive them. If I remove filter_event_regex from the CLI, I get heartbeat events, but no swss events when I trigger actions such as port shutdown/startup.

python /root/gnxi/gnmi_cli_py/py_gnmicli.py -g -t 10.250.6.231 -p 8080 -m subscribe -x all[heartbeat=2] -xt EVENTS -o ndastreamingservertest --subscribe_mode 0 --submode 1 --interval 0 --update_count 100

2024-08-01 01:14:59.826520 response received: update { timestamp: 1722474899815081373 prefix { target: "EVENTS" } update { path { elem { name: "all" key { key: "heartbeat" value: "2" } } } val { json_ietf_val: "{\"sonic-events-eventd:heartbeat\":{\"timestamp\":\"2024-08-01T01:14:59.815016Z\"}}" } } } ......

wumiaont commented 1 month ago

@zbud-msft Many event tests work, such as host events and BGP notification events. Only BGP state events and swss events are not received, even though the log shows that the corresponding events are published. Please help check what could be wrong here.

wumiaont commented 1 month ago

This is for the BGP notification test, which works. I am using the client from the PTF server without regex filtering:

python /root/gnxi/gnmi_cli_py/py_gnmicli.py -g -t 10.250.6.231 -p 8080 -m subscribe -x all[heartbeat=2] -xt EVENTS -o ndastreamingservertest --subscribe_mode 0 --submode 1 --interval 0 --update_count 100

2024-08-01 15:57:32.814535 response received: update { timestamp: 1722527852781597007 prefix { target: "EVENTS" } update { path { elem { name: "all" key { key: "heartbeat" value: "2" } } } val { json_ietf_val: "{\"sonic-events-eventd:heartbeat\":{\"timestamp\":\"2024-08-01T15:57:32.781521Z\"}}" } } }

2024-08-01 15:57:33.750013 response received: update { timestamp: 1722527853743868153 prefix { target: "EVENTS" } update { path { elem { name: "all" key { key: "heartbeat" value: "2" } } } val { json_ietf_val: "{\"sonic-events-bgp:notification\":{\"ip\":\"10.0.0.1\",\"is_sent\":\"true\",\"major_code\":\"4\",\"minor_code\":\"0\",\"timestamp\":\"2024-08-01T15:57:33.743775Z\"}}" } } }
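To tell heartbeat updates apart from real events in responses like the ones above, a client-side check can decode the json_ietf_val payload and match its top-level key against the same pattern passed to --filter_event_regex. This is a minimal sketch, not how py_gnmicli.py implements filtering: the helpers `event_names` and `matches_filter` are hypothetical, and the sketch assumes each payload is a JSON object keyed by the event name, as in the responses above (which appear in protobuf text format with escaped quotes; the sketch takes the already-unescaped JSON string).

```python
import json
import re


def event_names(json_ietf_val: str) -> list:
    """Return the top-level event keys, e.g. 'sonic-events-bgp:notification'."""
    payload = json.loads(json_ietf_val)
    return list(payload.keys())


def matches_filter(json_ietf_val: str, pattern: str) -> bool:
    """True if any event key in the payload matches the regex pattern (hypothetical helper)."""
    regex = re.compile(pattern)
    return any(regex.search(name) for name in event_names(json_ietf_val))


# Payloads copied from the responses above (unescaped).
heartbeat = '{"sonic-events-eventd:heartbeat":{"timestamp":"2024-08-01T15:57:32.781521Z"}}'
notification = (
    '{"sonic-events-bgp:notification":{"ip":"10.0.0.1","is_sent":"true",'
    '"major_code":"4","minor_code":"0","timestamp":"2024-08-01T15:57:33.743775Z"}}'
)

print(matches_filter(heartbeat, "sonic-events-bgp:bgp-state"))  # False
print(matches_filter(notification, "sonic-events-bgp"))  # True
```

A check like this only classifies what the client actually receives; it cannot show an event that was published on the DUT but never delivered to the subscriber.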

wumiaont commented 1 month ago

For the BGP state test, syslog shows that the event is published, but the gNMI client receives only heartbeat events during the test.

2024 Aug 1 16:28:15.406947 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::2","status":"down","timestamp":"2024-08-01T16:28:15.406799Z"}}
2024 Aug 1 16:28:15.407054 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::a","status":"down","timestamp":"2024-08-01T16:28:15.406876Z"}}
2024 Aug 1 16:28:16.692872 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.1","status":"up","timestamp":"2024-08-01T16:28:16.692602Z"}}
2024 Aug 1 16:28:16.694208 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.11","status":"up","timestamp":"2024-08-01T16:28:16.694020Z"}}
2024 Aug 1 16:28:16.7616

wumiaont commented 1 month ago

@zbud-msft It looks to me like events published by a global service are received, while events published by a service under a namespace are not. In the log below, bgp-state events are published from bgp0 or bgp1, and the client fails to receive them. The notification events are published by the host rsyslog_plugin, and those are received.

swss has a similar failure, since swss runs in each namespace.

It looks like the publish code has an issue handling services under a namespace.

2024 Aug 1 16:53:04.209148 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::2","status":"down","timestamp":"2024-08-01T16:53:04.208989Z"}}
2024 Aug 1 16:53:04.209213 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::a","status":"down","timestamp":"2024-08-01T16:53:04.209080Z"}}
2024 Aug 1 16:53:04.218601 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.7","status":"down","timestamp":"2024-08-01T16:53:04.218323Z"}}
2024 Aug 1 16:53:04.218802 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::16","status":"down","timestamp":"2024-08-01T16:53:04.218654Z"}}
2024 Aug 1 16:53:04.218907 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::e","status":"down","timestamp":"2024-08-01T16:53:04.218753Z"}}
2024 Aug 1 16:53:05.290234 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.1","status":"up","timestamp":"2024-08-01T16:53:05.289958Z"}}
2024 Aug 1 16:53:05.293026 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.11","status":"up","timestamp":"2024-08-01T16:53:05.292833Z"}}
2024 Aug 1 16:53:05.406129 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.5","status":"up","timestamp":"2024-08-01T16:53:05.405738Z"}}
2024 Aug 1 16:53:05.406342 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::2","status":"up","timestamp":"2024-08-01T16:53:05.405859Z"}}
2024 Aug 1 16:53:05.406460 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::a","status":"up","timestamp":"2024-08-01T16:53:05.406214Z"}}
2024 Aug 1 16:53:05.411090 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.7","status":"up","timestamp":"2024-08-01T16:53:05.410553Z"}}
2024 Aug 1 16:53:05.411447 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::16","status":"up","timestamp":"2024-08-01T16:53:05.411204Z"}}
2024 Aug 1 16:53:05.411546 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::e","status":"up","timestamp":"2024-08-01T16:53:05.411296Z"}}
2024 Aug 1 16:59:47.205043 ixre-egl-board29 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:notification":{"ip":"3.3.3.1","is_sent":"true","major_code":"5","minor_code":"0","timestamp":"2024-08-01T16:59:47.204266Z"}}
2024 Aug 1 16:59:47.205251 ixre-egl-board29 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:notification":{"ip":"3.3.3.2","is_sent":"true","major_code":"5","minor_code":"0","timestamp":"2024-08-01T16:59:47.204888Z"}}
2024 Aug 1 16:59:49.207720 ixre-egl-board29 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:notification":{"ip":"3.3.3.2","is_sent":"true","major_code":"6","minor_code":"7","timestamp":"2024-08-01T16:59:49.207207Z"}}
2024 Aug 1 16:59:49.207871 ixre-egl-board29 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:notification":{"ip":"3.3.3.1","is_sent":"true","major_code":"6","minor_code":"7","timestamp":"2024-08-01T16:59:49.207769Z"}}
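To check the namespace hypothesis against syslog, the EVENT_PUBLISHED lines can be grouped by the prefix in front of `rsyslog_plugin` (for example `bgp0#`, `bgp1#`, or no prefix for the host). This is a minimal sketch: `summarize` is a hypothetical helper, not part of SONiC, and it assumes one log entry per line in exactly the format shown in the logs in this thread.

```python
import json
import re
from collections import defaultdict

# Matches lines such as:
#   ... NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {...}
#   ... NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {...}
# The optional "<name>#" prefix is taken as the namespaced publisher.
LINE_RE = re.compile(
    r"NOTICE (?:(?P<ns>[\w-]+)#)?rsyslog_plugin: :- publish: "
    r"EVENT_PUBLISHED: (?P<body>\{.*\})"
)


def summarize(lines):
    """Map each publisher ('host' or namespace name) to the set of event names it published."""
    result = defaultdict(set)
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not an EVENT_PUBLISHED line
        publisher = m.group("ns") or "host"
        # The event name is the single top-level key of the JSON body.
        event = next(iter(json.loads(m.group("body"))))
        result[publisher].add(event)
    return dict(result)


# A few entries copied from the logs above.
sample = [
    '2024 Aug 1 16:53:04.209148 ixre-egl-board29 NOTICE bgp0#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"fc00::2","status":"down","timestamp":"2024-08-01T16:53:04.208989Z"}}',
    '2024 Aug 1 16:53:04.218601 ixre-egl-board29 NOTICE bgp1#rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:bgp-state":{"ip":"10.0.0.7","status":"down","timestamp":"2024-08-01T16:53:04.218323Z"}}',
    '2024 Aug 1 16:59:47.205043 ixre-egl-board29 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-bgp:notification":{"ip":"3.3.3.1","is_sent":"true","major_code":"5","minor_code":"0","timestamp":"2024-08-01T16:59:47.204266Z"}}',
]

for publisher, events in sorted(summarize(sample).items()):
    print(publisher, sorted(events))
# bgp0 ['sonic-events-bgp:bgp-state']
# bgp1 ['sonic-events-bgp:bgp-state']
# host ['sonic-events-bgp:notification']
```

On these entries, every bgp-state event comes from a namespaced publisher and every notification event comes from the host, which is consistent with the namespace hypothesis but does not by itself prove where delivery fails.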

zbud-msft commented 1 month ago

Hi @wumiaont, it seems like there is a common issue with multi-asic devices for swss events and the bgp state event. I will look into this issue. As of right now, eventd/structured events does not claim to provide full support for multi-asic. In the meantime, I will disable test_events for multi-asic devices. Maybe we can keep one thread open, since we have one for swss and one for bgp and they point to the same issue.

wumiaont commented 1 month ago


Thanks for looking into the issue. Let me know if you need anything from me.