networkservicemesh / api

Apache License 2.0

Monitor Connection states and event types #169

Closed: zolug closed this issue 6 months ago

zolug commented 7 months ago

I'd highly appreciate it if you could provide some documentation/explanation regarding the Connection States and Connection Event Types declared in https://github.com/networkservicemesh/api/blob/main/pkg/api/networkservice/connection.proto.

I'm especially interested in them from the point of view of the Monitor Connection API (to keep track of the operational state, the "healthiness", of certain connections).

denis-tingaikin commented 7 months ago

@zolug Could you please add steps for a problematic scenario that you mentioned on the latest WG call? It may help us with writing an article about the NSM monitoring API and also checking whether the behaviour that you observed was correct or invalid.

zolug commented 7 months ago

Hi @denis-tingaikin, I did some more testing and haven't seen the issue on v1.12.0. However, it is visible on v1.11.2 and v1.11.0. The problem occurred when updating an established NSC connection (KERNEL mechanism, without datapath monitoring) towards an NSE located on the same worker (the single member of the network service) at around the same time the NSE Pod was deleted (it unregistered itself from the registry during shutdown). The connection update failed as "expected" because the NSE was gone and there was no other candidate. But occasionally the NSM heal in the NSC did not get the DOWN event; instead it printed an event with no type and status (initial_transfer and up, I suppose), which I guess was triggered by the failed update.

But I'm also curious about the events that are seemingly of type initial_transfer with state up, which show up upon any connection update on an established connection (e.g. changing the IPContext by adding/removing src IPs). Based on the description, I would have expected UPDATE type events.
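
To illustrate why a printed event can appear to have "no type and status", here is a minimal Go sketch. It does not use the NSM packages: the `EventType` and `State` types, their value names, and the `describe` and `needsAttention` helpers are hypothetical local stand-ins, and the numeric values are an assumption that should be checked against connection.proto. The idea is that proto3 text output typically omits fields holding their zero value, so if the initial-transfer type and the up state are both 0, such an event prints with neither field.

```go
package main

import (
	"fmt"
	"strings"
)

// EventType is a hypothetical local mirror of the event type enum.
// Assumption: the initial-transfer value is 0 in connection.proto.
type EventType int32

const (
	InitialStateTransfer EventType = 0
	Update               EventType = 1
	Delete               EventType = 2
)

// State is a hypothetical local mirror of the connection state enum.
// Assumption: the up value is 0 in connection.proto.
type State int32

const (
	StateUp               State = 0
	StateDown             State = 1
	StateRefreshRequested State = 2
)

func (t EventType) String() string {
	switch t {
	case Update:
		return "UPDATE"
	case Delete:
		return "DELETE"
	default:
		return "INITIAL_STATE_TRANSFER"
	}
}

func (s State) String() string {
	switch s {
	case StateDown:
		return "DOWN"
	case StateRefreshRequested:
		return "REFRESH_REQUESTED"
	default:
		return "UP"
	}
}

// describe imitates a proto3-style text dump: a field that holds its
// zero value is left out of the output entirely.
func describe(t EventType, s State) string {
	var parts []string
	if t != 0 {
		parts = append(parts, "type:"+t.String())
	}
	if s != 0 {
		parts = append(parts, "state:"+s.String())
	}
	return strings.Join(parts, " ")
}

// needsAttention is an illustrative health check, not NSM's heal logic:
// it flags an event that deletes a connection or reports it as down.
func needsAttention(t EventType, s State) bool {
	return t == Delete || s == StateDown
}

func main() {
	// Both fields are zero, so nothing is printed between the quotes.
	fmt.Printf("%q\n", describe(InitialStateTransfer, StateUp))
	fmt.Printf("%q\n", describe(Update, StateDown))
	fmt.Println(needsAttention(InitialStateTransfer, StateUp))
}
```

Under these assumptions, a monitor client that only reacts to an explicit DOWN state or DELETE type would ignore the "empty" event, which matches the behaviour described above where heal was not triggered.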