sflow / vpp-sflow

sFlow plugin for VPP
Apache License 2.0

Learning Linux ifIndex from linux-cp plugin #7

Open sflow opened 4 weeks ago

sflow commented 4 weeks ago

It seems like the correct way to get the Linux ifIndex number for a PHY interface from the linux-cp plugin is to invoke the binary C VAPI and request it that way. In commit https://github.com/sflow/vpp-sflow/commit/eb6591a7454c4dd779b81b5f9318b6f15064b008 I left a TODO placeholder for where that VAPI client request might be initiated. The same commit adds the Netlink attribute to send it to hsflowd as part of the counter sample. The latest commit to hsflowd is waiting for it.

If hsflowd has anything to add to the picture (such as Optical counters polled from the hardware) then that should already work.

We might end up forcing an extra counter-sample out as soon as an interface is sflow-enabled, so hsflowd can have the full picture as early as possible.

So just need to figure out the VAPI invocation.

sflow commented 3 weeks ago

After some detours caused by what looks like out-of-date documentation, I discovered vapi_connect_from_vpp() and its use in plugins/unittest/test_api.c, but there still seems to be a lot to figure out. A VAPI call from one plugin to another (that may or may not be running) is probably not very common. Sigh.

sflow commented 3 weeks ago

It seems to work as long as I call vapi_connect_from_vpp() from a separate thread. The good news is that the new thread only has to stick around long enough to run that one call. It can terminate as soon as the call returns, and it looks like all the subsequent requests and responses can run in the main thread.

sflow commented 3 weeks ago

It turned out to be too easy to stall the main thread if it was involved directly in the VAPI request, so we reverted to having the whole connect-request-response sequence running in the temporary forked thread. This seems much safer. If something goes wrong and that thread deadlocks then it will have no effect on anything else.

This is still experimental (it checks the linux if index numbers every second when it doesn't really need to), but feedback would be appreciated:

Is it OK to run an extra thread provided it only runs for a millisecond or so? It does seem to be how they expect vapi_connect_from_vpp() to be used.

Is it OK to do this tap-dance before including vapi/lcp.api.vapi.h? https://github.com/sflow/vpp-sflow/blob/main/sflow/sflow.c#L36-L45

It seems clear that we could eliminate a lot of complexity if the Linux if_index numbers were accessible in the shared-memory stats tree that we already comb through for counters, or if there is another way to look them up efficiently. Having one plugin make VAPI calls to another seems to work OK now, but will it always? I don't want this to be the only plugin that does it.

sflow commented 3 weeks ago

The sync between the main thread and the VAPI thread is now done more formally, using atomics, though pthread_tryjoin_np() might be cleaner if we could use it (it's a non-portable GNU extension).

There seems to be a memory leak in the way the VAPI call is being made. When I hammer the VAPI with 100s of requests per second (by setting SFLOW_TEST_HAMMER_VAPI) the process grows fairly quickly. Not sure why.

sflow commented 3 weeks ago

Tried creating only one VAPI context and keeping it open: still leaks. Tried connecting to the VAPI only once and never disconnecting: still leaks. I see messages like this from time to time:

0: vl_msg_api_alloc_internal:123: garbage collect pool 1 ring 0 index 91
0: vl_msg_api_alloc_internal:129: msg id 26 name (nil)

So I guess there must be something I missed. Perhaps I'll ask on the vpp-dev mailing list.

sflow commented 3 weeks ago

Looks like the effect is just something to do with the way I am forking the thread. Nothing to do with VAPI at all.

sflow commented 3 weeks ago

Added missing pthread_detach() call. Now memory footprint appears to be stable. I'll leave this issue open, however, in case there is a whole different approach that we should look at.