sflow / vpp-sflow

sFlow plugin for VPP
Apache License 2.0

Unexplained behavior when sampling processing callback not polled frequently enough #6

Open sflow opened 1 week ago

sflow commented 1 week ago

Registering the sample-processing (and counter-polling) loop as a main-thread node rather than a separate pthread seems to be working. While testing it, though, I experimented with having it called only once per second instead of once per millisecond, and with reading only one packet from each FIFO per call. This is equivalent to changing these sflow.h parameters:

SFLOW_POLL_WAIT_S 0.001
SFLOW_READ_BATCH 100

to:

SFLOW_POLL_WAIT_S 1.0
SFLOW_READ_BATCH 1

Expected behavior: I expected most samples to be dropped in the worker thread because they could not be enqueued to the FIFO, and I expected the sample-processing loop to only send a limited number of samples to PSAMPLE (one for each worker, at 1-second intervals).
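The expectation above can be sanity-checked with a toy model. The sketch below is single-threaded and purely illustrative: `model_run`, `model_result_t`, and all parameter names are hypothetical and are not taken from the plugin. It assumes one worker that attempts a fixed number of enqueues between polls, a bounded FIFO that drops on full, and a poller that reads at most `read_batch` entries per poll.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result record for the toy model (not part of the plugin). */
typedef struct {
  uint64_t delivered; /* samples read out by the polling loop */
  uint64_t dropped;   /* samples rejected because the FIFO was full */
} model_result_t;

/* Single-threaded model of one worker FIFO.
 * fifo_depth:       capacity of the FIFO in samples (assumed value)
 * samples_per_poll: enqueue attempts by the worker between two polls
 * read_batch:       maximum samples read per poll (cf. SFLOW_READ_BATCH)
 * polls:            number of polling intervals to simulate */
static model_result_t model_run(uint32_t fifo_depth, uint32_t samples_per_poll,
                                uint32_t read_batch, uint32_t polls) {
  assert(fifo_depth > 0);
  model_result_t r = {0, 0};
  uint32_t queued = 0;
  for (uint32_t p = 0; p < polls; p++) {
    /* Worker side: enqueue if there is room, otherwise drop. */
    for (uint32_t i = 0; i < samples_per_poll; i++) {
      if (queued < fifo_depth)
        queued++;
      else
        r.dropped++;
    }
    /* Main-thread side: read at most read_batch samples per poll. */
    uint32_t n = queued < read_batch ? queued : read_batch;
    queued -= n;
    r.delivered += n;
  }
  return r;
}
```

With a depth of 4, 1000 attempts per poll, a read batch of 1 and 10 polls, this model delivers exactly 10 samples and drops the rest, which is the steady trickle I expected to see rather than a stall.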

Observed behavior: I saw 4 or 5 samples delivered to PSAMPLE the first time, then nothing more came out (!)

Flagging this as an issue in case it only comes up when these very poor settings are tested. Everything seemed OK with the default parameters. Debugging in gdb seemed to perturb the behavior, so it may be necessary to instrument with counters. Curious to see if the same thing happens under high load.

sflow commented 4 days ago

If the FIFO from the data-plane reaches capacity we seem to lose sync, and subsequent dequeue-reads from the main thread return garbage almost every time. Oddly, this doesn't happen if we start with a sampling rate of something like 1-in-1000 and then change it to 1-in-1. It only happens if we start the workers with an aggressive sampling rate. So I probably just misunderstood something about the VPP svm code. Maybe there is a race if it is still allocating space?

Might have to try a home-grown FIFO to compare.... which is tempting anyway because we can take advantage of the fact that the messages can always be the same size.

sflow commented 4 days ago

Using a home-grown FIFO seems to work, and it's only 20 lines of code with no dynamic allocation. However, I need to squint at it some more to make sure there are enough "volatile" and "atomic" keywords sprinkled around to make it work on all architectures. I hope that won't harm throughput.

Still do not know what went wrong with my use of svm_fifo, but investigating that does not seem urgent now.
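For reference, a fixed-size single-producer/single-consumer FIFO of the kind described above might look roughly like the sketch below. This is not the plugin's code: every name (`demo_fifo_t`, `demo_sample_t`, `DEMO_FIFO_SLOTS`, and so on) is hypothetical. It assumes exactly one worker thread enqueues and exactly one main thread dequeues, and it uses C11 acquire/release atomics in place of `volatile`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; not taken from the plugin source. */
#define DEMO_FIFO_SLOTS 8u /* capacity; must be a power of two */
static_assert((DEMO_FIFO_SLOTS & (DEMO_FIFO_SLOTS - 1u)) == 0,
              "DEMO_FIFO_SLOTS must be a power of two");

/* Every message has the same size, so slots can be a plain array. */
typedef struct {
  uint32_t seq;
  uint32_t payload;
} demo_sample_t;

typedef struct {
  _Atomic uint32_t head; /* next slot to write; stored only by the producer */
  _Atomic uint32_t tail; /* next slot to read; stored only by the consumer */
  demo_sample_t slots[DEMO_FIFO_SLOTS];
} demo_fifo_t;

static void demo_fifo_init(demo_fifo_t *f) {
  atomic_init(&f->head, 0);
  atomic_init(&f->tail, 0);
}

/* Producer (worker) side. Returns false if the FIFO is full, in which
 * case the caller is expected to drop the sample. */
static bool demo_fifo_enqueue(demo_fifo_t *f, const demo_sample_t *s) {
  uint32_t head = atomic_load_explicit(&f->head, memory_order_relaxed);
  uint32_t tail = atomic_load_explicit(&f->tail, memory_order_acquire);
  if (head - tail == DEMO_FIFO_SLOTS)
    return false; /* full */
  f->slots[head & (DEMO_FIFO_SLOTS - 1u)] = *s;
  /* Release: the slot contents become visible before the new head. */
  atomic_store_explicit(&f->head, head + 1u, memory_order_release);
  return true;
}

/* Consumer (main thread) side. Returns false if the FIFO is empty. */
static bool demo_fifo_dequeue(demo_fifo_t *f, demo_sample_t *out) {
  uint32_t tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
  uint32_t head = atomic_load_explicit(&f->head, memory_order_acquire);
  if (head == tail)
    return false; /* empty */
  *out = f->slots[tail & (DEMO_FIFO_SLOTS - 1u)];
  /* Release: the slot is handed back to the producer only after the copy. */
  atomic_store_explicit(&f->tail, tail + 1u, memory_order_release);
  return true;
}
```

The indices are free-running unsigned counters, so `head - tail` is the current fill level even after wraparound, and a full FIFO simply rejects the enqueue instead of overwriting unread slots. Whether the acquire/release pairs cost anything measurable in throughput would need to be tested on the target architectures.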