Closed bpereto closed 9 months ago
Hi @bpereto, Thank you for the report. My hunch is you might be receiving more flows than you can process.
Some questions:
- have you tried with commit https://github.com/netsampler/goflow2/commit/9f2b0806f0bdaf409fd682f4194df202922104ac? So far it seems that the affected function is:
(which is expected to be the one allocating the memory, but the message should be put back in the pool at:)
- are you running it with custom garbage collection settings?
- are you specifying a queue size? If not, could you run with -listen netflow://:3001/?queue_size=100000: I'm expecting it to cap at 900MB RAM and start dropping UDP packets (or stop processing).
- could you provide me with the list of goroutines? There could be something blocked at decoding, and it keeps allocating into the dispatch channel (default setting is 9GB -> 1000000 messages of 9000 bytes).
- lastly: what are the machine specs? 20k flows per second is quite high for a single machine. It could be that the packets aren't processed fast enough (htop should indicate whether all the processors are used). If not, try also -listen netflow://:3001/?queue_size=100000&count=8 (or however many CPUs) and make sure the GOMAXPROCS environment variable is not set to a smaller number of CPUs.
Per your screenshot, it took 345 sec to use 11GB of RAM => 20.18MB/s of allocations => which likely shows channel growth of 2243 messages/sec. This would mean that, at a rate of 20k flows per second, there are only enough resources to process 17.7k flows per second. At this rate, a single CPU needs to process a message in less than 50 microseconds (you can find this information with the flow_decoding_time_seconds Prometheus metric). Hopefully more workers and sockets should help.
Hopefully, this should allow me to isolate the issue.
Hi,
Thanks for your suggestions.
- have you tried with commit https://github.com/netsampler/goflow2/commit/9f2b0806f0bdaf409fd682f4194df202922104ac
I'm using https://github.com/netsampler/goflow2/commit/34a0c1618e6fcf0d509c4553b3236b4aa85c71ca (main), so that fix should be included. Also, the fix is in producer_sf.go
- I'm using the netflow/ipfix one.
- are you running it with custom garbage collection settings?
No, no custom settings.
- Would you provide me with the list of goroutines? There could be something blocked at decoding and it keeps allocating into the dispatch channel (default setting is 9GB -> 1000000 messages of 9000 bytes).
How can I do that?
- are you specifying a queue size? if not, could you run with -listen netflow://:3001/?queue_size=100000: I'm expecting it to cap at 900MB RAM and start dropping UDP packets (or stop processing)
- what are the machine specs? 20k flows per second is quite high for a single machine.
With v1 and the same single instance (and multiple processes) I'm able to process 10x that amount. Indeed, it's probably missing CPU resources, as the following testing shows; the resources used were close to the limit during the previous testing.
-listen netflow://:3001/?count=4
with currently 40k f/s input
v1
flow_summary_decoding_time_us{name="NetFlow",quantile="0.5"} 299.284
flow_summary_decoding_time_us{name="NetFlow",quantile="0.9"} 387.87
flow_summary_decoding_time_us{name="NetFlow",quantile="0.99"} 862.666
v2 (https://github.com/netsampler/goflow2/commit/34a0c1618e6fcf0d509c4553b3236b4aa85c71ca)
goflow2_flow_decoding_time_seconds{name="netflow",quantile="0.5"} 0.00215827
goflow2_flow_decoding_time_seconds{name="netflow",quantile="0.9"} 0.004635852
goflow2_flow_decoding_time_seconds{name="netflow",quantile="0.99"} 0.006492566
The decoding times, despite the different metric names and formats, are similar or even faster with v2. I'm a bit surprised that with v2 the CPU load more than doubled - and therefore I should resize my setup.
Note to myself: do isolated testing with enough resources. Case closed then? Or, if you have a hint how I can trace which routine causes the CPU increase, we may be able to improve the resources needed for processing.
ok, further testing with different outputs yields other results.
On a 16 vCPU Machine for goflow v2: -listen netflow://:3001/?count=4
with currently 40k f/s input:
-format bin: 9.3%-10% vCPU
-format pb (protobuf, my addition): 8.6%-10% vCPU
It's reasonable that the formatting of JSON https://github.com/netsampler/goflow2/blob/34a0c1618e6fcf0d509c4553b3236b4aa85c71ca/producer/proto/messages.go#L88 needs more compute resources.
Hello,
Thank you for this info.
Could you try with count=16&queue_size=100000? And with just queue_size.
Some of the changes with V2 were done on the conversion to protobuf (producer module) using reflection but it may be more impactful than I initially thought. The tradeoff was allowing more custom fields to be mapped. A custom producer should solve it to do the mapping (in addition to the formatter) and compiling it.
In regards to listing the goroutines, you need to enable pprof (import _ "net/http/pprof") and fetch the debug page on /debug/pprof/goroutine (it should be on the same endpoint as metrics).
I would say it's similar (16 vCPU, 45k f/s):
-listen="netflow://:3001?count=4" -format bin: 9.5%-12% vCPU
-listen="netflow://:3001?count=16&queue_size=100000" -format bin: 10.5%-12% vCPU
-listen="netflow://:3001?queue_size=100000" -format bin: 9.2%-10.2% vCPU
@bpereto thanks, how is the RAM profile? Does it still OoM after 10min?
Do the logs confirm count and queue_size with the entered values?
Hi,
I didn't record the full RAM profile during the measurements, but it stays on average the same. With an increased number of sockets, the UDP memory is on average a little higher.
So no OOM after 10min - it is stable even after 1min.
Some of the changes with V2 were done on the conversion to protobuf (producer module) using reflection but it may be more impactful than I initially thought. The tradeoff was allowing more custom fields to be mapped.
The mapping is really useful - but it works only in the json and text output (FormatMessageReflectCustom) with said reflection.
I currently extended the protobuf schema to also include the mapped fields in binary, or in my case protobuf format.
For me the issue is resolved, as the resource consumption with the binary/protobuf output format is on average the same with v1 and v2.
Hi @bpereto, Is this resolved?
The mapping should work with every output.
Example in the following mapping.yaml.
formatter:
  protobuf: # manual protobuf fields addition
    - name: flow_direction
      index: 42
      type: varint
ipfix:
  mapping:
    - field: 61
      destination: flow_direction
IPFIX fields identified by 61 will be serialized into a protobuf as a varint at index 42.
Your tools (not GoFlow2) decoding the binary format must use a custom flow.proto
with:
message FlowMessage {
  uint64 flow_direction = 42;
}
Otherwise, without a type, those fields will only be available through protoreflect as "unknown data", since deserializers need a type.
Yes, this is resolved.
Thank you for the explanation of the mapping.
Describe the bug
I'm testing the new v2 version of goflow and encounter a steady increase in memory consumption when running it with ca. 20k f/s, until the OOM-Killer invokes.
I encountered it with the normal file transport as well as with a custom pulsar transport and protobuf format (not the binary one).
Included in this report is a heap dump indicating a memory issue in
github.com/netsampler/goflow2/v2/utils.glob..func1
using the console output with the default json format. UDP buffers are stable and not overflowing, therefore I assume it is a problem with the
sync.Pool
handling.
To Reproduce
Steps to reproduce the behavior:
(note: -profile is pprof added)

File: goflow2.v2
Type: inuse_space
Time: Sep 27, 2023 at 1:58pm (CEST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 5.06GB, 99.54% of 5.08GB total
Dropped 36 nodes (cum <= 0.03GB)
      flat  flat%   sum%        cum   cum%
    5.03GB 98.96% 98.96%     5.03GB 98.96%  github.com/netsampler/goflow2/v2/utils.glob..func1
    0.03GB  0.59% 99.54%     0.04GB  0.77%  net.(*UDPConn).ReadFromUDP (inline)
         0     0% 99.54%     5.07GB 99.73%  github.com/netsampler/goflow2/v2/utils.(*UDPReceiver).receive
         0     0% 99.54%     5.07GB 99.73%  github.com/netsampler/goflow2/v2/utils.(*UDPReceiver).receivers.func1
         0     0% 99.54%     5.03GB 98.96%  sync.(*Pool).Get