Hi @vaygr,
since you seem to have some sort of monitoring in place, can you share what the graphs look like for go_memstats_heap_alloc_bytes and go_memstats_heap_sys_bytes?
I've been running the collector for a long time and never seem to hit any sort of memory leak (I don't have huge traffic either, so I never push it too hard).
I noticed that memory usage stabilizes at a value that depends on two factors: the number of active flows and the flush_interval setting. Here is an example from mine:
The higher the value of flush_interval, the more active flows will be kept in memory.
Can you share a graph of the count(netflow_flow_traffic_detail{}) metric? It should show cycles whose length depends on flush_interval; here is an example from mine:
In my case, I have flush_interval=36000, which is pretty high but OK for low-volume traffic.
Here you go:
The second graph's range is 1 week:
@rkosegi any idea what could be causing this? steps to debug further?
I'm not able to replicate your problem using the provided config or some variation of it. A new version of the collector (v1.0.3) has just been released, so please give it a try first.
It's been running for a few hours, and I honestly don't see any change in behavior. Maybe you could try the docker image I'm using and see if you can reproduce this with it? vaygr/netflow-collector:1.0.3
Other than that, here's a graph with metrics similar to what you posted earlier, and I see that memory almost never gets released:
It's 1 week of running the collector, and it got to over 1GB.
So here's the last 48 hours of running 1.0.3.
I also attached anonymized metrics sample: metrics-anon.txt
After looking at the attached metrics, it's clear that the collector itself doesn't pollute the heap that much compared to goflow's built-in metrics (flow_traffic_XXX). Looks like https://github.com/cloudflare/goflow/issues/94. Unfortunately, I don't see how this can be disabled programmatically.
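Not part of the project, but here is a minimal sketch of one possible workaround, assuming the collector exposes its metrics through client_golang's default registry: wrap the gatherer and drop the flow_traffic_* families at scrape time. Note that this only trims the scrape payload; goflow would still hold the label data in memory, so it doesn't address the heap growth itself. The listen address and the prefix list are assumptions.

```go
package main

import (
	"net/http"
	"strings"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	dto "github.com/prometheus/client_model/go"
)

// filteringGatherer wraps another prometheus.Gatherer and drops metric
// families whose names start with any of the given prefixes.
type filteringGatherer struct {
	inner    prometheus.Gatherer
	prefixes []string
}

func (g filteringGatherer) Gather() ([]*dto.MetricFamily, error) {
	families, err := g.inner.Gather()
	if err != nil {
		return nil, err
	}
	kept := families[:0]
	for _, mf := range families {
		drop := false
		for _, p := range g.prefixes {
			if strings.HasPrefix(mf.GetName(), p) {
				drop = true
				break
			}
		}
		if !drop {
			kept = append(kept, mf)
		}
	}
	return kept, nil
}

func main() {
	// Serve a filtered view of the default registry, hiding goflow's
	// high-cardinality flow_traffic_* families from scrapes.
	g := filteringGatherer{
		inner:    prometheus.DefaultGatherer,
		prefixes: []string{"flow_traffic_"},
	}
	http.Handle("/metrics", promhttp.HandlerFor(g, promhttp.HandlerOpts{}))
	http.ListenAndServe(":9191", nil) // port is arbitrary for this sketch
}
```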
@rkosegi it's still unclear to me why you can't reproduce it and why memory doesn't get reclaimed in time. Is it because of a different sender (in my case OPNsense's flowd)?
Yeah, if you check the linked issue above, similar behavior is described there with pmacctd, which doesn't happen in my case.
I see. I guess one option could be switching over to https://github.com/netsampler/goflow2, given the goflow project's stagnation.
For me the workaround could be either applying https://github.com/cloudflare/goflow/pull/95 during the build or limiting RAM resources at the container level.
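If you go the build-time route, here is a minimal sketch of how that could look with Go modules, assuming the collector depends on goflow v3 and you keep a local goflow checkout with that PR applied next to the source tree (the ../goflow path is just an illustration):

```
// Added to the collector's go.mod before building (sketch; paths are assumptions):
replace github.com/cloudflare/goflow/v3 => ../goflow
```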
Moving to goflow2 seems like the logical choice, but I can't promise any ETA. Pull requests are always welcome, btw.
I tested both container limits and scheduled restarts -- both work well. Thanks a lot for your help in troubleshooting this.
At this point I'll leave the decision / ETA for the migration to goflow2 up to you, but yes, as you noted, it could be a better-supported backend, so it might make sense to migrate as soon as there are resources.
Hello, recently I realized that the netflow-collector container has been eating memory uncontrollably. I'm running the latest 1.0.2 tag with this config:
One thing I noticed is that this could be due to the flow_traffic_bytes, flow_traffic_packets, flow_traffic_summary_size_bytes, flow_traffic_summary_size_bytes_sum, and flow_traffic_summary_size_bytes_count metrics coming from the goflow package, which grow for every port but are practically useless here. They result in over 20MB of payload for each scrape. I'm curious whether it would be possible to turn them off, or what else the issue could be. The maximum in the picture above is roughly 2GB of memory.
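To confirm which families dominate the scrape, here is a small diagnostic sketch (not part of the collector; the endpoint address is an assumption) that parses the text exposition format and prints the number of series per metric family, largest first:

```go
package main

import (
	"fmt"
	"net/http"
	"sort"

	"github.com/prometheus/common/expfmt"
)

func main() {
	// Fetch the collector's metrics endpoint (address is an assumption).
	resp, err := http.Get("http://localhost:9191/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Parse the Prometheus text exposition format into metric families.
	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		panic(err)
	}

	// Count series per family and print them sorted by size, descending.
	type familyCount struct {
		name  string
		count int
	}
	counts := make([]familyCount, 0, len(families))
	for name, mf := range families {
		counts = append(counts, familyCount{name, len(mf.GetMetric())})
	}
	sort.Slice(counts, func(i, j int) bool { return counts[i].count > counts[j].count })
	for _, c := range counts {
		fmt.Printf("%8d  %s\n", c.count, c.name)
	}
}
```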