netsampler / goflow2

High performance sFlow/IPFIX/NetFlow Collector
BSD 3-Clause "New" or "Revised" License

UDP Socket Memory Limit hit / Overflow #55

Closed by bpereto 2 years ago

bpereto commented 2 years ago

I think an issue was introduced by updating sarama to 1.30.0 or its dependencies (https://github.com/netsampler/goflow2/pull/47): since then we see the UDP socket memory limit being hit and the socket overflowing.

The last working commit is: 17a96d991149c9bcc1481795de1c19b718163bb3
The overflow surfaces with: ec08b786c8ad2c532d76dd7686ee9d622e2540be

OK: netdata graph with commit 17a96d991149c9bcc1481795de1c19b718163bb3 (screenshot, 2021-11-22 12:10:43)

NOK: netdata graph with commit ec08b786c8ad2c532d76dd7686ee9d622e2540be (screenshot, 2021-11-22 12:11:23)

shyam334 commented 2 years ago

To isolate the fault: have you tried 7baa828, the commit/PR just before ec08b78 (where sarama was updated, but not the Go version)?

bpereto commented 2 years ago

Yes, I verified the commit https://github.com/netsampler/goflow2/commit/7baa828267606d945919ca58a6c72757be3f65f1 with the same result.

lspgn commented 2 years ago

Thank you for reporting this issue. Are you able to provide more collected metrics (eg: Prometheus endpoint)? Does this surface when using a file transport? (I assume not, but would like to know before ruling it out.) Are you able to get the count of messages produced into Kafka with both versions? Is it NetFlow/IPFIX or sFlow?

lspgn commented 2 years ago

Hello, is this still happening?

bpereto commented 2 years ago

Hi, sorry for the delay, I will test again. The change is the update to sarama 1.30.1 in 05b436277b297c3170d6cf2f1689ee312ccf14bb, right?

Release 1.30.1 of sarama has a warning, and I don't know whether it applies to goflow. (screenshot, 2022-01-25 15:39:00)

The sarama issue about the throughput regression: https://github.com/Shopify/sarama/issues/2071

I will test again with the newest commits of goflow.

bpereto commented 2 years ago

The performance issue is resolved with the latest commit of goflow (8d59905c4417c936edd160fe4b72aba08d20201d) and the latest sarama release, 1.31.0. It was probably a regression in sarama.

> Thank you for reporting this issue, are you able to provide more collected metrics (eg: Prometheus endpoint)?

I think this is now obsolete.

> Does this surface when using a file transport? (assuming no, but would like to know before ruling this out)

No, writing directly to a file (tmpfs) does not show a problem.

> Is it NetFlow/IPFIX or sFlow?

It's IPFIX (v10).

lspgn commented 2 years ago

Thank you for confirming. I will close the issue.