yyyar / gobetween

:cloud: Modern & minimalistic load balancer for the Cloud era
http://gobetween.io

High CPU load when adding a second UDP server #297

Closed HernanSzel closed 3 years ago

HernanSzel commented 3 years ago

Hi!

We've been using gobetween without a hitch for almost two months in our production env, balancing one UDP port. Last Friday we added another UDP bind to our config and CPU load went to 100% almost instantly. Is this behavior expected? We need to balance at least 15 UDP ports; is this possible?

This is our gobetween instance:

[screenshot of the gobetween instance]

Our config:


```toml
[servers.elkfwsu]
bind = "0.0.0.0:9090"
protocol = "udp"
balance = "roundrobin"
backend_idle_timeout = "0"
client_idle_timeout = "0"

[servers.elkfwsu.udp]
max_requests = 1
max_responses = 0

[servers.elkfwsu.discovery]
kind = "static"
static_list = [
  "as-logstash-00:9090",
  "as-logstash-01:9090"
]

[servers.elkfwsu.healthcheck]
interval = "30s"
kind = "exec"
exec_command = "./hc.sh"
exec_expected_positive_output = "1"
exec_expected_negative_output = "0"
timeout = "5s"

[servers.elksolutions]
bind = "0.0.0.0:9106"
protocol = "udp"
balance = "roundrobin"
backend_idle_timeout = "0"
client_idle_timeout = "0"

[servers.elksolutions.udp]
max_requests = 1
max_responses = 0

[servers.elksolutions.discovery]
kind = "static"
static_list = [
  "as-logstash-00:9106",
  "as-logstash-01:9106"
]

[servers.elksolutions.healthcheck]
interval = "30s"
kind = "exec"
exec_command = "./hc.sh"
exec_expected_positive_output = "1"
exec_expected_negative_output = "0"
timeout = "5s"
```
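For completeness, `hc.sh` is only referenced above, not shown; the contract is that the command prints `1` for a healthy backend and `0` otherwise. A rough Go sketch of such a probe (a sketch only: it assumes gobetween appends the backend host and port as arguments to `exec_command`, and that a plain TCP dial is a meaningful check for these backends):

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

func main() {
	// Assumption: gobetween invokes exec_command with the backend host
	// and port appended as arguments (verify against your version's docs).
	if len(os.Args) < 3 {
		fmt.Print("0")
		return
	}
	addr := net.JoinHostPort(os.Args[1], os.Args[2])

	// A plain TCP dial as the liveness probe. Whether this is meaningful
	// depends on the backend: these Logstash inputs are UDP, so this
	// assumes a TCP listener also exists on the same port.
	conn, err := net.DialTimeout("tcp", addr, 3*time.Second)
	if err != nil {
		fmt.Print("0") // matches exec_expected_negative_output
		return
	}
	conn.Close()
	fmt.Print("1") // matches exec_expected_positive_output
}
```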

Cheers!

illarion commented 3 years ago

Hi @HernanSzel !

Do you have traffic stats for each of the two ports, i.e. packets per second and bytes transferred? Maybe the CPU burst is just a large increase in traffic to process?

HernanSzel commented 3 years ago

Hi @illarion

Thank you for your quick response! Let me see what I can find. We are using gobetween to balance between two Logstash instances in our ELK stack. We have two pipelines set up, and the older one is ingesting a lot more events than the newer one; stats for both follow.
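These figures come from gobetween's REST stats endpoint. A minimal Go sketch of pulling them, assuming the `[api]` section is enabled (the bind address `127.0.0.1:8888` and the `/servers/<name>/stats` path are assumptions to check against your gobetween version):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Only the fields discussed here; the full response contains more.
type BackendStats struct {
	Live     bool  `json:"live"`
	Tx       int64 `json:"tx"`
	TxSecond int64 `json:"tx_second"`
}

type Backend struct {
	Host  string       `json:"host"`
	Port  string       `json:"port"`
	Stats BackendStats `json:"stats"`
}

type ServerStats struct {
	TxTotal  int64     `json:"tx_total"`
	TxSecond int64     `json:"tx_second"`
	Backends []Backend `json:"backends"`
}

func main() {
	// Assumed API bind; adjust to the [api] section of your config.
	resp, err := http.Get("http://127.0.0.1:8888/servers/elkfwsu/stats")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var s ServerStats
	if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
		panic(err)
	}
	fmt.Printf("tx_total=%d tx_second=%d\n", s.TxTotal, s.TxSecond)
	for _, b := range s.Backends {
		fmt.Printf("  %s:%s live=%v tx_second=%d\n",
			b.Host, b.Port, b.Stats.Live, b.Stats.TxSecond)
	}
}
```

The `tx_second` values alone show the asymmetry: roughly 797,051 on the old pipeline versus 1,232 on the new one.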

Old pipeline:

```json
{
  "active_connections": 0,
  "rx_total": 0,
  "tx_total": 165596399298,
  "rx_second": 0,
  "tx_second": 797051,
  "backends": [
    {
      "host": "as-logstash-01",
      "port": "9090",
      "priority": 1,
      "weight": 1,
      "stats": {
        "live": true,
        "discovered": true,
        "total_connections": 0,
        "active_connections": 0,
        "refused_connections": 0,
        "rx": 0,
        "tx": 82793184421,
        "rx_second": 0,
        "tx_second": 471160
      }
    },
    {
      "host": "as-logstash-00",
      "port": "9090",
      "priority": 1,
      "weight": 1,
      "stats": {
        "live": true,
        "discovered": true,
        "total_connections": 0,
        "active_connections": 0,
        "refused_connections": 0,
        "rx": 0,
        "tx": 82801632764,
        "rx_second": 0,
        "tx_second": 490876
      }
    }
  ]
}
```

Newer pipeline:

```json
{
  "active_connections": 0,
  "rx_total": 0,
  "tx_total": 390558650,
  "rx_second": 0,
  "tx_second": 1232,
  "backends": [
    {
      "host": "as-logstash-00",
      "port": "9106",
      "priority": 1,
      "weight": 1,
      "stats": {
        "live": true,
        "discovered": true,
        "total_connections": 0,
        "active_connections": 0,
        "refused_connections": 0,
        "rx": 0,
        "tx": 195497593,
        "rx_second": 0,
        "tx_second": 121
      }
    },
    {
      "host": "as-logstash-01",
      "port": "9106",
      "priority": 1,
      "weight": 1,
      "stats": {
        "live": true,
        "discovered": true,
        "total_connections": 0,
        "active_connections": 0,
        "refused_connections": 0,
        "rx": 0,
        "tx": 195058592,
        "rx_second": 0,
        "tx_second": 224
      }
    }
  ]
}
```

Hope this helps!

HernanSzel commented 3 years ago

Update: we realized we were using an out-of-date build of gobetween that didn't include the following fix: https://github.com/yyyar/gobetween/issues/290

Now it's working as desired. Thanks!

illarion commented 3 years ago

@HernanSzel thanks for using Gobetween!