iron-io / functions

IronFunctions - the serverless microservices platform by
https://iron.io
Apache License 2.0

Load balancer spreads calls across too many nodes #570

Open jconning opened 7 years ago

jconning commented 7 years ago

I have created a function (jconning/primes:0.0.1) that calculates a configurable number of prime numbers. I use this function to consume a variable amount of CPU. This is how to invoke the function: http://host:8080/r/primesapp/primes?max=1000000&loops=1 where max indicates the highest prime number to calculate and loops indicates the number of times to repeat the primes calculation.
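For anyone scripting calls to this route, the URL can be assembled rather than typed by hand. The following is a minimal sketch in Go: the helper name `primesURL` and its parameter names are made up for illustration, and `host:8080` is the same placeholder as above.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// primesURL builds the invocation URL for the primes route.
// host is a placeholder such as "host:8080"; maxPrime and loops map to the
// "max" and "loops" query parameters described above.
// (Hypothetical helper, not part of IronFunctions.)
func primesURL(host string, maxPrime, loops int) string {
	q := url.Values{}
	q.Set("max", strconv.Itoa(maxPrime))
	q.Set("loops", strconv.Itoa(loops))
	u := url.URL{
		Scheme:   "http",
		Host:     host,
		Path:     "/r/primesapp/primes",
		RawQuery: q.Encode(), // Encode sorts parameters by key
	}
	return u.String()
}

func main() {
	fmt.Println(primesURL("host:8080", 1000000, 1))
}
```

Because `url.Values.Encode` sorts keys, `loops` appears before `max` in the result; the query is otherwise equivalent to the one shown above.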

I have created ten functions nodes and put them behind a single fnlb node. I have written a test harness to call fnlb to execute a single route (pointing to the primes function) ten times.

I expect the load balancer to concentrate all ten calls on two of the ten nodes. Sometimes it does, but at other times it spreads the calls across more than two nodes, which is the bug.
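Stated as a check, the expectation is that a batch of calls touches at most two distinct nodes. A minimal sketch, with hypothetical helper names and made-up tallies:

```go
package main

import "fmt"

// nodesUsed returns how many distinct nodes received at least one call.
func nodesUsed(counts map[string]int) int {
	used := 0
	for _, c := range counts {
		if c > 0 {
			used++
		}
	}
	return used
}

// concentrated reports whether all calls landed on at most maxNodes nodes.
func concentrated(counts map[string]int, maxNodes int) bool {
	return nodesUsed(counts) <= maxNodes
}

func main() {
	// Hypothetical tallies for two batches of ten calls each.
	expected := map[string]int{"node-a": 5, "node-b": 5}
	spread := map[string]int{"node-a": 4, "node-b": 2, "node-c": 1, "node-d": 1, "node-e": 1, "node-f": 1}

	fmt.Println(concentrated(expected, 2)) // true
	fmt.Println(concentrated(spread, 2))   // false
}
```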

A PR for the test harness and function (so you can run and examine them yourself) is coming soon!

Here is the output from one run of the test harness showing this behavior. The harness calls the load balancer in three batches of ten calls each: the first batch calls a quick function, the second a longer-running function, and the third an even longer-running function. Note how the load balancer concentrates two of the three batches on just two nodes but spreads the remaining batch across six nodes.

Jims-MacBook-Pro:lbtest jimc$ go run main.go
Discovering container ids for every node (use Docker's HOSTNAME env var as a container id)...
 54.175.27.185:8080 a0a9784ebd2b
 54.175.27.185:8081 98ce598c4359
 54.175.27.185:8082 e1d0903cb040
 54.175.27.185:8083 73fb13e4476b
 54.175.27.185:8084 c9a8c48476c6
 54.175.27.185:8085 2211e69e4922
 54.175.27.185:8086 a56ac820919f
 54.175.27.185:8087 c88470355491
 54.175.27.185:8088 de9ce5b86d5e
 54.175.27.185:8089 5dffdef038fc
Quick function: generate primes up to 1000
Calling a single route 10 times (through the load balancer)...
Results (executions per node):
 54.175.27.185:8087 1
 54.175.27.185:8085 2
 54.175.27.185:8084 1
 54.175.27.185:8089 1
 54.175.27.185:8088 4
 54.175.27.185:8081 1
Longer function: generate primes up to 1M
Calling a single route 10 times (through the load balancer)...
Results (executions per node):
 54.175.27.185:8088 5
 54.175.27.185:8085 5
Even longer function: repeat primes calculation 100 times (primes <= 1M)
Calling a single route 10 times (through the load balancer)...
Results (executions per node):
 54.175.27.185:8085 4
 54.175.27.185:8088 6

jconning commented 7 years ago

Here are results for the same configuration but with 1000 calls to a single route rather than 10. In each batch, roughly 55-71% of the calls are routed to two nodes (ports 8085 and 8088), with the remainder spread across the other eight nodes.

Jims-MacBook-Pro:lbtest jimc$ go run main.go
Discovering container ids for every node (use Docker's HOSTNAME env var as a container id)...
  54.175.27.185:8080 a0a9784ebd2b
  54.175.27.185:8081 98ce598c4359
  54.175.27.185:8082 e1d0903cb040
  54.175.27.185:8083 73fb13e4476b
  54.175.27.185:8084 c9a8c48476c6
  54.175.27.185:8085 2211e69e4922
  54.175.27.185:8086 a56ac820919f
  54.175.27.185:8087 c88470355491
  54.175.27.185:8088 de9ce5b86d5e
  54.175.27.185:8089 5dffdef038fc
Quick function: generate primes up to 1000
Calling a single route 1000 times (through the load balancer)...
Results (executions per node):
  54.175.27.185:8082 32
  54.175.27.185:8083 26
  54.175.27.185:8086 31
  54.175.27.185:8084 40
  54.175.27.185:8087 32
  54.175.27.185:8085 341
  54.175.27.185:8080 41
  54.175.27.185:8089 58
  54.175.27.185:8081 24
  54.175.27.185:8088 370
Longer function: generate primes up to 1M
Calling a single route 1000 times (through the load balancer)...
Results (executions per node):
  54.175.27.185:8082 52
  54.175.27.185:8081 51
  54.175.27.185:8083 37
  54.175.27.185:8084 63
  54.175.27.185:8088 309
  54.175.27.185:8087 38
  54.175.27.185:8086 30
  54.175.27.185:8089 75
  54.175.27.185:8080 57
  54.175.27.185:8085 280
Even longer function: repeat primes calculation 100 times (primes <= 1M)
Calling a single route 1000 times (through the load balancer)...
Results (executions per node):
  54.175.27.185:8084 72
  54.175.27.185:8082 49
  54.175.27.185:8087 45
  54.175.27.185:8086 45
  54.175.27.185:8080 65
  54.175.27.185:8083 44
  54.175.27.185:8088 292
  54.175.27.185:8085 255
  54.175.27.185:8081 54
  54.175.27.185:8089 79
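
The share of calls handled by the two busiest nodes can be worked out from the per-node counts. The sketch below uses a hypothetical helper, `topShare`, with the counts copied from the three 1000-call batches above. The share is taken over the executions the output actually records, which in the first two batches sum to slightly fewer than 1000.

```go
package main

import (
	"fmt"
	"sort"
)

// topShare returns the fraction of all recorded executions that were handled
// by the k busiest nodes. It returns 0 when there are no executions.
func topShare(counts []int, k int) float64 {
	sorted := append([]int(nil), counts...) // copy so the caller's slice is untouched
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	total, top := 0, 0
	for i, c := range sorted {
		total += c
		if i < k {
			top += c
		}
	}
	if total == 0 {
		return 0
	}
	return float64(top) / float64(total)
}

func main() {
	// Per-node execution counts from the three batches reported above.
	batches := [][]int{
		{32, 26, 31, 40, 32, 341, 41, 58, 24, 370},
		{52, 51, 37, 63, 309, 38, 30, 75, 57, 280},
		{72, 49, 45, 45, 65, 44, 292, 255, 54, 79},
	}
	for i, b := range batches {
		fmt.Printf("batch %d: top two nodes handled %.1f%% of calls\n", i+1, 100*topShare(b, 2))
	}
}
```

On these figures the top-two share comes out at roughly 71%, 59%, and 55% for the quick, longer, and even longer function respectively.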

treeder commented 7 years ago

fnlb needs a lot of work still... Saw your test harness for it, nice work. #573