prebid / prebid-server

Open-source solution for running real-time advertising auctions in the cloud.
https://prebid.org/product-suite/prebid-server/
Apache License 2.0
433 stars, 739 forks

PBS spawns an unlimited number of goroutines #3754

Open linux019 opened 5 months ago

linux019 commented 5 months ago

Under high RPS to the /openrtb2/auction endpoint, PBS spawns more and more goroutines. To handle the huge amount of traffic I had to put an RPS limit on the load balancer. During normal operation of PBS the number of goroutines is ~5K.

PBS spawns a goroutine:

bretg commented 4 months ago

@linux019 - which library for goroutine pools are you proposing?

@zhongshixi has offered to provide a pointer to a potential solution.

@bsardo will coordinate a decision.

SyntaxNode commented 4 months ago

> I had to put an RPS limit on the load balancer.

IMHO it's good practice to use backpressure-limiting layers in front of Prebid Server. This is the approach we use to avoid the situation described here.

I have no issue with adding a goroutine-limiting feature to PBS, provided it doesn't add latency when unused, whether because it is disabled (if we want to provide that option) or because a high limit value is set.
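To make the "no latency when unused" requirement concrete, here is a rough sketch of one way it could look, using only the standard library. It is not PBS code: `Limiter`, `NewLimiter`, and `runBatch` are hypothetical names. The idea is that a disabled limiter is simply `nil`, so the disabled path is a plain `go fn()` with no channel operation.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Limiter (hypothetical) caps the number of concurrently running tasks
// with a buffered channel used as a counting semaphore.
// A nil *Limiter means "no limit" and adds no synchronization.
type Limiter struct{ slots chan struct{} }

// NewLimiter returns nil when max <= 0, i.e. the feature is disabled.
func NewLimiter(max int) *Limiter {
	if max <= 0 {
		return nil
	}
	return &Limiter{slots: make(chan struct{}, max)}
}

// Go runs fn in a new goroutine. When the limit is reached it blocks the
// caller until a slot frees up, which is where backpressure comes from.
func (l *Limiter) Go(fn func()) {
	if l == nil {
		go fn() // disabled: identical to an unbounded spawn
		return
	}
	l.slots <- struct{}{} // acquire a slot
	go func() {
		defer func() { <-l.slots }() // release the slot when fn returns
		fn()
	}()
}

// runBatch starts n tasks through l and reports the highest number
// that were running at the same time.
func runBatch(l *Limiter, n int) int64 {
	var wg sync.WaitGroup
	var running, peak int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		l.Go(func() {
			defer wg.Done()
			cur := atomic.AddInt64(&running, 1)
			for { // record a new peak if cur exceeds the old one
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			atomic.AddInt64(&running, -1)
		})
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak with limit 4:", runBatch(NewLimiter(4), 1000))
	fmt.Println("peak with no limit:", runBatch(NewLimiter(0), 1000))
}
```

With a high limit the enabled path costs one uncontended channel send and receive per task. Whether that is negligible for PBS would need to be measured.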

Slind14 commented 4 months ago

I think this is more about reducing the compute spent on mcall under normal usage.

Being able to deal better with traffic spikes would be a side effect.

There should be no need to add a library for this.

linux019 commented 4 months ago

@bretg we don't need a third-party library; many of them are overcomplicated. There is a good implementation, https://github.com/panjf2000/ants, which can be taken as an example.

zhongshixi commented 4 months ago

we use https://github.com/panjf2000/ants

It works very well in our system since it preallocates the resources for the goroutines you need. Some improvements we made:

  1. We have different ants pools in different parts of the system so that not all concurrent executions compete for the same pool.
  2. You do not want to shoot yourself in the foot with a strict limit on the number of goroutines. You need a soft limit with capacity to grow; otherwise your execution can get stuck waiting for a goroutine to become available.