uber-archive / hyperbahn

Service discovery and routing for large scale microservice operations
MIT License
396 stars 57 forks

Mitigate 100% CPU edge case under ringpop churn #305

Raynos closed 7 years ago

Raynos commented 8 years ago

When the ring churns we get a large number of ringpop ring changes.

This causes updateServiceChannels() to get called at a high frequency.

This method is incredibly CPU intensive.

To avoid high CPU usage under high frequency ringpop changes we should cap the maximum number of updateServiceChannels() calls per minute.

By doing so we add a small amount of staleness, but that's a far better trade-off than being unavailable and burning a lot of CPU.

Right here (https://github.com/uber/hyperbahn/blob/master/service-proxy.js#L282-L284), instead of scheduling an updateServiceChannels() call immediately, we should schedule one in the future (5 seconds or something?) and check how many times we've updated in the last minute. Maybe have a cap of 12 updates per minute?
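A rough sketch of what that could look like, assuming a standalone helper rather than the real service-proxy.js code. The names createThrottledUpdater, delayMs, maxPerWindow and windowMs are made up for illustration, and the injectable now / setTimeout options exist only to make it easy to test. Ring changes that arrive while an update is already pending are coalesced into that one update, and once the cap is hit the pending update waits until the oldest run falls out of the one-minute window.

```javascript
// Hypothetical helper, not Hyperbahn's actual implementation.
// Wraps an expensive update function (e.g. updateServiceChannels) so that
// bursts of ring changes trigger at most `maxPerWindow` runs per `windowMs`.
function createThrottledUpdater(update, options) {
    var opts = options || {};
    var delayMs = opts.delayMs || 5000;          // "5 seconds or something"
    var maxPerWindow = opts.maxPerWindow || 12;  // "cap of 12 updates per minute"
    var windowMs = opts.windowMs || 60000;
    var now = opts.now || Date.now;              // injectable clock (assumption)
    var setTimer = opts.setTimeout || setTimeout; // injectable timer (assumption)

    var recent = [];     // timestamps of updates that ran inside the window
    var pending = false; // true while an update is scheduled but not yet run

    function schedule(wait) {
        pending = true;
        setTimer(run, wait);
    }

    function run() {
        var t = now();
        // Forget runs that have aged out of the sliding window.
        while (recent.length > 0 && t - recent[0] >= windowMs) {
            recent.shift();
        }
        if (recent.length >= maxPerWindow) {
            // Cap reached: retry once the oldest run leaves the window.
            schedule(recent[0] + windowMs - t);
            return;
        }
        pending = false;
        recent.push(t);
        update();
    }

    // Call this on every ring change; extra calls while pending are no-ops.
    return function onRingChanged() {
        if (!pending) {
            schedule(delayMs);
        }
    };
}
```

Usage would be something like `var onRingChanged = createThrottledUpdater(function () { self.updateServiceChannels(); });` and then calling `onRingChanged()` from the ring-change handler. The trade-off is the one described above: service channels can be stale for up to the delay (or longer once the cap is hit) in exchange for bounded CPU.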

blampe commented 7 years ago

Resolved by #307?

Raynos commented 7 years ago

Correct.