snowplow-incubator / common-streams

PubSub source: scale the parallel pull count with the number of cores #78

Closed istreeter closed 3 months ago

istreeter commented 3 months ago

The PubSub Source parameter parallelPullCount was used to set the parallelism of the underlying Subscriber. With a higher pull count, the Subscriber can supply events to the downstream part of the application more quickly, but at the cost of more overhead.

For typical Snowplow apps, a pull count of 1 is sufficient on small instances. But when more CPU is available, the downstream app processes events more quickly, so a higher pull count is needed to keep it supplied with events.

This PR picks the pull count dynamically based on the available CPU. Snowplow apps on bigger instances automatically benefit from this change, without the user having to set the pull count explicitly.
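
As a rough illustration of the idea, the sketch below derives a pull count from the number of cores visible to the JVM. It is not the code from this PR: the `PullCountSketch` object, the `pullCountFor` helper, and the `pullsPerCore` factor of 0.5 are assumptions made for the example. The only real API used is `Runtime.getRuntime.availableProcessors()`.

```scala
object PullCountSketch {

  // Hypothetical helper (not from the PR): scale the number of parallel pulls
  // with the number of cores, rounding up and never going below 1.
  def pullCountFor(availableCores: Int, pullsPerCore: Double): Int =
    math.max(1, math.ceil(availableCores * pullsPerCore).toInt)

  def main(args: Array[String]): Unit = {
    // Number of cores the JVM reports as available to this process.
    val cores = Runtime.getRuntime.availableProcessors()
    // 0.5 pulls per core is an assumed factor chosen only for illustration.
    val pullCount = pullCountFor(cores, pullsPerCore = 0.5)
    println(s"cores=$cores parallelPullCount=$pullCount")
  }
}
```

With this assumed factor, a single-core instance keeps a pull count of 1, while a 16-core instance would get 8. The right factor for a given app depends on how quickly its downstream processing consumes events.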