snowplow / snowbridge

For replicating streams across clouds, accounts and regions

Add configuration settings for PubSub source #380

Closed: colmsnowplow closed this 3 days ago

colmsnowplow commented 6 days ago

Adds configuration options that allow us to tune the PubSub source better:

- max_outstanding_messages sets the upper limit on the number of messages processed at once (each message is processed in its own goroutine).
- max_outstanding_bytes sets the upper limit on the number of bytes in the queue of messages waiting to be processed.
- min_extension_period_seconds configures the minimum ack deadline extension applied when reading messages from the subscription.
- streaming_pull_goroutines configures the number of concurrent streaming pulls kept open.
- grpc_connection_pool_size configures the connection pool size for the gRPC connection used to communicate with the subscription.

Previously, the value now controlled by streaming_pull_goroutines was taken from the concurrent_writes setting. However, this was confusing, because it behaves very differently from concurrent_writes in the other sources. max_outstanding_messages is closer to what concurrent_writes does for other sources, but its performance profile is wildly different, to the point that configuring this source is quite different from configuring the others.

We want to make a release to resolve issues with PubSub, and it's preferable to avoid a breaking change in doing so. To achieve this, the logic is as follows:

Where streaming_pull_goroutines is configured, it takes precedence and concurrent_writes is ignored.

Where streaming_pull_goroutines is not configured, a warning is logged and the value of concurrent_writes is used.

Where neither of those is configured, we log a warning and fall back to the previous default behaviour, which is an inadvisable value of 50.
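The precedence rules above can be sketched in Go as a single resolution function. This is a minimal sketch, assuming "not configured" is represented by a nil pointer; how snowbridge actually detects an unset value may differ. The function and helper names are hypothetical, and the warning text is illustrative rather than the project's real log output.

```go
package main

import (
	"fmt"
	"log"
)

// legacyDefaultStreamingPulls is the previous default of 50 described above.
const legacyDefaultStreamingPulls = 50

// resolveStreamingPullGoroutines is a HYPOTHETICAL helper applying the
// precedence rules; a nil pointer means the setting was not configured.
func resolveStreamingPullGoroutines(streamingPullGoroutines, concurrentWrites *int) int {
	if streamingPullGoroutines != nil {
		// Explicit setting takes precedence; concurrent_writes is ignored.
		return *streamingPullGoroutines
	}
	if concurrentWrites != nil {
		log.Printf("WARN: streaming_pull_goroutines not set, using concurrent_writes (%d)", *concurrentWrites)
		return *concurrentWrites
	}
	log.Printf("WARN: neither streaming_pull_goroutines nor concurrent_writes set, defaulting to %d", legacyDefaultStreamingPulls)
	return legacyDefaultStreamingPulls
}

// intPtr returns a pointer to v, for building "configured" values.
func intPtr(v int) *int { return &v }

func main() {
	fmt.Println(resolveStreamingPullGoroutines(intPtr(2), intPtr(20))) // 2
	fmt.Println(resolveStreamingPullGoroutines(nil, intPtr(20)))       // 20
	fmt.Println(resolveStreamingPullGoroutines(nil, nil))              // 50
}
```

Using a pointer (or any other explicit "unset" marker) matters here: with a plain int, a default of 0 for streaming_pull_goroutines could not be told apart from a user who never set it.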