Our Snowplow stream-processing apps tend to produce duplicates on AWS. This happens because the Kinesis Client Library (KCL) allows workers to "steal" shard leases from other workers.
By adjusting two KCL configuration parameters, we can reduce lease stealing in some circumstances:
workerIdentifier: If a pod uses a consistent name, then whenever it restarts (e.g. after a crash or a rollout) it can re-claim the leases it owned before the restart.
leaseDuration: The KCL default is 10 seconds. Increasing it gives a pod more time to re-claim its old leases after a restart.
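
The interaction between the two settings can be illustrated with a toy model. This is a simplified sketch, not KCL's actual lease-taking logic: the `Lease` class, the `can_claim` function, and the worker names `enrich-0` and `enrich-1` are all hypothetical, and the model assumes only that a lease records its owner and last renewal time.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    """One row of a simplified lease table (hypothetical, not KCL's schema)."""
    shard_id: str
    owner: str
    last_renewed: float  # seconds on an arbitrary clock


def can_claim(lease: Lease, worker_id: str, now: float, lease_duration: float) -> bool:
    """Return True if worker_id may take over processing of the shard."""
    if lease.owner == worker_id:
        # Same identifier as before the restart: the worker resumes its own lease.
        return True
    # A different worker may only take the lease once the owner has
    # gone longer than lease_duration without renewing it.
    return now - lease.last_renewed > lease_duration


lease = Lease(shard_id="shard-0", owner="enrich-0", last_renewed=100.0)
restart_done = 125.0  # the pod is back 25 seconds after its last renewal

# With a 10 second lease, another worker may take the shard during the restart.
print(can_claim(lease, "enrich-1", restart_done, lease_duration=10.0))  # True
# With a 60 second lease, the shard is still reserved for its previous owner.
print(can_claim(lease, "enrich-1", restart_done, lease_duration=60.0))  # False
# A pod that comes back under the same name resumes its lease either way.
print(can_claim(lease, "enrich-0", restart_done, lease_duration=10.0))  # True
```

In this model a longer lease duration only helps if the restarted pod also keeps its name. A pod that returns under a new identifier is treated like any other worker and has to wait for the old lease to expire.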