mojodna / osm-pds-pipelines

OSM PDS pipeline
https://quay.io/repository/mojodna/osm-pds-pipelines
ISC License

Backed up streams / Timeouts #5

Open kamicut opened 6 years ago

kamicut commented 6 years ago

What should the behavior be in the following scenario:

  1. Lambda function errors
  2. Stream backs up such that it takes x time to catch up, x > max_lambda_execution_time
  3. Lambda function attempts to catch up but times out

Should the data be declared lost and the checkpoint reset?

cc @mojodna

mojodna commented 6 years ago

Ideally it can get through at least a minute during each invocation, so the checkpoint should be able to progress, albeit slowly. (I would rather re-publish changes for "at least once delivery" instead of dropping them.)
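A minimal sketch of that idea, not the pipeline's actual code: process items in order, checkpoint after each one succeeds, and stop before the timeout hits so that every invocation makes some progress. The names `drain_with_budget`, `process`, `save_checkpoint`, and `remaining_ms` are hypothetical and used only for illustration. In a Python Lambda handler, `context.get_remaining_time_in_millis` could be passed as `remaining_ms`.

```python
from typing import Callable, Iterable, Optional


def drain_with_budget(
    sequences: Iterable[int],
    process: Callable[[int], None],
    save_checkpoint: Callable[[int], None],
    remaining_ms: Callable[[], int],
    safety_margin_ms: int = 10_000,
) -> Optional[int]:
    """Process sequence numbers in order until the time budget is nearly spent.

    Returns the last sequence number that was checkpointed, or None if
    nothing was processed during this invocation.
    """
    last_done = None
    for seq in sequences:
        # Stop early instead of being killed mid-item by the timeout.
        if remaining_ms() <= safety_margin_ms:
            break
        process(seq)
        # Checkpoint only after the item succeeded. If process() raises,
        # the checkpoint stays put and the item is retried next time
        # (at-least-once delivery: possibly re-published, never dropped).
        save_checkpoint(seq)
        last_done = seq
    return last_done
```

With this shape, a backlog larger than one invocation can handle is worked off over several invocations, and a failure costs at most a repeat of the in-flight item.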

One snag, though, is that checkpointing occurs after the source data has been pushed onto the stream, which doesn't guarantee that it was consumed (and published), so maybe we need to think about that too.
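One possible way to approach this, sketched under the assumption that the consumer can report back each sequence number once it has actually been published: only advance the checkpoint to the highest sequence for which everything before it has been acknowledged. The class `AckedCheckpoint` and its `ack` method are hypothetical names, not part of this repository.

```python
class AckedCheckpoint:
    """Track the highest sequence number with no unacknowledged gaps before it."""

    def __init__(self, start: int) -> None:
        # Last sequence known to be fully consumed and published.
        self.checkpoint = start
        # Acknowledged sequences that are still ahead of a gap.
        self._acked = set()

    def ack(self, seq: int) -> int:
        """Record that `seq` was published; return the (possibly advanced) checkpoint."""
        # Duplicate or late acks for already-checkpointed items are ignored.
        if seq > self.checkpoint:
            self._acked.add(seq)
        # Advance only across a contiguous run of acknowledged sequences.
        while self.checkpoint + 1 in self._acked:
            self.checkpoint += 1
            self._acked.remove(self.checkpoint)
        return self.checkpoint
```

For example, starting from 10, acknowledging 12 leaves the checkpoint at 10 (11 is still outstanding), and acknowledging 11 afterwards moves it to 12. A restart from the stored checkpoint would then re-publish only items that were never confirmed.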