meltano / sdk

Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com
Apache License 2.0
100 stars 70 forks

Add support for sharded state bookmarks for substreams (ex. substreams by parent key) #22

Closed MeltyBot closed 2 years ago

MeltyBot commented 3 years ago

Migrated from GitLab: https://gitlab.com/meltano/sdk/-/issues/22

Originally created by @aaronsteers on 2021-01-14 20:02:25


On #20 and in discussions on !1, we have uncovered the need for additional spec work on complex bookmarking requirements, specifically bookmarks that track distinct "shards" or "subdomains" of the total domain of entities in the stream.

A common scenario would be keeping a distinct bookmark for each "project" in the GitLab tap, since virtually all streams must be keyed off of a project_id. A nested bookmark structure is needed to track the latest replication key value for each substream, since each was run at a slightly different time, and each may need to be retried separately if a failure affects one substream and not the others.
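
To make the idea concrete, here is a minimal sketch of what a per-shard bookmark structure could look like. The layout (a `partitions` list under each stream, with a `context` dict identifying the shard) and the helper names `get_shard_bookmark` and `advance_bookmark` are assumptions for illustration only, not a spec agreed on in this issue.

```python
from typing import Any, Dict

State = Dict[str, Any]


def get_shard_bookmark(state: State, stream: str, shard_key: Dict[str, Any]) -> Dict[str, Any]:
    """Return the bookmark for one shard of a stream, creating it if missing.

    Hypothetical layout:
    state["bookmarks"][stream]["partitions"] = [{"context": {...}, ...}, ...]
    """
    stream_state = state.setdefault("bookmarks", {}).setdefault(stream, {})
    shards = stream_state.setdefault("partitions", [])
    for shard in shards:
        if shard["context"] == shard_key:
            return shard
    shard = {"context": dict(shard_key)}
    shards.append(shard)
    return shard


def advance_bookmark(
    state: State,
    stream: str,
    shard_key: Dict[str, Any],
    replication_key: str,
    value: str,
) -> None:
    """Move one shard's bookmark forward; never move it backwards."""
    bookmark = get_shard_bookmark(state, stream, shard_key)
    current = bookmark.get("replication_key_value")
    if current is None or value > current:
        bookmark["replication_key"] = replication_key
        bookmark["replication_key_value"] = value


# Each project keeps its own bookmark, so one can lag or be retried
# without affecting the other. Values are ISO 8601 UTC strings, which
# compare correctly as plain strings.
state: State = {}
advance_bookmark(state, "issues", {"project_id": 1}, "updated_at", "2021-01-10T00:00:00Z")
advance_bookmark(state, "issues", {"project_id": 2}, "updated_at", "2021-01-14T00:00:00Z")
# An older value for project 1 leaves its bookmark unchanged.
advance_bookmark(state, "issues", {"project_id": 1}, "updated_at", "2021-01-05T00:00:00Z")
```

Keying each shard by a small context dict rather than a single ID leaves room for substreams that are partitioned by more than one parent key.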

MeltyBot commented 2 years ago

View 14 previous comments from the original issue on GitLab