meltano / sdk

Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com
Apache License 2.0

Add support for ACTIVATE_VERSION message types #18

Open MeltyBot opened 3 years ago

MeltyBot commented 3 years ago

Migrated from GitLab: https://gitlab.com/meltano/sdk/-/issues/18

Originally created by @aaronsteers on 2021-01-06 23:04:29


From the singer-python library:

ACTIVATE_VERSION message (EXPERIMENTAL).

The ACTIVATE_VERSION message has these fields:
  * stream - The name of the stream.
  * version - The version number to activate.

This is a signal to the Target that it should delete all previously
seen data and replace it with all the RECORDs it has seen where the
record's version matches this version number.

Note that this feature is experimental. Most Taps and Targets should
not need to use the "version" field of "RECORD" messages or the
"ACTIVATE_VERSION" message at all.

import singer

msg = singer.ActivateVersionMessage(
    stream='users',
    version=2)
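
To make the target-side semantics described above concrete, here is a minimal sketch in plain Python with no singer-python dependency. The `InMemoryTarget` class and its method names are hypothetical and exist only for illustration. It assumes RECORD messages carry an optional `version` field and that activating a version discards every row stored under any other version of that stream.

```python
import json
from collections import defaultdict


class InMemoryTarget:
    """Toy target (hypothetical) that stores rows per stream and version."""

    def __init__(self):
        # stream name -> version -> records seen with that version
        self.rows = defaultdict(lambda: defaultdict(list))
        # stream name -> currently active version
        self.active = {}

    def handle_line(self, line):
        """Process one JSON-encoded Singer-style message."""
        msg = json.loads(line)
        if msg["type"] == "RECORD":
            self.rows[msg["stream"]][msg.get("version")].append(msg["record"])
        elif msg["type"] == "ACTIVATE_VERSION":
            stream, version = msg["stream"], msg["version"]
            # Keep only the rows whose version matches the activated one.
            kept = self.rows[stream][version]
            self.rows[stream] = defaultdict(list, {version: kept})
            self.active[stream] = version

    def active_rows(self, stream):
        """Return the rows belonging to the stream's active version."""
        return list(self.rows[stream][self.active.get(stream)])


lines = [
    json.dumps({"type": "RECORD", "stream": "users", "version": 1, "record": {"id": 1}}),
    json.dumps({"type": "RECORD", "stream": "users", "version": 2, "record": {"id": 2}}),
    json.dumps({"type": "RECORD", "stream": "users", "version": 2, "record": {"id": 3}}),
    json.dumps({"type": "ACTIVATE_VERSION", "stream": "users", "version": 2}),
]

target = InMemoryTarget()
for line in lines:
    target.handle_line(line)

print(target.active_rows("users"))
```

After the ACTIVATE_VERSION message is handled, only the two version 2 rows remain for the `users` stream. A real target would apply the same rule to a table or file rather than to an in-memory dict.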

Implementing for taps:

I think we can safely implement this for taps and send the message by default. For targets that cannot tolerate unknown message types, we should support a disable_activate_version_messages=True option.

When FULL_TABLE replication is selected in the tap:

  1. [ ] Initialize a version number (likely an epoch-based integer): https://github.com/transferwise/pipelinewise-tap-snowflake/blob/aa89f2e4235999dbeafc7406a7f8b382542d8d5b/tap_snowflake/sync_strategies/common.py#L33
  2. [ ] Include version as a property within emitted RECORD messages: https://github.com/transferwise/pipelinewise-tap-snowflake/blob/aa89f2e4235999dbeafc7406a7f8b382542d8d5b/tap_snowflake/sync_strategies/common.py#L200
  3. [ ] Emit ACTIVATE_VERSION at the beginning of the first FULL_TABLE sync operation: https://github.com/transferwise/pipelinewise-tap-snowflake/blob/aa89f2e4235999dbeafc7406a7f8b382542d8d5b/tap_snowflake/sync_strategies/full_table.py#L87-L95
  4. [ ] Emit ACTIVATE_VERSION after a successful FULL_TABLE sync: https://github.com/transferwise/pipelinewise-tap-snowflake/blob/aa89f2e4235999dbeafc7406a7f8b382542d8d5b/tap_snowflake/sync_strategies/full_table.py#L114
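
The four steps above could be sketched roughly as follows. This is not SDK code: `full_table_messages`, its `first_sync` flag, and the injectable `now` clock are hypothetical names chosen for illustration. It also assumes that the proposed `disable_activate_version_messages` option suppresses only the ACTIVATE_VERSION messages and leaves the `version` property on RECORD messages.

```python
import time


def full_table_messages(stream, rows, config=None, first_sync=True, now=time.time):
    """Yield Singer-style message dicts for one FULL_TABLE sync (illustrative)."""
    config = config or {}
    # Assumption: the option name follows the proposal in this issue.
    send_activate = not config.get("disable_activate_version_messages", False)

    # Step 1: initialize an epoch-based integer version (milliseconds).
    version = int(now() * 1000)
    activate = {"type": "ACTIVATE_VERSION", "stream": stream, "version": version}

    # Step 3: emit ACTIVATE_VERSION at the start of the first FULL_TABLE sync.
    if send_activate and first_sync:
        yield activate

    # Step 2: include the version as a property of every RECORD message.
    for row in rows:
        yield {"type": "RECORD", "stream": stream, "record": row, "version": version}

    # Step 4: emit ACTIVATE_VERSION after the sync completes successfully.
    if send_activate:
        yield activate


messages = list(
    full_table_messages(
        "users",
        [{"id": 1}, {"id": 2}],
        now=lambda: 1700000000.0,  # fixed clock so the version is predictable
    )
)
for message in messages:
    print(message)
```

Passing `config={"disable_activate_version_messages": True}` to the same function yields only the RECORD messages, which is the fallback for targets that reject unknown message types.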

MeltyBot commented 2 years ago

View 5 previous comments from the original issue on GitLab

aaronsteers commented 2 years ago

On revisiting this issue, I found we already have handling available for targets and sinks in the SDK:

https://github.com/meltano/sdk/blob/621f1e2f00cd5d2a21a3abbb969936f27ae70a65/singer_sdk/target_base.py#L396-L404

aaronsteers commented 2 years ago

I just closed #607 as stale. I did not have bandwidth at the time to get everything polished/tested. But future developers could definitely use this as a starting point.

TyShkan commented 1 year ago

What is the current state of ACTIVATE_VERSION message type for taps/targets?