transferwise / pipelinewise-target-redshift

Singer.io Target for Amazon Redshift - PipelineWise compatible
https://transferwise.github.io/pipelinewise/
Other
12 stars 65 forks source link

_sdc_deleted_at is a varchar column rather than a timestamp #37

Open Limess opened 4 years ago

Limess commented 4 years ago

The column _sdc_deleted_at does not match _sdc_batched_at and _sdc_extracted_at which are both timestamps, instead it is character varying.

This is an easy fix - is there any reason for this discrepency or is it an oversight?

Suggestion: update _sdc_deleted_at to be consistent with other _sdc columns

koszti commented 4 years ago

I remember we wanted to make this for target-snowflake a few months ago to collect a low hanging fruit but we found an unexpected difficulty.

One or more of the supported target components send the _sdc_deleted_at values as non timestamps so we'd need to update these connectors first and we haven't addressed this yet. Unfortunately I can't remember ATM which taps are they - also possible that they've been fixed in the meantime.

Another interesting question how to roll out this change to a prod system once it's implemented? The new column type will be detected as a Schema Change and a new version of the column will be added to the table _SDC_DELETED_AT_{DATE}. I'm not sure how this kind of change would affect systems with hundreds or thousands of replicated tables.