transferwise / pipelinewise

Data Pipeline Framework using the singer.io spec
https://transferwise.github.io/pipelinewise
Apache License 2.0

add_metadata_columns flag not being recognized in either fastsync or singer sync #692

Open cap-itadmin opened 3 years ago

cap-itadmin commented 3 years ago

**Describe the bug**
Pipelinewise is not respecting the add_metadata_columns: False setting in the target yml file.

  1. All fastsync-supported taps may need to be adjusted, per Peter Kosztolanyi:
     - tap-mysql-fastsync adding metadata columns: https://github.com/transferwise/pipelinewise/blob/ac926d7fe0322e8382850f4889fac85cd316384f/pipelinewise/fastsync/commons/tap_mysql.py#L261
     - tap-postgres-fastsync adding metadata columns: https://github.com/transferwise/pipelinewise/blob/ac926d7fe0322e8382850f4889fac85cd316384f/pipelinewise/fastsync/commons/tap_postgres.py#L407

  2. For a Snowflake target and Postgres tap running in Singer mode, the metadata columns are being added back to the table, even when they are manually deleted from the target table after the initial sync.

**To Reproduce**
Steps to reproduce the behavior:

  1. Target yml file includes lines:

    # ------------------------------------------------------------------------------
    # General Properties
    # ------------------------------------------------------------------------------
    id: "snowflake3"                     # Unique identifier of the target
    name: "Snowflake"                    # Name of the target
    type: "target-snowflake"             # !! THIS SHOULD NOT CHANGE !!

    add_metadata_columns: False

  2. For an existing Postgres tap - Snowflake target pipeline, manually delete the three metadata columns from the target table:

    alter table xyz drop column _SDC_DELETED_AT, _SDC_EXTRACTED_AT, _SDC_BATCHED_AT;

  3. Run the pipeline.

  4. Check the Snowflake table or log and see that the columns have been added back in.
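
For step 4, a small helper can make the check less error-prone, since Snowflake reports unquoted identifiers in upper case while the Singer names are lower case. This is only a sketch: `find_metadata_columns` is a hypothetical helper, not part of Pipelinewise, and it assumes you have already fetched the table's column names by whatever means you prefer.

```python
# The three metadata column names mentioned in this issue, lower-cased.
METADATA_COLUMNS = {"_sdc_extracted_at", "_sdc_batched_at", "_sdc_deleted_at"}


def find_metadata_columns(column_names):
    """Return the metadata columns present in column_names (hypothetical helper).

    The comparison is case-insensitive; results are lower-cased and sorted.
    """
    found = {name.lower() for name in column_names if name.lower() in METADATA_COLUMNS}
    return sorted(found)


if __name__ == "__main__":
    # Example column list as it might come back from the target table.
    columns = ["ID", "NAME", "_SDC_DELETED_AT", "_SDC_BATCHED_AT"]
    print(find_metadata_columns(columns))
```

An empty result after running the pipeline would mean the columns were not added back.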

**Expected behavior**
I would expect the metadata columns not to be added when add_metadata_columns is set to False.
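
To make the expectation concrete, here is a minimal sketch of the gating I have in mind. It is not the actual Pipelinewise code: `build_target_columns` and its arguments are hypothetical names used only for illustration, and the flag is passed in explicitly rather than read from the yml.

```python
# Metadata column names as they appear in this issue.
METADATA_COLUMNS = ("_sdc_extracted_at", "_sdc_batched_at", "_sdc_deleted_at")


def build_target_columns(source_columns, add_metadata_columns):
    """Return the target column list (hypothetical illustration).

    Metadata columns are appended only when add_metadata_columns is truthy.
    """
    columns = list(source_columns)
    if add_metadata_columns:
        columns.extend(METADATA_COLUMNS)
    return columns


if __name__ == "__main__":
    # With the flag off, only the source columns should reach the target.
    print(build_target_columns(["id", "name"], add_metadata_columns=False))
    # With the flag on, the three metadata columns are appended.
    print(build_target_columns(["id", "name"], add_metadata_columns=True))
```

Both the fastsync and the Singer code paths would need to honor the same flag for the behavior to be consistent.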

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Your environment**
 - Running in Docker
 - Source:  Postgres 
 - Target:  Snowflake
 - Using Log Based Replication

**Additional context**
Link to discussion on Slack Singer Pipelinewise channel:  https://singer-io.slack.com/archives/CNL7DL597/p1617988746067000

Saadmairaj commented 3 years ago

I'm facing the same issue when syncing from S3 to Postgres as well. Has anyone found a solution or workaround for this issue?