transferwise / pipelinewise-target-redshift

Singer.io Target for Amazon Redshift - PipelineWise compatible
https://transferwise.github.io/pipelinewise/
Other
12 stars 65 forks source link

Hard coded copy options cause sync inconsistencies #45

Open mlavoie-sm360 opened 4 years ago

mlavoie-sm360 commented 4 years ago

In our source tables, we have some columns which contain raw HTML. These columns contain among other things, newline characters (\n).

When fast syncing the tables, the data is replicated correctly, that is, the newline characters are still valid.

When "normally" syncing the tables, the data replicated strips the slash () but not the 'n' due to some different flags being used. https://github.com/transferwise/pipelinewise-target-redshift/blob/0b78d1a485534d4f1d2bd5a623382660d2087452/target_redshift/db_sync.py#L425

The behaviour should be the same between fastsync and "normal" sync. Ideally, in both of these cases, the newline characters would still be valid.

Fastsync does not use the REMOVEQUOTES and ESCAPE flags.

Proposition

Perhaps, it would be better to include these hard coded options in the config_options variable? https://github.com/transferwise/pipelinewise-target-redshift/blob/0b78d1a485534d4f1d2bd5a623382660d2087452/target_redshift/db_sync.py#L408

aaronsteers commented 3 years ago

@mlavoie-sm360 - A Meltano user reported a similar issue today in Meltano Slack here and we found this logged issue. I also did some sleuthing and found you have created a fork which may have improved or resolved the issue in this commit. Can you report back to us if this fixed the newlines issue?

Thank so much!

mlavoie-sm360 commented 3 years ago

@aaronsteers, unfortunately this did not fix the issue for us. I don't remember the details entirely, but while reviewing the issue with a colleague this morning (to jog my memory), he pointed out that the real problem had been located a bit deeper in the code, somewhere around here: https://github.com/transferwise/pipelinewise-target-redshift/blob/80796b6c6e4647a85ddc9492e343547a027258e4/target_redshift/db_sync.py#L369

I forget the details, but the gist is that this is still an issue.

aaronsteers commented 3 years ago

@mlavoie-sm360 - Got it - Thanks for confirming!