Open mlavoie-sm360 opened 4 years ago
@mlavoie-sm360 - A Meltano user reported a similar issue today in Meltano Slack here and we found this logged issue. I also did some sleuthing and found you have created a fork which may have improved or resolved the issue in this commit. Can you report back to us if this fixed the newlines issue?
Thank so much!
@aaronsteers, unfortunately this did not fix the issue for us. I don't remember the details entirely, but while reviewing the issue with a colleague this morning (to jog my memory), he pointed out that the real problem had been located a bit deeper in the code, somewhere around here: https://github.com/transferwise/pipelinewise-target-redshift/blob/80796b6c6e4647a85ddc9492e343547a027258e4/target_redshift/db_sync.py#L369
I forget the details, but the gist is that this is still an issue.
@mlavoie-sm360 - Got it - Thanks for confirming!
In our source tables, we have some columns which contain raw HTML. These columns contain among other things, newline characters (\n).
When fast syncing the tables, the data is replicated correctly, that is, the newline characters are still valid.
When "normally" syncing the tables, the data replicated strips the slash () but not the 'n' due to some different flags being used. https://github.com/transferwise/pipelinewise-target-redshift/blob/0b78d1a485534d4f1d2bd5a623382660d2087452/target_redshift/db_sync.py#L425
The behaviour should be the same between fastsync and "normal" sync. Ideally, in both of these cases, the newline characters would still be valid.
Fastsync does not use the REMOVEQUOTES and ESCAPE flags.
Proposition
Perhaps, it would be better to include these hard coded options in the config_options variable? https://github.com/transferwise/pipelinewise-target-redshift/blob/0b78d1a485534d4f1d2bd5a623382660d2087452/target_redshift/db_sync.py#L408