Limess closed this issue 5 years ago
Ran a quick test with COMPUPDATE ON, based on a fork, and there was no difference in the output of ANALYZE COMPRESSION for the test tables.
The original idea of adding COMPUPDATE OFF comes from an older blog post at https://www.flydata.com/blog/how-to-improve-performance-upsert-amazon-redshift/ and may be outdated.
At the moment I'm working on https://github.com/transferwise/pipelinewise-target-redshift/issues/8 that will give an option to overwrite the built-in COPY options. Do you think that would be helpful for you as well so you can add/remove extra options to COPY as you wish?
Thanks for the info and reference
I think https://github.com/transferwise/pipelinewise-target-redshift/issues/8 would be sufficient for further tuning or experimentation regarding this issue.
Reading more into this, I think this makes complete sense for staging tables anyway. What we really want is for the initial table sync to skip the staging table and do a direct copy with COMPUPDATE unset. This is probably not a priority for this library at the moment, as the initial sync when using PipelineWise directly is fast-sync.
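For illustration, a direct initial load without a staging table could leave compression analysis enabled; a sketch of such a COPY (table name, S3 path, and IAM role are all hypothetical):

```sql
-- Hypothetical direct initial load: let Redshift sample the data and
-- choose column encodings by setting COMPUPDATE ON (or leaving it unset).
COPY analytics.orders
FROM 's3://my-bucket/initial-sync/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
GZIP
COMPUPDATE ON;
```

The trade-off is that COMPUPDATE adds a sampling pass, which is wasted work on short-lived staging tables but worthwhile on a one-off initial load of the final table.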
Longer term we may look at adding this here or switching to using pipelinewise rather than this target independently.
The latest version, 1.0.7, has been released to PyPI and gives the option to override the default COPY options and to remove `COMPUPDATE OFF`. You need to add the `copy_option` key to `config.json` with only the values that you specifically need. Docs are at https://github.com/transferwise/pipelinewise-target-redshift#configuration-settings
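For illustration, an override in `config.json` might look like the following sketch; the exact key spelling and the default option list should be checked against the linked documentation:

```json
{
  "copy_options": "EMPTYASNULL BLANKSASNULL TRIMBLANKS TRUNCATECOLUMNS TIMEFORMAT 'auto'"
}
```

Here `COMPUPDATE OFF` is simply omitted from the string, so the COPY falls back to Redshift's default COMPUPDATE behaviour.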
Like you said, this library is primarily used to load data into staging, so leaving `COMPUPDATE OFF` probably makes sense; `COMPUPDATE` is unset in PipelineWise fast-sync by default.
General question as to why `COMPUPDATE` is `OFF` in the Redshift `COPY` command.

After running some tables from Postgres through `pipelinewise-target-redshift`, I've checked the resulting compression using `ANALYZE COMPRESSION`, with this output:

I'm unsure whether enabling `COMPUPDATE` to apply compression on the staging table would then carry the encodings across to the final table, or whether they're discarded, hence it being disabled here? Does enabling compression slow down loads?

Is there another way of applying compression to the final raw tables generated by this target?
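On the last question, one alternative is to apply encodings to the final tables manually, outside the target. A sketch, with a hypothetical table and column name (note that `ALTER COLUMN ... ENCODE` is only available on newer Redshift releases; older clusters require recreating the table):

```sql
-- Ask Redshift to suggest encodings for an existing table
ANALYZE COMPRESSION analytics.orders;

-- Apply a suggested encoding in place on clusters that support it
ALTER TABLE analytics.orders
  ALTER COLUMN order_status ENCODE zstd;
```

This sidesteps the staging-table question entirely, since the encodings live on the final table's DDL rather than depending on what COPY does during the load.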