Closed: iconara closed this issue 5 years ago.
Hey, first of all, thanks for putting up these scripts, they make my life easier trying to migrate some data from Redshift, though I'm still struggling with limitations at the moment. Did you manage to find a workaround for this issue in particular? The problem is that if the data is not gzipped I hit the other issue with the transfer limitations, since it easily goes over 20 files. Otherwise I'll probably need to think of some decompression script on a VM. Thanks
No, sorry, this error is unresolved. I have a workaround where I transfer the files compressed to GCS, then spin up a 32-core GCP instance to decompress and re-upload the files to GCS, and then I run the BigQuery load.
I basically run this in a startup script:
gsutil ls "${source_prefix}*" | parallel "gsutil cp {} - | gzip -d | gsutil cp - ${destination_prefix}\$(basename {})"
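Filled in with placeholder bucket and table names, the whole startup script looks roughly like the sketch below. The bq invocation at the end is only an illustration of the load step (BigShift normally runs the load itself, and the exact flags depend on how the dump was produced), and this version also strips the .gz suffix since the re-uploaded copies are no longer compressed:

#!/bin/bash
set -euo pipefail

# Placeholders; point these at your own buckets/paths.
source_prefix="gs://my-bucket/dump/"                  # compressed Redshift dump, already transferred to GCS
destination_prefix="gs://my-bucket/dump-uncompressed/"

# Stream every object through gzip -d and re-upload it uncompressed,
# so nothing has to fit on the instance's local disk.
gsutil ls "${source_prefix}*" | parallel "gsutil cp {} - | gzip -d | gsutil cp - ${destination_prefix}\$(basename {} .gz)"

# Load everything under the uncompressed prefix in a single job with one wildcard URI.
# Assumes my_dataset.my_table already exists with the right schema.
bq load --source_format=CSV my_dataset.my_table "${destination_prefix}*"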
The transfer limitation issue shouldn't be a problem anymore; it was fixed in #3. The 20-file limitation is on how many prefixes are specified in the job, but I've changed BigShift to specify only one prefix now. I see that I haven't closed that issue, though, so I'll go ahead and do that.
OK, thanks for sharing the workaround, I'll try it once I have to deal with this again. I wasn't sure whether the transfer had this problem, but if it doesn't, then the uncompressed option should work.
When trying to load gzip'ed dumps into BigQuery I get this error:
Not compressing the files during the transfer is not an attractive option; it would cost too much. Is there some way to tell Redshift to produce smaller files?
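One possibility, if the Redshift version supports it, is the MAXFILESIZE option of UNLOAD, which caps the size of each output file; below is only a rough sketch with made-up host, table, bucket, and IAM role names, and BigShift would need to pass the option through itself:

# Run against the Redshift cluster; every name below is a placeholder.
PGPASSWORD="$REDSHIFT_PASSWORD" psql \
  --host my-cluster.abc123.us-east-1.redshift.amazonaws.com \
  --port 5439 --username myuser --dbname mydb <<'SQL'
UNLOAD ('SELECT * FROM my_table')
TO 's3://my-bucket/dump/part_'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-unload-role'
GZIP
MAXFILESIZE 256 MB;
SQL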