transferwise / pipelinewise-target-redshift

Singer.io Target for Amazon Redshift - PipelineWise compatible
https://transferwise.github.io/pipelinewise/

Allow the parallelism of stream loading to be defined in the configuration explicitly #20

Closed by Limess 5 years ago

Limess commented 5 years ago

We noticed that this target uses all the available vCPUs on the host (between 8 and 32) when running in a Docker container, despite the container being allocated only 1 CPU via Docker.

It looks like joblib, the library used under the hood, should handle this and limit the parallelism correctly (https://github.com/joblib/joblib/blob/8fb6eb2260945ab692b34c2e3494be305f19ec58/joblib/externals/loky/backend/context.py#L104-L153). However, this was not the case in our testing: even setting LOKY_MAX_CPU_COUNT did not seem to limit the parallelism to the intended value.

This adds a 'parallelism' flag that sets the number of streams processed in parallel. It works around this issue and makes the behaviour more transparent to users who wish to define a fixed parallelism.
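
For illustration, here is a minimal sketch of how an explicit 'parallelism' value could cap the number of streams loaded at once. It is not the code from this change: the helper names `resolve_parallelism` and `load_streams` are hypothetical, treating a missing or non-positive value as "auto-detect" is an assumption, and the standard library's `ThreadPoolExecutor` stands in for joblib so the example has no dependencies. With joblib, the resolved value would be passed as `n_jobs` to `Parallel`.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_parallelism(config, n_streams):
    """Return the worker count to use for n_streams streams.

    'parallelism' is the config key discussed above. Treating a missing
    or non-positive value as "auto-detect" is an assumption of this sketch.
    """
    requested = int(config.get("parallelism", 0))
    if requested <= 0:
        # Auto-detect. Inside a container this may report the host's CPUs,
        # which is the behaviour the explicit setting is meant to avoid.
        requested = os.cpu_count() or 1
    # Never start more workers than there are streams, and always at least one.
    return max(1, min(requested, n_streams))


def load_streams(config, streams, load_one):
    """Apply load_one to each stream using at most the configured workers."""
    workers = resolve_parallelism(config, len(streams))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input streams.
        return list(pool.map(load_one, streams))
```

With `{"parallelism": 1}` in the config, streams are loaded one at a time regardless of how many CPUs the host reports.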

koszti commented 5 years ago

this completely makes sense, thanks for contributing.