We noticed that this target is using all the available VCPUs (between 8 and 32) on the host when running in a Docker container, despite allocating 1 CPU via Docker.
It looks like joblib, the library used under the hood, should handle this and limit the parallelism correctly (https://github.com/joblib/joblib/blob/8fb6eb2260945ab692b34c2e3494be305f19ec58/joblib/externals/loky/backend/context.py#L104-L153), but this was not the case in our testing. Even setting LOKY_MAX_CPU_COUNT did not limit the parallelism to the intended value.

This adds a 'parallelism' flag indicating the number of streams that will be processed in parallel, both to work around this issue and to make the parallelism explicit for users who wish to fix it to a specific value.
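As a rough illustration of the intended behavior, the sketch below shows how an explicit parallelism flag can cap the worker count instead of trusting the detected CPU count. The function names here (process_streams, effective_parallelism) are hypothetical, not the target's actual API, and a plain ProcessPoolExecutor stands in for the joblib backend:

```python
import os
from concurrent.futures import ProcessPoolExecutor

def effective_parallelism(parallelism=None):
    # An explicit flag wins; otherwise fall back to os.cpu_count(),
    # which inside a Docker container reports the *host* CPU count,
    # not the CPUs actually allocated to the container.
    if parallelism is not None:
        return parallelism
    return os.cpu_count() or 1

def process_streams(streams, parallelism=None):
    # Cap the pool size with the user-provided flag so the process
    # never fans out beyond what the user asked for.
    n_workers = effective_parallelism(parallelism)
    with ProcessPoolExecutor(max_workers=n_workers) as executor:
        return list(executor.map(str.upper, streams))

if __name__ == "__main__":
    # With parallelism=2 at most two worker processes are used,
    # regardless of how many vCPUs os.cpu_count() reports.
    print(process_streams(["a", "b"], parallelism=2))
```

The same idea applies with joblib by passing the flag through as n_jobs, rather than letting the backend infer the CPU count on its own.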