moj-analytical-services / etl_manager

A python package to create a database on the platform using our moj data warehousing framework

Add workerType #152

Closed samnlindsay closed 1 week ago

samnlindsay commented 2 weeks ago

https://docs.aws.amazon.com/glue/latest/webapi/API_CreateJob.html#Glue-CreateJob-request-WorkerType

Enabling different worker types to make more cost-effective use of resources (e.g. doubling the DPUs per worker rather than doubling the number of workers).

https://aws.amazon.com/blogs/big-data/scale-your-aws-glue-for-apache-spark-jobs-with-new-larger-worker-types-g-4x-and-g-8x/

AllocatedCapacity is now deprecated, so it is replaced by NumberOfWorkers and WorkerType (e.g. AllocatedCapacity = 10 DPUs -> 5 x "G.2X" or 10 x "G.1X" workers).
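
For illustration, a rough sketch of the DPU-to-worker arithmetic above. `workers_for_capacity` is a hypothetical helper, not part of etl_manager. Rounding up when the capacity does not divide evenly is an assumption. The returned keys match the `WorkerType` and `NumberOfWorkers` parameters of the Glue CreateJob request linked above.

```python
import math

# DPUs provided by one worker of each type (G.1X = 1 DPU, G.2X = 2 DPUs, and so on).
DPUS_PER_WORKER = {"G.1X": 1, "G.2X": 2, "G.4X": 4, "G.8X": 8}


def workers_for_capacity(allocated_capacity, worker_type="G.1X"):
    """Hypothetical helper: translate an old AllocatedCapacity (in DPUs)
    into the WorkerType / NumberOfWorkers pair that replaces it."""
    if worker_type not in DPUS_PER_WORKER:
        raise ValueError(f"Unknown worker type: {worker_type}")
    # Assumption: round up so the job never gets fewer DPUs than before.
    number_of_workers = math.ceil(allocated_capacity / DPUS_PER_WORKER[worker_type])
    return {"WorkerType": worker_type, "NumberOfWorkers": number_of_workers}


if __name__ == "__main__":
    print(workers_for_capacity(10, "G.2X"))
    print(workers_for_capacity(10, "G.1X"))
```

The resulting dict could be merged into the job definition passed to CreateJob in place of `AllocatedCapacity`.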

Thomas-Hirsch commented 2 weeks ago

Semver would suggest changing the major version, given it's a breaking change. As mentioned on Slack, it is possible to do this without needing to make this change, though I admit this is a bit of a workaround.

samnlindsay commented 2 weeks ago

could I trouble you to update the changelog too?

I knew that was coming 😂