kestra-io / plugin-gcp

Apache License 2.0
9 stars 10 forks source link

Task for creating and deleting the dataproc cluster from Kestra #386

Closed Geoffrey-Schuette-Simpli-Fi closed 3 months ago

Geoffrey-Schuette-Simpli-Fi commented 6 months ago

Feature description

I have an airflow pipeline which creates a dataproc cluster using this airflow operator, submits a spark job, and then deletes the cluster with a similar operator as creating. Could I get this type of behavior added as kestra tasks?

In the slack space, a great suggestion was given to just use the GCP CLI which can help me try to recreate this pipeline, but for wider adoption, a task like the airflow operator would make it easier.

anna-geller commented 3 months ago

@iNikitaGricenko would be cool if you can help to add the plugin using the GCP SDK

From CLI, creating such a cluster is fairly simple, we need similar attributes in a dedicated plugin:

gcloud dataproc clusters create YOUR_CLUSTER_NAME \
    --region YOUR_REGION \
    --zone YOUR_ZONE \
    --master-machine-type n1-standard-2 \
    --worker-machine-type n1-standard-2 \
    --num-workers 2 \
    --bucket YOUR_BUCKET_NAME