kestra-io / kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
https://kestra.io
Apache License 2.0
7.64k stars 466 forks

Add a GCP Dataform subplugin #2303

Open anna-geller opened 11 months ago

anna-geller commented 11 months ago

Problem

We already support the open-source edition of Dataform: https://github.com/kestra-io/plugin-dataform

However, our users have requested the ability to trigger Dataform jobs running on the GCP Dataform service: https://cloud.google.com/dataform?hl=en

API

The OSS version was implemented as a Node.js CLI plugin. However, the GCP-specific plugin will likely only need to talk to the GCP Dataform service via the REST API: https://cloud.google.com/dataform/reference/rest

Specifically, the workflow invocation seems like the right endpoint https://cloud.google.com/dataform/reference/rest#rest-resource:-v1beta1.projects.locations.repositories.workflowinvocations
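To make the endpoint concrete, here is a minimal sketch of how a create-invocation request could be assembled. It assumes the v1beta1 URL pattern from the linked reference (`projects/*/locations/*/repositories/*/workflowInvocations`) and a `compilationResult` body field; both should be checked against the current API docs. The function name and arguments are hypothetical, and authentication and the HTTP call itself are left out.

```python
import json

# Assumed base URL for the v1beta1 Dataform REST API; verify against the reference.
DATAFORM_API = "https://dataform.googleapis.com/v1beta1"


def build_invocation_request(project, location, repository, compilation_result):
    """Return the (url, json_body) pair for creating a workflow invocation.

    `compilation_result` is the resource name of an existing compilation
    result. The `compilationResult` field name is an assumption taken from
    the WorkflowInvocation resource description.
    """
    parent = f"projects/{project}/locations/{location}/repositories/{repository}"
    url = f"{DATAFORM_API}/{parent}/workflowInvocations"
    body = {"compilationResult": compilation_result}
    return url, json.dumps(body)
```

A real task would POST this body with an OAuth 2.0 bearer token and read the returned invocation `name` for later polling.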

Possible syntax

id: dataform
namespace: dev
tasks:
    - id: transform
      type: io.kestra.plugin.gcp.dataform.InvokeWorkflow
      wait: true # wait for results by default so that if the job fails, this task fails as well
      # other properties from this request body https://cloud.google.com/dataform/reference/rest/v1beta1/projects.locations.repositories.workflowInvocations#WorkflowInvocation 

Ideally, we should combine this with the list/get/query endpoints so that the task can poll for the workflow invocation's result when wait: true is set.
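The polling behaviour behind `wait: true` could look roughly like the sketch below. It is not the plugin's implementation: `get_state` is a hypothetical callable that would wrap the "get" endpoint and return the invocation's `state` string, and the state names (`RUNNING`, `SUCCEEDED`, `FAILED`, `CANCELLED`) are assumed from the WorkflowInvocation reference.

```python
import time

# Assumed terminal values of WorkflowInvocation.state; check the API reference.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_invocation(get_state, poll_interval=5.0, timeout=3600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the invocation reaches a terminal state.

    Returns "SUCCEEDED" on success, raises RuntimeError if the invocation
    failed or was cancelled, and raises TimeoutError if no terminal state
    is seen before the deadline.
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            if state != "SUCCEEDED":
                # Propagate the failure so the calling task fails as well.
                raise RuntimeError(f"workflow invocation ended in state {state}")
            return state
        if clock() >= deadline:
            raise TimeoutError("workflow invocation did not finish in time")
        sleep(poll_interval)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays. Raising on `FAILED` or `CANCELLED` matches the intent that a failed job should fail the task.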

drelum commented 11 months ago

Support for GCP Dataform service would be very useful.

anna-geller commented 10 months ago

Done for now; we'll keep the issue open only to add the GCP implementation.