kestra-io / plugin-git

Apache License 2.0

New Git tasks to better support moving between environments using Git Push and Sync for flows #56

Closed anna-geller closed 5 months ago

anna-geller commented 7 months ago

Feature description

Goals of the new tasks

  1. Make nested namespaces work as local folders (simple and intuitive pattern, easy to understand)
  2. Allow pushing only specific flow(s) to a specific Git directory.
  3. The system flow pattern will work thanks to https://github.com/kestra-io/kestra-ee/issues/1099. This way, the flow implementing Git Push can be located in the system namespace and can be used to push all flows regularly without any access issues.

Proposed solution

PushFlows task

type: "io.kestra.plugin.git.PushFlows"

Commit and push your saved flows to a Git repository.

Using this task, you can push one or more flows from a given namespace (and optionally also child namespaces) to Git. Check the examples below to see how you can push all flows or only specific ones. You can also learn about Git integration in the Version Control with Git documentation.

Examples

Automatically push all saved flows from the dev namespace and all child namespaces to a Git repository every day at 5 p.m. Before pushing to Git, the task will adjust the flow's source code to match the targetNamespace to prepare the Git branch for merging to the production namespace.

id: push_to_git
namespace: system

tasks:
  - id: commit_and_push
    type: io.kestra.plugin.git.PushFlows
    sourceNamespace: dev # the namespace from which flows are pushed
    targetNamespace: prod # the target production namespace; if different than sourceNamespace, the sourceNamespace in the source code will be overwritten by the targetNamespace
    flows: "*"  # optional list of Regex strings; by default, all flows are pushed
    includeChildNamespaces: true # optional boolean, false by default
    gitDirectory: _flows 
    url: https://github.com/kestra-io/scripts # required string
    username: git_username # required string needed for Auth with Git
    password: "{{ secret('GITHUB_ACCESS_TOKEN') }}" # optional, required for private repositories
    branch: kestra # optional, uses "kestra" by default
    commitMessage: "add flows {{ now() }}" # optional string
    dryRun: true  # if true, you'll see what files will be added, modified or deleted based on the Git version without overwriting the files yet

triggers:
  - id: schedule_push
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "0 17 * * *" # release/push to Git every day at 5pm

Manually push a single flow to Git if the input push is set to true.

id: myflow
namespace: prod

inputs:
  - id: push
    type: BOOLEAN
    defaults: false

tasks:
  - id: if
    type: io.kestra.core.tasks.flows.If
    condition: "{{ inputs.push == true }}"
    then:
      - id: commit_and_push
        type: io.kestra.plugin.git.PushFlows
        sourceNamespace: prod # optional; if you prefer templating, you can use "{{ flow.namespace }}"
        targetNamespace: prod # optional; by default, set to the same namespace as defined in sourceNamespace
        flows: myflow # if you prefer templating, you can use "{{ flow.id }}"
        url: https://github.com/kestra-io/scripts 
        username: git_username 
        password: "{{ secret('GITHUB_ACCESS_TOKEN') }}"
        branch: kestra 
        commitMessage: "add flow {{ flow.namespace ~ '.' ~ flow.id }}" 

Properties

url

The Git repository URI that kestra will clone and push flows to. Repository URI is the only required property (apart from authentication-specific properties).

branch

The branch to which flows should be committed and pushed. If the branch doesn’t exist yet, it will be created. If not set, the task will push the flows to the kestra branch.

gitDirectory

Directory to which flows should be pushed. If not set, flows will be pushed to a Git directory named _flows and will optionally also include subdirectories named after the child namespaces. If you prefer, you can specify an arbitrary path, e.g., kestra/flows, allowing you to push flows to that specific Git directory.

If the includeChildNamespaces property is set to true, this task will also push all flows from child namespaces into their corresponding nested directories, e.g., flows from the child namespace called prod.marketing will be added to the marketing folder within the _flows folder. Note that the targetNamespace (here prod) is specified in the flow code; therefore, kestra will not create the prod directory within _flows.

You can use the PushFlows task to push flows from the sourceNamespace, and use SyncFlows to then sync PR-approved flows to the targetNamespace, including all child namespaces.

sourceNamespace

The source namespace from which flows should be pushed to the gitDirectory.

targetNamespace

The target namespace, intended as the production namespace. If set and different from the sourceNamespace, the sourceNamespace in the flow source code will be replaced by the targetNamespace before pushing, to prepare your branch for merging into the production namespace. If not set, it defaults to the sourceNamespace.
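The namespace rewrite described above can be sketched in a few lines of Python. This is only an illustration of the intended behavior, not the plugin's implementation: the function name rewrite_namespace is hypothetical, and the sketch assumes the flow source contains a single unindented `namespace:` key.

```python
import re


def rewrite_namespace(flow_source: str, source_ns: str, target_ns: str) -> str:
    """Replace the source namespace prefix in the top-level `namespace:` line.

    Hypothetical helper: child suffixes such as `.marketing.crm` are kept,
    and unrelated namespaces that merely start with the same letters
    (e.g. `development`) are left untouched.
    """
    pattern = re.compile(
        r"^(namespace:[ \t]*)" + re.escape(source_ns) + r"(?=\.|\s|$)",
        flags=re.MULTILINE,
    )
    # count=1: only the first unindented `namespace:` key is rewritten
    return pattern.sub(lambda m: m.group(1) + target_ns, flow_source, count=1)


if __name__ == "__main__":
    source = "id: flow3\nnamespace: dev.marketing\ntasks: []\n"
    print(rewrite_namespace(source, "dev", "prod"))
```

Anchoring the pattern at the start of a line means indented `namespace:` keys (for example inside a subflow task) are not rewritten, which seems to match the "namespace in the flow source code" wording, but that is an assumption.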

flows

A list of Regex strings that declare which flows should be included in the Git commit. By default, all flows from the specified sourceNamespace will be pushed (and optionally adjusted to match the targetNamespace before pushing to Git). If you want to push only the current flow, you can use the "{{flow.id}}" expression or specify the flow ID explicitly, e.g. myflow. Given that this is a list of Regex strings, you can include as many flows as you wish, provided that the user is authorized to access that namespace.
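A minimal Python sketch of how a list of Regex strings could select flow IDs. The helper name select_flows is hypothetical, and the sketch assumes each pattern must match the whole flow ID. Note that a bare "*" is not a valid Python regular expression, so the sketch uses ".*" to mean "all flows"; how the task itself would interpret "*" is not specified in this proposal.

```python
import re


def select_flows(flow_ids, patterns):
    """Return the flow IDs that fully match at least one regex pattern.

    Hypothetical helper; full-match semantics are an assumption.
    """
    compiled = [re.compile(p) for p in patterns]
    return [fid for fid in flow_ids if any(rx.fullmatch(fid) for rx in compiled)]


if __name__ == "__main__":
    ids = ["myflow", "myflow2", "other"]
    print(select_flows(ids, ["myflow"]))   # explicit flow ID
    print(select_flows(ids, ["my.*"]))     # prefix pattern
    print(select_flows(ids, [".*"]))       # all flows
```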

includeChildNamespaces

Whether you want to push flows from child namespaces as well. By default, it’s false, so the task will push only flows from the explicitly declared namespace without pushing flows from child namespaces. If set to true, flows from child namespaces will be pushed to child directories in Git. See the table below for a practical explanation:

| Source namespace in the flow code | Git directory path | Synced to target namespace |
|---|---|---|
| namespace: dev | _flows/flow1.yml | namespace: prod |
| namespace: dev | _flows/flow2.yml | namespace: prod |
| namespace: dev.marketing | _flows/marketing/flow3.yml | namespace: prod.marketing |
| namespace: dev.marketing | _flows/marketing/flow4.yml | namespace: prod.marketing |
| namespace: dev.marketing.crm | _flows/marketing/crm/flow5.yml | namespace: prod.marketing.crm |
| namespace: dev.marketing.crm | _flows/marketing/crm/flow6.yml | namespace: prod.marketing.crm |
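The namespace-to-directory mapping in the table above can be expressed as a small Python function. This is a sketch under assumptions: the name git_path_for is hypothetical, and the `.yml` extension and `{flow_id}.yml` file naming are taken from the table rather than from any confirmed implementation.

```python
def git_path_for(flow_id, namespace, source_ns, git_directory="_flows"):
    """Map a flow in `namespace` to its path inside `git_directory`.

    Hypothetical helper: the source namespace itself maps to the root of
    git_directory, and each child namespace segment becomes a subdirectory.
    """
    if namespace == source_ns:
        child_parts = []
    elif namespace.startswith(source_ns + "."):
        child_parts = namespace[len(source_ns) + 1:].split(".")
    else:
        raise ValueError(f"{namespace!r} is not {source_ns!r} or one of its children")
    return "/".join([git_directory, *child_parts, f"{flow_id}.yml"])


if __name__ == "__main__":
    print(git_path_for("flow1", "dev", "dev"))
    print(git_path_for("flow5", "dev.marketing.crm", "dev"))
```

Because the source namespace prefix is dropped from the path, no dev or prod directory appears under _flows, which is consistent with the gitDirectory description.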

dryRun

If true, the task will only display modifications without pushing any flows yet. If false (default), all listed flows will be pushed to Git immediately.

commitMessage

Git commit message. By default, set to "Add flows from sourceNamespace namespace", e.g. "Add flows from dev namespace".

username

The username or organization.

authorEmail

The commit author email; if null, no author will be set on this commit.

authorName

The commit author name; if null, the username will be used instead.

passphrase

The passphrase for the privateKey.

password

The password or personal access token.

privateKey

PEM-format private key content that is paired with a public key registered in Git. To generate an ECDSA PEM format key from OpenSSH, use the following command: ssh-keygen -t ecdsa -b 256 -m PEM. You can then set this property with your private key content and put your public key on Git.

Outputs

commitId

ID of the commit pushed.

commitURL

URL to see what has changed. Example format:

  • for GitHub: https://github.com/{owner}/{repository}/commit/{commitId}
  • for Gitea: https://{Gitea_host}/{owner}/{repository}/commit/{commitId}
  • for Bitbucket: https://bitbucket.org/{owner}/{repository}/commits/{commitId}
  • for GitLab: https://gitlab.com/{owner}/{repository}/-/commit/{commitId}
  • for Azure Repos: https://dev.azure.com/{organization}/{project}/_git/{repository}/commit/{commitId}
  • for AWS CodeCommit: https://console.aws.amazon.com/codesuite/codecommit/repositories/{repository}/commit/{commitId}
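As an illustration, the URL formats listed above can be kept in a lookup table and filled in per provider. The provider keys and the commit_url helper are hypothetical names chosen for this sketch; only the URL templates come from the list above.

```python
# URL templates copied from the list above; the dictionary keys are
# hypothetical identifiers introduced for this sketch.
COMMIT_URL_TEMPLATES = {
    "github": "https://github.com/{owner}/{repository}/commit/{commitId}",
    "gitea": "https://{Gitea_host}/{owner}/{repository}/commit/{commitId}",
    "bitbucket": "https://bitbucket.org/{owner}/{repository}/commits/{commitId}",
    "gitlab": "https://gitlab.com/{owner}/{repository}/-/commit/{commitId}",
    "azure": "https://dev.azure.com/{organization}/{project}/_git/{repository}/commit/{commitId}",
    "codecommit": "https://console.aws.amazon.com/codesuite/codecommit/repositories/{repository}/commit/{commitId}",
}


def commit_url(provider, **parts):
    """Fill in the template for `provider`; raises KeyError on unknown providers or missing parts."""
    return COMMIT_URL_TEMPLATES[provider].format(**parts)


if __name__ == "__main__":
    # "abc123" is a placeholder commit ID, not a real commit
    print(commit_url("github", owner="kestra-io", repository="scripts", commitId="abc123"))
```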

SyncFlows task

type: "io.kestra.plugin.git.SyncFlows"

Sync flows from Git to kestra.

This task syncs flows from a given Git branch to one or more kestra namespaces. If the delete property is set to true, any flow available in kestra but not present in the gitDirectory will be deleted, allowing you to maintain Git as a single source of truth for your flows. Check the Version Control with Git documentation for more details.

Examples

Sync flows from a Git repository. This flow can run either on a schedule (using the Schedule trigger) or anytime you merge some changes to a given Git branch (using the Webhook trigger).

id: sync_flows_from_git
namespace: system

tasks:
  - id: git
    type: io.kestra.plugin.git.SyncFlows
    gitDirectory: _flows # optional; set to _flows by default
    targetNamespace: prod
    includeChildNamespaces: true # optional; by default, it's set to false to allow explicit definition
    delete: true # optional; by default, it's set to false to avoid destructive behavior
    url: https://github.com/kestra-io/flows
    branch: main
    username: git_username
    password: "{{ secret('GITHUB_ACCESS_TOKEN') }}"
    dryRun: true  # if true, the task will only log which flows from Git will be added/modified or deleted in kestra without making any changes in kestra backend yet

triggers:
  - id: every_minute
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "*/1 * * * *"

Properties

url

The Git repository URI from which Kestra will clone and sync flows.

branch

The branch from which flows will be synced to kestra.

cache

Whether you want to cache the cloned repository in kestra’s internal storage. When syncing flows often, this property might improve performance. If set to true, kestra will clone the given repository once, cache it in internal storage, and only pull changes in the subsequent task runs. Changing this property back to false will ignore (and effectively invalidate) the cache.

targetNamespace

The target namespace to which flows from the gitDirectory should be synced. If the top-level namespace specified in the flow source code differs from the targetNamespace, it will be overwritten by the targetNamespace. This facilitates moving between environments and projects. If the includeChildNamespaces property is set to true, the same replacement applies to flows in child namespaces: only the top-level part of the namespace is overwritten, and the child suffix is kept. For example, if targetNamespace is set to prod and includeChildNamespaces is set to true, then namespace: dev in the flow source code will be overwritten by namespace: prod, and namespace: dev.marketing.crm will be overwritten by namespace: prod.marketing.crm. See the table below for a practical explanation:

| Source namespace in the flow code | Git directory path | Synced to target namespace |
|---|---|---|
| namespace: dev | _flows/flow1.yml | namespace: prod |
| namespace: dev | _flows/flow2.yml | namespace: prod |
| namespace: dev.marketing | _flows/marketing/flow3.yml | namespace: prod.marketing |
| namespace: dev.marketing | _flows/marketing/flow4.yml | namespace: prod.marketing |
| namespace: dev.marketing.crm | _flows/marketing/crm/flow5.yml | namespace: prod.marketing.crm |
| namespace: dev.marketing.crm | _flows/marketing/crm/flow6.yml | namespace: prod.marketing.crm |

gitDirectory

Directory from which flows should be synced. If not set, this task assumes your branch has a Git directory named _flows (equivalent to the default gitDirectory of the PushFlows task). If the includeChildNamespaces property is set to true, this task will sync all flows from nested subdirectories into their corresponding child namespaces, e.g. if targetNamespace is set to prod, then flows from the _flows directory will be synced to the prod namespace, flows from the marketing subdirectory in Git will be synced to the prod.marketing namespace, and flows from the marketing/crm subdirectory will be synced to the prod.marketing.crm namespace.
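The directory-to-namespace mapping described above can be sketched in Python as the inverse of the PushFlows layout. The function name namespace_for is hypothetical, and the sketch assumes Git paths use forward slashes.

```python
from pathlib import PurePosixPath


def namespace_for(git_path, target_ns, git_directory="_flows"):
    """Derive the kestra namespace for a flow file stored under `git_directory`.

    Hypothetical helper: files directly in git_directory map to target_ns,
    and each nested subdirectory adds one child namespace segment.
    Raises ValueError if git_path is not inside git_directory.
    """
    relative = PurePosixPath(git_path).relative_to(git_directory)
    child_parts = relative.parent.parts  # empty tuple for files at the root
    return ".".join([target_ns, *child_parts])


if __name__ == "__main__":
    print(namespace_for("_flows/flow1.yml", "prod"))
    print(namespace_for("_flows/marketing/crm/flow5.yml", "prod"))
```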

includeChildNamespaces

Whether you want to sync flows from child namespaces as well. It’s false by default so that the task will sync only flows from the explicitly declared gitDirectory without traversing child directories. If set to true, flows stored in subdirectories of the gitDirectory will be synced to child namespaces of the targetNamespace, named after those subdirectories.

delete

Whether you want to delete flows present in kestra but not present in Git. It’s false by default to avoid destructive behavior. Use this property with caution because when set to true and includeChildNamespaces is also set to true, this task will delete all flows from the targetNamespace and all its child namespaces that are not present in Git.

cloneSubmodules

Whether to clone submodules.

dryRun

If true, the task will only display modifications without syncing any flows yet. If false (default), all flows will be synced from Git directly.

username

The username or organization.

password

The password or personal access token.

passphrase

The passphrase for the privateKey.

privateKey

PEM-format private key content that is paired with a public key registered on Git. To generate an ECDSA PEM format key from OpenSSH, use the following command: ssh-keygen -t ecdsa -b 256 -m PEM. You can then set this property with your private key content and put your public key on Git.

Outputs

flows

A map of outputs with the following keys: deletions, additions and changes. For example, to access additions, use {{ outputs.git.flows.additions }}. In the example below, this will output ["_flows/flow2.yml"].

{
  "flows": {
    "deletions": [
      "_flows/flow1.yml"
    ],
    "additions": [
      "_flows/flow2.yml"
    ],
    "changes": [
      "_flows/flow3.yml"
    ]
  }
}
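One way to picture how such an output could be computed is a plain comparison between the flow files in Git and the flows already in kestra. The sketch below is an assumption-laden illustration, not the task's implementation: plan_sync is a hypothetical name, both inputs are modeled as dictionaries from Git path to flow source, and deletions are only reported when delete is true.

```python
def plan_sync(git_flows, kestra_flows, delete=False):
    """Compare flows in Git with flows in kestra and report the differences.

    Hypothetical helper. Both arguments map a Git path (e.g. "_flows/flow1.yml")
    to the flow source code. Nothing is written anywhere, so the result is
    what a dryRun would display.
    """
    additions = sorted(p for p in git_flows if p not in kestra_flows)
    changes = sorted(
        p for p in git_flows if p in kestra_flows and git_flows[p] != kestra_flows[p]
    )
    # Flows present in kestra but missing from Git are only deleted on request
    deletions = sorted(p for p in kestra_flows if p not in git_flows) if delete else []
    return {"flows": {"deletions": deletions, "additions": additions, "changes": changes}}


if __name__ == "__main__":
    git = {"_flows/flow2.yml": "v1", "_flows/flow3.yml": "v2"}
    kestra = {"_flows/flow1.yml": "v1", "_flows/flow3.yml": "v1"}
    print(plan_sync(git, kestra, delete=True))
```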