kestra-io / plugin-git


New Git tasks to better support moving between environments using Git Push and Sync for Namespace Files #57

Closed anna-geller closed 6 months ago

anna-geller commented 8 months ago

Feature description

Goals of new tasks

  1. Separate Flows vs. Namespace Files, as coupling them into single Push and Sync tasks has caused confusion and unnecessary complexity, especially for users who want to integrate Git only for flows or only for Namespace Files.
  2. Allow pushing only specific Namespace File(s) to a specific Git directory.
  3. Enable the system flow pattern, which will work thanks to https://github.com/kestra-io/kestra-ee/issues/1099.

PushNamespaceFiles task

type: "io.kestra.plugin.git.PushNamespaceFiles"

Commit and push Namespace Files created from the kestra UI to Git.

Using this task, you can push one or more Namespace Files from a given kestra namespace to Git. Check the Version Control with Git documentation for more details.

Examples

Push all saved Namespace Files from the dev namespace to a Git repository every 15 minutes.

id: push_to_git
namespace: system

tasks:
  - id: commit_and_push
    type: io.kestra.plugin.git.PushNamespaceFiles
    namespace: dev
    files: "*"  # optional list of Regex strings; by default, all files are pushed
    gitDirectory: _files # optional path in Git where Namespace Files should be pushed
    url: https://github.com/kestra-io/scripts # required string
    username: git_username # required string needed for Auth with Git
    password: "{{ secret('GITHUB_ACCESS_TOKEN') }}" # optional, required for private repositories
    branch: dev # optional, uses "kestra" by default
    commitMessage: "add namespace files" # optional string
    dryRun: true  # if true, you'll see what files will be added, modified or deleted based on the Git version without overwriting the files yet

triggers:
  - id: schedule_push_to_git
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "*/15 * * * *"

Manually push a single file specified in the input to Git.

id: myflow
namespace: system

inputs:
  - id: file_to_commit
    type: STRING

tasks:
  - id: commit_and_push
    type: io.kestra.plugin.git.PushNamespaceFiles
    namespace: dev 
    files: "{{ inputs.file_to_commit }}"
    url: https://github.com/kestra-io/scripts 
    username: git_username 
    password: "{{ secret('GITHUB_ACCESS_TOKEN') }}"
    branch: kestra 
    commitMessage: "add {{ inputs.file_to_commit }}" 

Properties

url

The Git repository URI that kestra will clone and push Namespace Files to. Repository URI is the only required property (apart from authentication-specific properties).

branch

The branch to which Namespace Files should be committed and pushed. If the branch doesn’t exist yet, it will be created. If not set, the task will push the files to the kestra branch.

namespace

The namespace from which files should be pushed to the gitDirectory.

gitDirectory

Directory to which Namespace Files should be pushed. If not set, files will be pushed to a Git directory named _files. See the table below for an example mapping of Namespace Files to Git paths:

Namespace File Path     | Git Directory Path
scripts/app.py          | _files/scripts/app.py
scripts/etl.py          | _files/scripts/etl.py
queries/orders.sql      | _files/queries/orders.sql
queries/customers.sql   | _files/queries/customers.sql
requirements.txt        | _files/requirements.txt

files

Which Namespace Files should be included in the commit. By default, kestra will push all Namespace Files from the specified namespace. If you want to push only a specific file or directory, e.g. myfile.py, you can set it explicitly using files: myfile.py. Because this property is a Regex string (or a list of Regex strings), you can include as many files as you wish, provided that the user is authorized to access that namespace.
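
A minimal sketch of the list form, assuming the files property accepts a list of Regex strings as proposed above. The patterns are illustrative only, and whether a pattern must match the full Namespace File path or only part of it is an assumption this proposal does not settle:

```yaml
tasks:
  - id: commit_and_push
    type: io.kestra.plugin.git.PushNamespaceFiles
    namespace: dev
    files:
      - "scripts/.*\\.py"     # hypothetical pattern: Python files under scripts/
      - "requirements\\.txt"  # hypothetical pattern: a single file at the namespace root
    url: https://github.com/kestra-io/scripts
    username: git_username
    password: "{{ secret('GITHUB_ACCESS_TOKEN') }}"
```

The backslashes are doubled because the patterns sit inside double-quoted YAML strings; single-quoted strings would take a single backslash.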

dryRun

If true, the task will only display modifications without syncing any Namespace Files yet. If false (default), all listed Namespace Files will be pushed to Git immediately.

commitMessage

Git commit message. By default, set to "Add files from XYZ namespace", where XYZ is the value of namespace property.

username

The username or organization.

authorEmail

The commit author email; if null, no author will be set on this commit.

authorName

The commit author name; if null, the username will be used instead.

passphrase

The passphrase for the privateKey.

password

The password or personal access token.

privateKey

PEM-format private key content that is paired with a public key registered in Git. To generate an ECDSA PEM format key from OpenSSH, use the following command: ssh-keygen -t ecdsa -b 256 -m PEM. You can then set this property with your private key content and put your public key on Git.
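
As a sketch of key-based authentication, the task could look roughly like this. The SSH form of the repository URL and both secret names are assumptions made for illustration, and it is not confirmed here whether username is still needed when a privateKey is supplied:

```yaml
tasks:
  - id: commit_and_push
    type: io.kestra.plugin.git.PushNamespaceFiles
    namespace: dev
    url: git@github.com:kestra-io/scripts.git         # assumed SSH form of the repository URL
    privateKey: "{{ secret('GIT_PRIVATE_KEY') }}"      # hypothetical secret holding the PEM key content
    passphrase: "{{ secret('GIT_KEY_PASSPHRASE') }}"   # hypothetical secret; only relevant for an encrypted key
    branch: kestra
```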

Outputs

commitId

ID of the commit pushed.

compareURL

URL to see what’s changed or to start a new pull request. Example format for GitHub: https://github.com/username/your_repo/compare/main...kestra.

files

A map of outputs with the following keys: deletions, additions and changes. For example, to access additions from a task with the ID git, use {{ outputs.git.files.additions }}. In the example below, this will output ["_files/myscript.py"].

{
  "files": {
    "deletions": [
      "_files/queries/myquery.sql"
    ],
    "additions": [
      "_files/myscript.py"
    ],
    "changes": [
      "_files/script/etl.py"
    ]
  }
}
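
A sketch of how a downstream task might read these outputs, assuming the push task has the ID git and that the outputs are exposed exactly as described above. The flow ID and the Log task are additions made for illustration:

```yaml
id: push_and_report   # hypothetical flow ID
namespace: system

tasks:
  - id: git
    type: io.kestra.plugin.git.PushNamespaceFiles
    namespace: dev
    url: https://github.com/kestra-io/scripts
    username: git_username
    password: "{{ secret('GITHUB_ACCESS_TOKEN') }}"

  - id: log_result
    type: io.kestra.core.tasks.log.Log
    # Both expressions refer to the outputs of the task with the ID git
    message: "Added {{ outputs.git.files.additions }}, compare at {{ outputs.git.compareURL }}"
```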

SyncNamespaceFiles task

type: "io.kestra.plugin.git.SyncNamespaceFiles"

Sync Namespace Files from Git to kestra.

This task syncs Namespace Files from a given Git branch to a kestra namespace. If the delete property is set to true, any Namespace Files available in kestra but not present in the gitDirectory will be deleted, allowing you to maintain Git as a single source of truth for your Namespace Files. Check the Version Control with Git documentation for more details.

Examples

Sync Namespace Files from a Git repository. This flow can run either on a schedule (using the Schedule trigger) or anytime you push a change to a given Git branch (using the Webhook trigger).

id: sync_flows_from_git
namespace: system

tasks:
  - id: git
    type: io.kestra.plugin.git.SyncNamespaceFiles
    namespace: prod
    gitDirectory: _files # optional; set to _files by default
    delete: true # optional; by default, it's set to false to avoid destructive behavior
    url: https://github.com/kestra-io/flows
    branch: main
    username: git_username
    password: "{{ secret('GITHUB_ACCESS_TOKEN') }}"
    dryRun: true  # if true, the task will only log which Namespace Files from Git will be added, modified or deleted in kestra, without making any changes in the kestra backend yet

triggers:
  - id: every_minute
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "*/1 * * * *"
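
The Webhook variant mentioned above could be sketched as follows. The flow ID, trigger ID and key value are placeholders, and the Git provider would need to be configured separately to call this flow's webhook URL on every push to the branch:

```yaml
id: sync_files_on_push   # hypothetical flow ID
namespace: system

tasks:
  - id: git
    type: io.kestra.plugin.git.SyncNamespaceFiles
    namespace: prod
    gitDirectory: _files
    url: https://github.com/kestra-io/flows
    branch: main
    username: git_username
    password: "{{ secret('GITHUB_ACCESS_TOKEN') }}"

triggers:
  - id: on_git_push
    type: io.kestra.core.models.triggers.types.Webhook
    key: replace_with_a_long_random_key   # placeholder; becomes part of the webhook URL
```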

Properties

branch

The branch from which Namespace Files will be synced to kestra.

url

The Git repository URI that kestra will clone and sync Namespace Files from.

cache

Whether you want to cache the cloned repository in kestra’s internal storage. When syncing Namespace Files often, this property might improve performance. If set to true, kestra will clone the given repository once, cache it in internal storage, and then only pull changes in the subsequent task runs. Changing this property back to false will ignore (and effectively invalidate) the cache.
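
A sketch of a frequent sync that relies on the cache, assuming the cache property behaves as described above. The five-minute schedule is an arbitrary choice for illustration:

```yaml
tasks:
  - id: git
    type: io.kestra.plugin.git.SyncNamespaceFiles
    namespace: prod
    url: https://github.com/kestra-io/flows
    branch: main
    username: git_username
    password: "{{ secret('GITHUB_ACCESS_TOKEN') }}"
    cache: true   # clone once, then only pull changes on later runs

triggers:
  - id: every_five_minutes
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "*/5 * * * *"
```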

namespace

The namespace to which files should be synced from the gitDirectory.

gitDirectory

Directory from which Namespace Files should be synced. If not set, this task assumes your branch includes a directory named _files.

delete

Whether you want to delete Namespace Files present in kestra but not present in Git. It’s false by default to avoid destructive behavior. Use with caution because when set to true, this task will delete all Namespace Files which are not present in Git.

cloneSubmodules

Whether to clone submodules.

dryRun

If true, the task will only display modifications without syncing any files yet. If false (default), all Namespace Files will be synced from Git directly.

username

The username or organization.

password

The password or personal access token.

passphrase

The passphrase for the privateKey.

privateKey

PEM-format private key content that is paired with a public key registered on Git. To generate an ECDSA PEM format key from OpenSSH, use the following command: ssh-keygen -t ecdsa -b 256 -m PEM. You can then set this property with your private key content and put your public key on Git.

Outputs

files

A map of outputs with the following keys: deletions, additions and changes. For example, to access additions, use {{ outputs.git.files.additions }}. In the example below, this will output ["_files/myscript.py"].

{
  "files": {
    "deletions": [
      "_files/queries/myquery.sql"
    ],
    "additions": [
      "_files/myscript.py"
    ],
    "changes": [
      "_files/script/etl.py"
    ]
  }
}

anna-geller commented 8 months ago

alternative output: ION file with one row per file:

{
  "file": "scripts/main.py",
  "additions": +3,
  "deletions": -3,
  "changes": 0
}