kestra-io / plugin-git

Apache License 2.0
3 stars 4 forks source link

`EachParallel` reports successful parallel commits but commits not visible in Git Repository #81

Closed wrussell1999 closed 1 month ago

wrussell1999 commented 5 months ago

Expected Behavior

When using EachParallel to bulk commit multiple namespaces at once, all commits should appear in GitHub. If this is not possible due to the nature of committing multiple times to the same repository, it should fail or mentioned that it's not possible.

Looks like Git PushFlows is rewriting the history, rather than pushing a new commit on top. If that's the intention, we should clear that up in the documentation as that prevents parallel use of PushFlows. This commit was accessed from the Kestra logs: Screenshot 2024-06-20 at 17 18 26

Actual Behaviour

When using EachParallel to bulk commit multiple namespaces at once, it comes back as successful but not all of the commits are visible. Screenshot 2024-06-20 at 17 17 21

Only one of the commits was successful: Screenshot 2024-06-20 at 17 17 28

Steps To Reproduce

id: push_to_git
namespace: system

tasks:
  - id: fetch_namespaces
    type: io.kestra.plugin.core.http.Request
    uri: http://host.docker.internal:8082/api/v1/flows/distinct-namespaces

  - id: loop_namespaces
    type: io.kestra.plugin.core.flow.EachParallel
    value: "{{outputs.fetch_namespaces.body}}"
    tasks:
      - id: commit_and_push
        type: io.kestra.plugin.git.PushFlows
        username: wrussell1999
        password: "{{ secret('GITHUB_ACCESS_TOKEN') }}"
        url: https://github.com/wrussell1999/test
        branch: main
        sourceNamespace: "{{ taskrun.value }}"
        targetNamespace: "{{ taskrun.value }}"
        gitDirectory: "flows/{{ taskrun.value }}"
        commitMessage: "parallel: changes to kestra flows for {{ taskrun.value }}"

Environment Information

Example flow

No response

madisonb commented 4 months ago

This appears to also impact io.kestra.plugin.git.Push as well. Running a flow multiple times at the same time (modifying different files simultaneously) will only generate a single commit.

A simple retry loop with a git pull; git push solves this in other CICD style jobs on other platforms.

mgabelle commented 2 months ago

After investigating the issue I have noticed that there is a task race between all the PushFlows. When the tasks begin they each have the same commit ref for the main branch. Therefore the first flow to finish will be able to push and update the main branch to the new pushed commit but the others tasks won't be able to do that. The commits are pushed indeed but not to the main branch as we pull the branch only once before pushing.

A possible solutions would be to enhance the pushflows method to accept a list of namespace flows and then create a chain of commits and push them all at once. @anna-geller

At the moment a workaround would be to use sequential flow like this one but of course the flow duration increases :

id: push_to_git_sequential
namespace: test.git

tasks:
  - id: fetch_namespaces
    type: io.kestra.plugin.core.http.Request
    uri: http://localhost:8080/api/v1/flows/distinct-namespaces

  - id: loop_namespaces
    type: io.kestra.plugin.core.flow.EachSequential
    value: "{{outputs.fetch_namespaces.body}}"
    tasks:
      - id: commit_and_push
        type: io.kestra.plugin.git.PushFlows
        username: mgabelle
        password: "{{ secret('GITHUB_ACCESS_TOKEN') }}"
        url: https://github.com/mgabelle/test-git-push-flow
        branch: testSequential
        sourceNamespace: "{{ taskrun.value }}"
        targetNamespace: "{{ taskrun.value }}"
        gitDirectory: "flows/{{ taskrun.value }}"
        commitMessage: "sequential: changes to kestra flows for {{ taskrun.value }}"
Ben8t commented 2 months ago

After discussing with Mat, sounds like the "responsibility" of Git: even outside of Kestra, Git cannot guarantee the order in which the pushes will be received by the remote repository. If two users push changes at the same time, Git might reject one the push or show the warning shared by Will in issue description above.

I'm willing to close this issue to open another one to add a reference in the doc of the task to mention that it's not recommended to run such pattern due to the potential for conflicts and data loss. Preferring Sequential workflows or pushing to separated branches.

WDTY @anna-geller ?

anna-geller commented 2 months ago

That is understandable, sounds like a solid plan! Thanks for handling it Ben!

Ben8t commented 1 month ago

close in favor of #94