kubernetes / git-sync

A sidecar app which clones a git repo and keeps it in sync with the upstream.
Apache License 2.0

Bug: git-sync v4 implementation stuck in loop after unexpected worktree removal #827

Closed bakome closed 11 months ago

bakome commented 11 months ago

Description

The issue appears when the .worktree/{hash} directory is suddenly deleted or lost. The next sync creates the worktree again, but the code then deletes the files immediately. This loop repeats indefinitely, and a version of the synced repo is present only for a blink, less than a second at a time.

Context

In applications where git-sync is a crucial component, this can lead to very unpredictable behavior, especially in distributed systems.

I found this error while using git-sync with Airflow, where the inconsistency caused serious errors.

Additional Details

The issue first appeared when using an NFS mount: a sudden restart of the NFS server led to a temporary loss of the synced directories.

Here is a docker-compose environment that can help to replicate the issue:


services:
  nfs:
    image: itsthenetwork/nfs-server-alpine:latest
    restart: "no"
    privileged: true  
    container_name: git-sync-nfs          
    environment:
      SHARED_DIRECTORY: /exports
    volumes:
      - nfs-server:/exports

  git-sync:
    image: registry.k8s.io/git-sync/git-sync:v4.0.0
    privileged: true
    entrypoint: /bin/sh
    container_name: git-sync-run
    user: "0:0"
    command:
      - -c
      - |       
        chmod -R 777 /git
        mount -v -t nfs -o rw,vers=4 nfs:/ /git
        mkdir -p /git/root        
        /git-sync --verbose 9 --repo=https://github.com/kubernetes/git-sync --root=/git/root --period=5s
    restart: "no"  
    depends_on:
      - nfs
    volumes:   
      - nfs-client:/git         

volumes:
  nfs-server:
  nfs-client:  

After the containers have been created for the first time, temporarily stop the nfs container: docker compose stop nfs

The git-sync container should fail after some time; sometimes commands get stuck because folders went missing. After the failure, restore the nfs service: docker compose start nfs

After the restore, the infinite add-and-immediately-remove loop starts. I was not able to determine why the first removal is performed; I assume it is due to the nature of git worktrees and the fsck check. Either way, git-sync should recover from this state automatically.

This error is not present in versions before v4.0.0; I checked v3.6.9 and it works correctly.
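
For anyone who wants to look at the plain git side of this outside git-sync, here is a minimal sketch of what git itself does when a linked worktree directory disappears. It is not git-sync's code and not a confirmed root cause. The scratch paths and the `abc123` directory name are hypothetical stand-ins for a real worktree hash.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch layout; "abc123" stands in for a real hash.
work=$(mktemp -d)
repo="$work/repo"
wt="$work/.worktrees/abc123"
admin="$repo/.git/worktrees/abc123"

git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Create a linked worktree, then lose its directory unexpectedly.
mkdir -p "$work/.worktrees"
git -C "$repo" worktree add --detach "$wt" >/dev/null 2>&1
rm -rf "$wt"

# The administrative entry inside the main repo outlives the directory.
if [ -d "$admin" ]; then
    echo "stale worktree metadata is still registered"
fi

# Pruning drops entries whose working tree no longer exists.
git -C "$repo" worktree prune
if [ ! -d "$admin" ]; then
    echo "stale worktree metadata pruned"
fi

# With the stale entry gone, the worktree can be added again.
git -C "$repo" worktree add --detach "$wt" >/dev/null 2>&1
if [ -d "$wt" ]; then
    echo "worktree recreated"
fi
```

Whether git-sync v4 hits exactly this state is an assumption on my side; the sketch only shows that a missing worktree directory leaves metadata behind until it is pruned.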

bakome commented 11 months ago

The git worktree documentation has a note about using NFS or other mounts that are not always available:

If the working tree for a linked worktree is stored on a portable device or network share which is not always mounted, you can prevent its administrative files from being pruned by issuing the git worktree lock command, optionally specifying --reason to explain why the worktree is locked.

But I think this can be addressed as a separate issue, because I can replicate the problem without an external mount. However, I believe it would be good to have a flag that enables locking of worktrees for deployments that use non-standard mounts.
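
To illustrate the locking behaviour described in the documentation note, here is a small sketch using plain git. It is not something git-sync does today as far as this thread shows, and the paths and the reason string are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch locations, not paths that git-sync uses.
work=$(mktemp -d)
repo="$work/repo"
wt="$work/wt"

git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Create a linked worktree and lock it, recording why it is locked.
git -C "$repo" worktree add --detach "$wt" >/dev/null 2>&1
git -C "$repo" worktree lock --reason "stored on a network share" "$wt"

# Simulate the share going away, then run a prune.
rm -rf "$wt"
git -C "$repo" worktree prune

# A locked worktree keeps its administrative files through the prune.
if [ -f "$repo/.git/worktrees/wt/locked" ]; then
    echo "locked worktree metadata survived the prune"
fi
```

A lock only protects the administrative files from pruning. It does not bring back the working tree contents, so any flag built on this would still need a recovery step.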

thockin commented 11 months ago

In this case I think it would not matter because both the worktree and the main repo are on the same volume.

bakome commented 11 months ago

It depends on how the sync is used. Someone could still have a process or other cleanup job, on a different kind of storage, that causes this behavior, but I agree this would be very rare. Still, I think the proposed fix can prevent this situation and does no harm.