kubernetes / git-sync

A sidecar app which clones a git repo and keeps it in sync with the upstream.
Apache License 2.0

sparse checkout not working #776

Closed harshitasaxena05 closed 3 months ago

harshitasaxena05 commented 12 months ago

Hi, I'm trying to use git-sync with the sparse checkout feature.

I have a repo, https://dummy.git, containing an /output dir with two files: a.txt and b.txt.

I want to download a.txt only.

Below are the env vars and volume mount I have set:

      env:
        - name: GIT_SYNC_REPO
          value: https://dummy.git
        - name: GIT_SYNC_DEST
          value: git-sync
        - name: GIT_SYNC_USERNAME
          value: user
        - name: GIT_SYNC_PASSWORD
          value: pwd
        - name: GITSYNC_SPARSE_CHECKOUT_FILE
          value: "output/a.txt"

      volumeMounts:
        - name: content-from-git
          mountPath: /tmp/git

image version: v3.6.6

Below is the error output:

"msg"="too many failures, aborting" "error"="open /output/a.txt: no such file or directory" "failCount"=1

If I don't set the GITSYNC_SPARSE_CHECKOUT_FILE env, it works, cloning the entire repo into /tmp/git/git-sync/.

Can anyone help resolve this issue? Thanks.

thockin commented 12 months ago

The sparse checkout file is an input file, in git's specific syntax (https://git-scm.com/docs/git-sparse-checkout). For example in k8s that might be from a configmap.
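As a rough sketch of the configmap approach for the example repo above: the patterns go in a file, and the env var holds the path to that file rather than the pattern itself. The ConfigMap name, key, volume name and mount path here are made up for illustration, and `GIT_SYNC_SPARSE_CHECKOUT_FILE` is assumed to be the v3 spelling of the variable. The second document is a pod spec fragment, not a complete manifest.

```yaml
# ConfigMap holding the sparse-checkout patterns (name and key are hypothetical).
apiVersion: v1
kind: ConfigMap
metadata:
  name: git-sync-sparse
data:
  sparse-checkout: |
    /output/a.txt
---
# Pod spec fragment: mount the ConfigMap and point git-sync at the mounted file.
containers:
  - name: git-sync
    env:
      # Assumed v3 variable name; the value is a file path, not a pattern.
      - name: GIT_SYNC_SPARSE_CHECKOUT_FILE
        value: /etc/git-sync-sparse/sparse-checkout
    volumeMounts:
      - name: sparse-patterns
        mountPath: /etc/git-sync-sparse
        readOnly: true
volumes:
  - name: sparse-patterns
    configMap:
      name: git-sync-sparse
```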

I didn't have a lot of users for sparse-checkout, so we left the UX there. I am open to a nicer UX if we have some real use-case.

Also note that the GITSYNC_ variables (as opposed to GIT_SYNC_) are only in v4, which is still not GA yet.

thockin commented 12 months ago

Following up - the current --sparse-checkout-file is calling git sparse-checkout init. I might consider something like a new --sparse-checkout <value> (repeated) flag which calls git sparse-checkout add. The docs for git sparse-checkout claim that this is experimental:

THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE.

So I am a little worried about adding more API on top of it. Then there's all the secondary options (sparse index, cone mode, etc) that need to be considered.

So I'd want a bit more understanding of the need and some REALLY good e2e tests.

shilpa87-khushi commented 12 months ago

Hello,

We have the below use case:

Files are stored in a git repository, which git-sync pulls into a container. The repository might be huge, and we want to download only specific files or folders using the sparse checkout feature. We were not sure what the value of this env variable should be, since there was no example, only a description. Based on the example above, could you please say what should be set in the 3.x version so that only specific folders are downloaded into the container, not the entire repo?

thockin commented 12 months ago

I don't know if sparse checkout prevents downloading - I think it is only about checkout (which files are present in the worktree, vs in the "hidden" database).

This flag expects a filename which you present (e.g. in a volume mount) which is filled with git's sparse-checkout syntax. Try 'git help sparse-checkout'.

You might have better success with --depth=1 though
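For reference, a minimal sketch of the depth setting as an env entry, assuming `GIT_SYNC_DEPTH` is the v3 variable behind `--depth`. It limits how much history is fetched, not which files are checked out.

```yaml
env:
  # Assumed v3 variable name for --depth; fetches only the most recent commit.
  - name: GIT_SYNC_DEPTH
    value: "1"
```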

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

kennes913 commented 4 months ago

@thockin, I am using git-sync v4. How exactly do you get this working? Instead of passing in the file, I am writing it locally in a modified entrypoint script.

Update: Never mind, I figured this out. For anyone who stops here and sees this: add the git sparse-checkout patterns to the input file, write or mount that file into the container, and provide its location either relative to --root or as an absolute path.
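A sketch of what that could look like with v4-style flags, using an absolute path to a mounted file. The volume name, ConfigMap name and mount path are hypothetical, and the repo URL is the placeholder from the original report.

```yaml
# Pod spec fragment, not a complete manifest.
containers:
  - name: git-sync
    args:
      - --repo=https://dummy.git
      - --root=/tmp/git
      # Absolute path to the mounted patterns file.
      - --sparse-checkout-file=/etc/git-sync-sparse/sparse-checkout
    volumeMounts:
      - name: sparse-patterns          # hypothetical volume name
        mountPath: /etc/git-sync-sparse
        readOnly: true
volumes:
  - name: sparse-patterns
    configMap:
      name: git-sync-sparse            # hypothetical ConfigMap holding the patterns
```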