ingydotnet / git-subrepo

MIT License

git-lfs support? #152

Open perrette opened 8 years ago

perrette commented 8 years ago

I was wondering, does git-subrepo support dependencies tracked with git-lfs?

If not, any suggestions for workarounds? Here are some hints: https://github.com/github/git-lfs/issues/854

I mean, git-lfs is a pretty handy tool, so any support would be much appreciated!

The solution I have in mind right now is to create a specific branch in my library (which I am also developing, by chance), where any git-lfs dependency is simply removed, and then add these files manually in my main project...

Any other ideas? Thanks for the great tool!

grimmySwe commented 8 years ago

Have you done any testing? git-subrepo doesn't depend on the content and uses git commands to perform the internal operations so I don't see any obvious obstacles, but...

ingydotnet commented 8 years ago

@perrette, I'm guessing that not many people here (myself included) have experience with git-lfs. My advice would be to try using it with git-subrepo and then report any problems as issues, with specific commands to reproduce the issues.

animetrics commented 7 years ago

I've been using subrepo with lfs and there is a potential issue. E.g. my parent repo is set up via .gitattributes to put all .png files in lfs, and I have a subrepo that has .png files checked into git directly (not via lfs). When I run git status, it shows the .png files from the subrepo as "modified", presumably b/c it wants to check them in under git lfs. I had to exclude the subrepo directory so it won't try to do this. Presumably this could be a feature, i.e. if I wanted to track subrepo binary files via lfs even when the subrepo itself doesn't? But this seems problematic for a git subrepo push.
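
A hedged sketch of what such an exclusion can look like, for anyone hitting the same thing. It assumes the subrepo lives in a directory named lib (a made-up name) and that the parent uses the usual LFS attribute line for *.png. It only checks how git resolves the attributes, so it needs neither git-lfs nor git-subrepo installed:

```shell
#!/bin/sh
set -eu

# Scratch repository standing in for the parent project (hypothetical layout).
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Line 1: the parent's rule, every .png goes through the LFS filter.
# Line 2: reset those attributes to "unspecified" for everything under lib/,
#         the assumed subrepo directory.
cat > .gitattributes <<'EOF'
*.png filter=lfs diff=lfs merge=lfs -text
lib/** !filter !diff !merge !text
EOF

# check-attr only evaluates the attribute rules; the files need not exist.
parent_attr=$(git check-attr filter -- logo.png)
subrepo_attr=$(git check-attr filter -- lib/logo.png)
printf '%s\n' "$parent_attr" "$subrepo_attr"
```

Within one .gitattributes file the later matching line wins, so the lib/** line has to come after the *.png line. How this interacts with git subrepo push is not something the sketch tests.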

lp35 commented 4 years ago

Hi,

It would be great to have subtree working with git-lfs. Any hope to see that implemented?

Thank you for this wonderful project.

admorgan commented 4 years ago

This seems like something that can be addressed. Like @ingydotnet, I don't have any experience using git-lfs, but if @lp35, @animetrics, and @perrette can help me work through the use cases, I am pretty sure we can work something out.

The situation as I understand it is that when using git-lfs, certain file types are fetched from and pushed to another storage solution, and a marker is inserted into git that describes that transaction.
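
To make the "marker" concrete: in the Git LFS pointer format, what gets committed in place of the large file is a small text file with a spec version line, the SHA-256 of the content, and the size in bytes. A minimal sketch that reproduces that text for a given file, assuming a POSIX shell with GNU sha256sum available. lfs_pointer_for is a hypothetical helper name for illustration, not a git-lfs command:

```shell
#!/bin/sh

# Print the pointer text that would stand in for the given file.
# Hypothetical helper; real pointers are written by the git-lfs clean filter.
lfs_pointer_for() {
    file=$1
    # SHA-256 of the file content (first field of sha256sum's output).
    oid=$(sha256sum < "$file" | cut -d' ' -f1)
    # Size in bytes; tr strips the padding some wc implementations add.
    size=$(wc -c < "$file" | tr -d ' ')
    printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

# Demo on a five-byte scratch file.
demo_file=$(mktemp)
printf 'hello' > "$demo_file"
pointer=$(lfs_pointer_for "$demo_file")
printf '%s\n' "$pointer"
rm -f "$demo_file"
```

Since the pointer identifies content only by hash and size, it says nothing about which server holds the actual bytes, which is where the subrepo scenarios below get interesting.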

Thought flow of different scenarios:

  1. Add a git subrepo to a project that uses git-lfs, where some files meet the filter criteria.
    a. Add entries to .gitattributes to disable git-lfs in this directory
    b. Allow git-lfs to do its job, but hide it from upstream
    c. Allow git-lfs to do its job and push it back to upstream
  2. Files moved from the git subrepo directory to the parent project
    a. How to identify whether the file should be a git-lfs file or a standard file?
    b. Permissions?
    c. Subrepo uses git-lfs but parent doesn't
      • Is this even possible?
  3. Files moved from the parent repo to the subrepo directory
    a. Convert to a normal file unless upstream is using git-lfs?
    b. Permissions?
  4. git lfs migrate used on a branch utilizing subrepo
    a. Can we inject into their filter stream?

I obviously missed a bunch of stuff, and I don't have any answers to any of it. This is just to get a conversation going so I can understand the problem better.

perrette commented 4 years ago

Not sure how git-subrepo works (I am not currently using it), but as a potential user I would expect the git-lfs commands to be executed separately for each subrepo, following the rules from each .gitattributes file (and treating a file as a normal git file if no .gitattributes is present).

Not sure how that can be implemented (I lack the detailed knowledge to comment on 1), but for the rest that means:

2.a. The parent project's .gitattributes should be the key. If none is present, it becomes a normal file.
2.b. (permissions) Not sure what you are referring to.
2.c. That is possible. See 2.a. Each subrepo has a .gitattributes (or not), and rules (or lack thereof) should follow from it.
3.a. The subrepo's .gitattributes rules.
4. Never used so far, but I know where it could be useful, so that would be a good feature to have!

admorgan commented 4 years ago

2.a So the proposal is that subrepo creates a .gitattributes to prevent the parent's .gitattributes from executing lfs in the subrepo directory.
2.b Reading the documentation, it says that files in lfs use the same permissions model as the git repo. I was stating that I don't know how that works, especially in the context of scenario 2.
3.a I would like to know if this already happens with lfs.

perrette commented 4 years ago

I just had a go at git-subrepo (and also tried to git subrepo pull a repo that contains git-lfs tracked files, which failed because the lfs-tracked files are not present on the main remote; not sure how @animetrics did it). Admittedly this is not the most common case (one would rather expect the main repo to have git-lfs files and the subrepos not, as libraries are generally as light as possible), but this illustrates a first question: should the git-lfs files be duplicated on both servers, as other files are, or should they be fetched from where they belong in the first place? I would argue for the second option: only duplicate the reference, and fetch the lfs file from its original location.

If the file is moved from parent to subrepo or the other way around, apply .gitattributes rules to add as git or git-lfs. You might have been there already, I am catching up.

2.a. As for isolating the subrepo's and the parent's .gitattributes, I am not sure how that works, but a first attempt tells me that a sub-folder .gitattributes does NOT override the parent's, though it might add rules to it (test case: the main .gitattributes tracks *.bin files, and a local folder contains an empty .gitattributes and a test.bin file; the local test.bin is added via git lfs regardless).
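
The test case in 2.a can be reproduced without git-lfs installed by asking git which filter it would apply. This is only a sketch in a throwaway repository, with lib as a placeholder directory name. It suggests that an empty nested .gitattributes changes nothing, while a nested rule that explicitly unsets the attributes does take precedence inside that directory:

```shell
#!/bin/sh
set -eu

# Throwaway repository; "lib" is a placeholder for the subrepo directory.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p lib

# Parent rule: track *.bin through the LFS filter.
printf '*.bin filter=lfs diff=lfs merge=lfs -text\n' > .gitattributes

# Case 1: an empty nested .gitattributes, as in the test case above.
: > lib/.gitattributes
with_empty=$(git check-attr filter -- lib/test.bin)

# Case 2: a nested rule that explicitly unsets the LFS attributes.
printf '*.bin -filter -diff -merge\n' > lib/.gitattributes
with_override=$(git check-attr filter -- lib/test.bin)

printf '%s\n' "$with_empty" "$with_override"
```

One caveat: a .gitattributes placed inside the subrepo directory is part of the subrepo's content, so it would presumably travel upstream on git subrepo push.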

3.a. For me git-lfs in subrepo fails, so I cannot comment on combined subrepo/lfs use.

I'd say a challenge to make things work would be to fetch the lfs files from the respective servers...

Though thinking about it, I wonder whether full compatibility is worth the effort. To me the most important use case works nearly out of the box: the main repo uses git-lfs, the subrepos do not, and they do not conflict with the main repo. A first step would be to avoid possible conflicts (the main repo's git lfs messing with the lib) without rewriting the subrepo.

lp35 commented 4 years ago

> I just had a go at git-subrepo (and also tried to git subrepo pull a repo that contains git-lfs tracked files, which failed because the lfs-tracked files are not present on the main remote; not sure how @animetrics did it). Admittedly this is not the most common case (one would rather expect the main repo to have git-lfs files and the subrepos not, as libraries are generally as light as possible), but this illustrates a first question: should the git-lfs files be duplicated on both servers, as other files are, or should they be fetched from where they belong in the first place? I would argue for the second option: only duplicate the reference, and fetch the lfs file from its original location.

That depends on the implementation of server-side LFS. On GitLab, for example, files are pushed only once: when the LFS server receives a new file, it checks whether it already has the file stored by checking the hash.

As for the use case: it is not very common in open source but common in companies, where compiled libraries can be passed from one team to another to simplify the compilation pipeline and avoid dependency hell. The use case is very real, and most people are using submodules to handle it for now.

perrette commented 4 years ago

> That depends on the implementation of server-side LFS. On GitLab, for example, files are pushed only once: when the LFS server receives a new file, it checks whether it already has the file stored by checking the hash.

I meant, after git subrepo clone remote/lib is executed, the subsequent git lfs pull or git lfs checkout must be directed to remote/lib, instead of remote/main. Not doing so is probably the first reason why including git-lfs tracked files in subrepo fails.

Well, in fact what I thought was an issue (push/pull to various repos) seems very straightforward, as it seems that git lfs push/pull REMOTE path/to/file/pointer works fine for an arbitrary REMOTE (not tested in every situation), so it is enough to execute everything normally with GIT_LFS_SKIP_SMUDGE=1 and fetch the files in a second step. For instance, this works with the current version:

GIT_LFS_SKIP_SMUDGE=1 git subrepo clone REMOTE/lib
git lfs pull REMOTE/lib lib

As for 3.a, I tested the situation where both main and library have a .gitattributes that tracks '*.bin' under git-lfs. To my surprise, committing a file in the parent, then moving it to lib, then executing git subrepo push on the lib works as expected: the file is added as git-lfs to the subrepo. I checked that this also applies when the subrepo does not have the corresponding filter for that sort of file. Even though that is not the behavior I had initially advocated for, I think this is perfectly acceptable as well.

In conclusion, the basic pipeline works fine as it is, and only minor modifications might be needed.

lp35 commented 4 years ago

Hi,

Any update on a potential patch?

Thanks!

sonatique commented 3 years ago

Hi, I am also veeery interested in a patch.

subrepo is really an excellent tool; thanks a lot to all contributors for it. As of now, its lack of LFS support unfortunately prevents me from using it to its full potential.

Thanks in advance!

sonatique commented 3 years ago

Hi,

Actually, I just ran some tests again using subrepo release 4.02. I have 3 repos, say A, B and S. I enabled LFS on all of them (Bitbucket). Now it seems I can do:

1) from local A: git subrepo push MySubrepoFolder
2) from local B: git subrepo pull MySubrepoFolder
   At this point the LFS files are still in "pointer form" on local B.
3) from local B: git lfs pull B-Url
   Now the LFS pointers are correctly converted into binary files!

So it seems that basically it would be enough to add line #3 at the end of the command git subrepo pull, or something similar...
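
Until something like that lands in git-subrepo itself, the two steps can be chained in a small wrapper. This is only a sketch of the workaround as reported in this thread, not git-subrepo behavior: subrepo_pull_with_lfs is a hypothetical function name, the LFS remote is passed in by hand, and setting GIT_LFS_SKIP_SMUDGE=1 for the pull follows the earlier clone example rather than anything tested here.

```shell
#!/bin/sh

# Hypothetical wrapper: pull a subrepo, then fetch its LFS objects separately.
#   $1  subrepo directory, e.g. MySubrepoFolder
#   $2  remote (name or URL) to fetch the LFS objects from
subrepo_pull_with_lfs() {
    subdir=$1
    lfs_remote=$2

    # Leave LFS files as pointers during the subrepo pull (assumption borrowed
    # from the GIT_LFS_SKIP_SMUDGE=1 clone example earlier in the thread).
    GIT_LFS_SKIP_SMUDGE=1 git subrepo pull "$subdir" || return 1

    # Second step: replace the pointers with the real content.
    git lfs pull "$lfs_remote"
}
```

Usage would look like subrepo_pull_with_lfs MySubrepoFolder B-Url, run from the top level of the parent repository. Which remote actually holds the LFS objects depends on the hosting setup, so that argument is left to the caller.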

sonatique commented 3 years ago

Just in case: https://github.com/ingydotnet/git-subrepo/pull/505

contentfree commented 5 months ago

@perrette's hint of skipping smudge and later pulling is the only way that I'm able to use git-subrepo successfully. Is there any chance this can be incorporated into git-subrepo?

Pleune commented 4 months ago

> That depends on the implementation of server-side LFS. On GitLab, for example, files are pushed only once: when the LFS server receives a new file, it checks whether it already has the file stored by checking the hash.
>
> I meant, after git subrepo clone remote/lib is executed, the subsequent git lfs pull or git lfs checkout must be directed to remote/lib, instead of remote/main. Not doing so is probably the first reason why including git-lfs tracked files in subrepo fails.
>
> Well, in fact what I thought was an issue (push/pull to various repos) seems very straightforward, as it seems that git lfs push/pull REMOTE path/to/file/pointer works fine for an arbitrary REMOTE (not tested in every situation), so it is enough to execute everything normally with GIT_LFS_SKIP_SMUDGE=1 and fetch the files in a second step. For instance, this works with the current version:
>
> GIT_LFS_SKIP_SMUDGE=1 git subrepo clone REMOTE/lib
> git lfs pull REMOTE/lib lib
>
> As for 3.a, I tested the situation where both main and library have a .gitattributes that tracks '*.bin' under git-lfs. To my surprise, committing a file in the parent, then moving it to lib, then executing git subrepo push on the lib works as expected: the file is added as git-lfs to the subrepo. I checked that this also applies when the subrepo does not have the corresponding filter for that sort of file. Even though that is not the behavior I had initially advocated for, I think this is perfectly acceptable as well.
>
> In conclusion, the basic pipeline works fine as it is, and only minor modifications might be needed.

This works fine... it would be great if this workaround could be incorporated into subrepo. The problem stems from an upstream LFS bug that the LFS team seems to have no desire to fix: https://github.com/git-lfs/git-lfs/issues/1948

admorgan commented 4 months ago

I will be honest: this hasn't received as much attention as it should because I know little about git lfs and I keep putting off figuring out how to add tests for these features. I will spend some time in the week of May 20th 2024 to rectify this lack of knowledge and add it to the test suite. Once I have something that can be tested, we can make sure it works.

admorgan commented 4 months ago

Just wanted to let you know I am still working on the test infrastructure. Specifically, git-subrepo is broken with any git version over 2.30. Fortunately this only shows up in very rare edge cases and probably isn't affecting anyone actively, but the existing tests don't work when I pull up git, and I want to pull up git to work with the newest lfs. This likely won't be done until the middle of June due to scheduling conflicts though.

veylonni commented 2 months ago

I'm glad it's back on the agenda! Because that's the only reason my company can't use it extensively. We need Git LFS for most of our repositories because it allows us to keep all the test data with the source code.

admorgan commented 2 months ago

I am having to deal with git LFS more often now too, so this is high priority right after #602.