Relative URL for git subrepo

rcdailey commented 7 years ago

So, like submodules, I expect this to work:

$ git subrepo clone ../core.git Core
git-subrepo: Command failed: 'git ls-remote ../core.git'.

However, as shown above, I get a failure. Some users use HTTPS, others use SSH. It depends on what origin is set to in the "parent" repo. I don't want to tie down the URL to one protocol or the other, requiring users to constantly change it and make sure they do not commit the .gitrepo file.

Is this supported and I'm just missing an important point?

grimmySwe commented 7 years ago

In our unit tests we use relative URLs for the remotes all the time. For protocols, you set up a remote for the subrepo, so it has nothing to do with the parent repo, as you may want to access different repos by different protocols.

For the failing example above I can't really see any problem. What happens if you run git ls-remote ../core.git? Is there more information there?

rcdailey commented 7 years ago

So here is the repo I want to clone as a subrepo:

ssh://git@stash.tabletopmedia.com/fe/core.git

And here are the remotes registered with my current clone (the "parent" in this case):

$ git remote -v
fork    ssh://git@stash:7999/~robert/frontend.git (fetch)
fork    ssh://git@stash:7999/~robert/frontend.git (push)
origin  ssh://git@stash:7999/fe/frontend.git (fetch)
origin  ssh://git@stash:7999/fe/frontend.git (push)

Result of git ls-remote:

$ git ls-remote ../core.git
fatal: '../core.git' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

For contrast, here is the .gitmodules which currently works for core.git:

$ cat .gitmodules
[submodule "Core"]
    path = Core
    url = ../core.git
    branch = .

ingydotnet commented 7 years ago

What is ../core.git? It looks strange. Is it a 'bare' clone? Is it a directory? Can you show ls -l ../core.git?

I can do this simple test (using a random repo):

git clone git@github.com:ingydotnet/pegex-pm
git clone --bare git@github.com:ingydotnet/pegex-pm bare.git
cd pegex-pm
git subrepo clone ../bare.git Foo
git subrepo clone ./.git Bar

rcdailey commented 7 years ago

From the git-submodules documentation page:

<repository> is the URL of the new submodule’s origin repository. This may be either an absolute URL, or (if it begins with ./ or ../), the location relative to the superproject’s default remote repository (Please note that to specify a repository foo.git which is located right next to a superproject bar.git, you’ll have to use ../foo.git instead of ./foo.git - as one might expect when following the rules for relative URLs - because the evaluation of relative URLs in Git is identical to that of relative directories).

../core.git only makes sense when you look at the remotes I provided: origin is ssh://git@stash:7999/fe/frontend.git, which means ../core.git should become ssh://git@stash:7999/fe/core.git if you walk the URL like you do directories, per the example in the git documentation.
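As a minimal worked example of that walk, assuming the simple case of a single leading ../ and using only POSIX parameter expansion (the variable names are illustrative, not anything git or subrepo defines):

```sh
origin='ssh://git@stash:7999/fe/frontend.git'
relative='../core.git'

# Drop the last path component of origin, then append what follows "../".
resolved="${origin%/*}/${relative#../}"

echo "$resolved"
```

This only illustrates the arithmetic on the URL string; it does not reproduce every rule git submodule applies.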

If you're storing the URL in metadata for the purposes of subrepo pull/push, I think it makes sense to store the relative URL as-is and calculate it against origin the same way git submodule works. This allows interoperability between protocols without modifying the .gitrepo file.

I'm still learning about subrepo, but so far this is my understanding. Thanks in advance for your quick feedback.

ingydotnet commented 7 years ago

Can you show ls -l ../core.git?

I think it should work fine, unless ../core.git is a working directory. In which case you'd want:

git subrepo clone ../core.git/.git Core

rcdailey commented 7 years ago

So I did an example with an absolute URL:

And I got this in Core/.gitrepo:

[subrepo]
    remote = ssh://git@stash:7999/fe/core.git
    branch = master
    commit = 2eb8a1ed2b6f770cc3f065c89561ce8da9ccb1e6
    parent = da454b180fa9bf626440dd3f55fc1401ac3ddab7
    method = merge
    cmdver = 0.3.1

(Side Note: Weird that it says 0.3.1 when I have the release/0.4.0 branch checked out?)

So I guess I could use a real remote name with it, for example:

git remote add core ssh://git@stash:7999/fe/core.git

But concerns I have would be:

This seems to require that everyone have a core remote (And this requirement seems transitive; for example if core itself has nested subrepos as well). Extra manual steps.
How does this support the holy trinity development model (upstream + fork)? Multiple remotes are required for this methodology. Does this simply require that users use the --remote option explicitly when they use subrepo [pull|push]?

rcdailey commented 7 years ago

@ingydotnet Your command (the ls -l one) doesn't make sense because that's a relative url, not a relative path on my local filesystem. I'm not sure I understand what you're asking.

ingydotnet commented 7 years ago

I see. For subrepo (and other git commands like clone) that's a file system path. Like in my real world example above. Try git clone .git Foo inside any repo.

rcdailey commented 7 years ago

I'm sure that git submodule has special logic inside of it to process relative URLs, as most other git commands do not seem to function with them.

So is it expected that every developer add the required remotes before being able to pull/push the subrepo? Furthermore, is it even required that developers push/pull the subrepo frequently (multiple times a day, possibly one time per change they make)?

Also does push/pull in subrepo support fork repositories as well (temporarily pulling a subrepo to work on a temporary branch of that subrepo, then switching back to upstream when done)?

ingydotnet commented 7 years ago

Yeah, I've never seen this relative URL thing.

I'm not sure I follow completely. When you clone a subrepo, the URL is saved (as you know). The .gitrepo file is now a committed file in that subdirectory, as is all the subrepo content.

When another developer clones the main repo, they get all the content. And if they subrepo pull the content is fetched from the URL.

You can use --remote to pull/push using a different upstream.
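A small sketch of what that could look like, wrapped in a helper so the pieces are named. The function name subrepo_pull_from and the fork URL in the usage comment are hypothetical; only the --remote option itself comes from the comment above.

```sh
#!/bin/sh
# Hypothetical helper: pull a subrepo from an alternate upstream for one run.
#   $1 = subrepo subdirectory (e.g. Core)
#   $2 = URL of the alternate remote (e.g. a fork)
subrepo_pull_from() {
    # --remote is used in place of the remote recorded in <subdir>/.gitrepo
    # for this invocation.
    git subrepo pull "$1" --remote="$2"
}

# Example call (placeholder URL):
# subrepo_pull_from Core ssh://git@stash:7999/~robert/core.git
```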

I would just play around with it. Nothing is permanent until you push.

rcdailey commented 7 years ago

The benefits to using relative URLs are:

The URL for the subrepo is calculated from the origin remote, so the users do not need to add additional remotes specifically for the subrepos (works out of the box without manual steps from the user, if they plan to push/pull the subrepo itself)
Can be hierarchical. With nested subrepos, you can calculate them transitively (a .gitrepo in a directory under another one with a .gitrepo would calculate its relative URL from the one calculated above it)
Allow interoperability with different protocols and do not require the .gitrepo to change (You technically already solved this problem by using dedicated remotes for the subrepos, but this still stays true).

The logic would work similar to this:

User does git subrepo pull.
The subrepo scripts check the value of remote = in the .gitrepo file.
If the value starts with . or .., then it's a relative URL. Proceed to next step.
Walk up the directory tree until the top-most directory of the clone is reached (i.e. the one containing the .git directory, or .git file for submodules). If a .gitrepo file is found, calculate your absolute URL based on the calculated URL from that .gitrepo file (this is recursive up to the top-most dir). Otherwise, continue on.
Get the URL represented by a remote called origin. If no such remote exists, fail and notify the user. Otherwise, continue.
Using the same rules documented by git submodule, calculate the absolute URL for the subrepo based on the URL provided by the origin remote. This can be done similar to walking a directory structure, except you do it with a URL as your "path". Example: If you have http://foo.git.repo/project/repo.git, then .. turns it into http://foo.git.repo/project, and ../another.git turns it into http://foo.git.repo/project/another.git. See the pattern?
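
The steps above could be sketched roughly as follows. This is an illustration of the proposal, not something git-subrepo does today: the function names are made up, the recursive lookup through parent .gitrepo files is left out, and scp-style URLs (git@host:path) or walking above the host are not handled. It relies on .gitrepo being in git-config syntax so that git config --file can read it.

```sh
#!/bin/sh
# Walk a URL like a directory path: each leading "../" drops the last
# component of the base, each leading "./" is skipped.
# The ( ... ) body runs in a subshell, so the variables stay local.
resolve_relative_url() (
    base=${1%/}
    rel=$2
    while :; do
        case $rel in
            ../*) rel=${rel#../}; base=${base%/*} ;;
            ./*)  rel=${rel#./} ;;
            *)    break ;;
        esac
    done
    printf '%s/%s\n' "$base" "$rel"
)

# Print the effective remote URL for one .gitrepo file ($1).
subrepo_remote_url() (
    remote=$(git config --file "$1" subrepo.remote) || exit 1
    case $remote in
        ./*|../*)
            # Relative value: resolve it against the parent's origin remote.
            origin=$(git config --get remote.origin.url) || {
                echo "no 'origin' remote to resolve $remote against" >&2
                exit 1
            }
            resolve_relative_url "$origin" "$remote"
            ;;
        *)
            # Absolute URL or filesystem path: use as-is.
            printf '%s\n' "$remote"
            ;;
    esac
)
```

Called as subrepo_remote_url Core/.gitrepo from the top of the parent clone, this would print the recorded remote unchanged unless it starts with ./ or ../.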

This is non-trivial, obviously, but very useful. If there's one nice thing submodules did, it was to introduce the concept of relative URLs. This is great when dealing with a lot of componentized repositories that are right next to each other (i.e. same base URL).

grimmySwe commented 7 years ago

It seems like a cool mechanism but I am not sure that it's suitable for subrepo.

Subrepos can be recursive but you should only handle the top nodes, otherwise you will create havoc in the internal .gitrepo files.

I think it might make more sense in the submodule world, but for me subrepo is not aimed to be connected that way.

Please try out subrepo and see if you get a sense of what it can/cannot do.

Note: version 0.3.1 is a leftover, we still haven't performed all updates in preparation for the release.

ingydotnet commented 7 years ago

@rcdailey when you say "so the users do not need to add additional remotes specifically for the subrepos" it makes me think that you don't understand subrepo. Users never need to add remotes to push/pull. And they always get the subrepo whether they want it or not. A subrepo is physically a part of your repo; it just allows that part to be treated as logically separate sometimes.

I do like the idea of not needing a protocol scheme. Like sometimes I see HTML URLs for CSS and JS that start with // meaning either http or https. I am loath to make ../foo.git not be a filesystem path, since that already works in subrepo like it does in git clone. Maybe /../foo.git for relative and //foo.git for path after domain (both based on parent remote).

Let's keep this open and work it out after 0.4.0 release (which is blocked on me, yes I haven't forgotten @grimmySwe :)

rcdailey commented 7 years ago

@ingydotnet Thanks for replying. I just want to clarify to avoid confusion: when I talk about push/pull, I'm referring to subrepo's push/pull, i.e. you want to grab latest changes from the separate, common repo or contribute changes back to it. In this case, the remote URL will differ.

Normal push/pull, I expect to work on the defined remotes for the repository, as the copy of the subrepo there is a "means to an end", i.e. it is not the authoritative source for the subrepo source code.

I hope that clarifies. I'm only concerned with workflows involving the subrepo subcommand. Those seem to require remote URLs different from the repository's remotes (defined in config). With a submodule, I can modify the config to add remotes beyond the initial remote as-needed, so that I can push/pull to different repositories as I please. With subrepo, the initial remote is versioned (in the .gitrepo file) but I don't see a way to add other remotes to it, this is where I'm struggling. I don't see the flexibility with remotes like I could do with submodules.

This is why I was asking earlier to learn more about the expected workflow for subrepo. Again to compare submodules, each change I made that I wanted to collaborate back to the submodule's remote repository I did a git push on. With subrepo, how often am I expected to do git subrepo push and git subrepo pull?

Again thanks to everyone for helping me out. I am learning this but also not finding much about this in the documentation. Could be also that I'm just missing something obvious. Thanks for your patience!

dhuantes commented 7 years ago

@rcdailey I'm not part of the core development team but I have used subrepo since January of this year and used Submodules for several years prior to that. As @ingydotnet mentioned, there may be a disconnect here. Having come from the Submodules world, it's tough to break what has been ingrained, so I will try to help.

subrepo is a script/set of scripts, not a core part of git. So everything it does you can do by hand yourself, but believe me, you don't want to go there. If so you can join @ingydotnet et al. in maintaining and testing subrepo.
The huge advantage of subrepo over submodules is that the project with subrepos (loosely equivalent to the super project with submodules) is a single repository. Anyone checking the super project out would never know that it was created using subrepo, so those developers can commit/push/pull/fetch etc. just like any repository. Other developers, which sounds like you, will want to push potential changes upstream to a "stand alone" subrepo repository and pull back as needed. This is done with git subrepo push and git subrepo pull.
The nice thing with submodules is that each folder that is a submodule is actually the repository itself. So, as you indicated you can execute git commands against that repository and any 1..n remotes. With subrepo the subrepo is actually NOT a standalone repository. It is a subdirectory within the larger repository and git subrepo only knows it's a subrepo via the .gitrepo file that is in that subdirectory.
The .gitrepo file is used to inform git subrepo commands to know where to push to or pull from when either git subrepo push or git subrepo pull are called. But because the subdirectory is not an actual repository (as far as git is concerned) remotes are NOT supported with the subdirectory as they are with Submodules.
I think the speed and cleanliness of subrepo make it a far superior choice to Submodules, but as you indicate, the workflow for pushing back upstream needs to be established by your team. The mechanics are provided by subrepo but execution is still on the developer/development team.
You ask, "How often am I expected to do git subrepo push and git subrepo pull?" This is completely up to you and your team. In my team we only make subrepos of commonly used libraries/components. If you expect to do this on a regular basis it is good practice to isolate your commits. Git, of course, doesn't really care one way or the other, but by isolating your commits, when you do push upstream your comments won't include comments for the project that is using the subrepo. Hope that makes sense. Subrepo will handle only pushing those commits that pertain to the subrepo but it won't parse the comments.
When I began using subrepo, I exclusively used git subrepo clone from the master branch and only git subrepo pull'd and git subrepo push'd to and from the master branch. My mindset was that if I was making a repository a subrepo it should be stable, and if it's stable the master branch is what I want. I'm rethinking that now and beginning to git subrepo push back to branches (despite git subrepo clone'ing from master). That has its own issues and is beyond the scope of this thread, at least for now.
If you want/need to update other instances (e.g. remotes other than origin) of the common repository that is the subrepo, IMHO you really need to have the repository checked out as a standalone repository. With that repository, you can set up all the remotes you want. When a git subrepo push is executed on a super project repository that uses that subrepo, just do a git pull <remote_name> (or git fetch <remote_name> followed by git merge <remote_name>/branch) in your standalone copy as you see fit, and git push <remote_name> <branch> as needed.
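
A rough sketch of that last point, assuming a standalone clone that already has both remotes configured. The function name relay_branch and the argument values in the usage comment are placeholders, not part of subrepo:

```sh
#!/bin/sh
# Relay a branch between two remotes through a standalone clone.
#   $1 = path to the standalone clone
#   $2 = remote to pull from (where `git subrepo push` published)
#   $3 = remote to push to
#   $4 = branch name
relay_branch() {
    # Bring the published changes into the standalone clone,
    # then forward them to the other remote only if the pull succeeded.
    git -C "$1" pull "$2" "$4" &&
    git -C "$1" push "$3" "$4"
}

# Example call (placeholder names):
# relay_branch ../core-standalone origin fork master
```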

Sorry for the long-winded response, but I've been where you are, and it's not intuitively obvious when making the transition from submodules to subrepo that, despite accomplishing similar goals, they are completely different implementations. Hope this helped. Thanks again to @ingydotnet, @grimmySwe et al. for sharing subrepo with the community at large.

ingydotnet commented 7 years ago

@dhuantes thanks for the detailed writeup. We should use it as part of a subrepo tutorial in the future.

One really good thing to keep in mind with subrepo is that nobody on your project needs to know about or install subrepo. They can still change the content of the subrepo, because it is literally part of the main repo. Only one person needs to be able to subrepo pull or push. After a subrepo pull is done, everyone just pulls the main repo (again, because a subrepo pull is just a normal change to the main repo). Hope that was a useful addition...

ingydotnet / git-subrepo

Relative URL for git subrepo #298