twosigma / git-meta

Repository for the git-meta project -- build your own monorepo using Git submodules
http://twosigma.github.io/git-meta
BSD 3-Clause "New" or "Revised" License
216 stars, 50 forks

[Feature Request] Intelligent Usage with monorepo tools (Like Nrwl Nx) #817

Open wSedlacek opened 3 years ago

wSedlacek commented 3 years ago

Context

I recently had the idea of overlaying Git repos for a project I am working on, and I came across this project. I have worked with Git submodules before, but managing commits across repos with them was clunky. I want to use git-meta with Nrwl Nx and take heavy advantage of Nx's dependency graph and module boundaries.

Request

After looking over git-meta, it seems super cool, but in large repos there seems to be a lot of manual work opening and closing repos, especially when a project has many dependencies. That said, I think combining the work done in git-meta with a monorepo management tool like Nrwl Nx could be extremely powerful.

Specifically, I am looking for a CLI that leverages both git-meta and Nrwl Nx to provide a simple set of commands that anyone on a team can easily pick up and be effective with, while still giving DevOps and the organization full control over their codebase.

Here are some of my thoughts on what commands would be specifically useful.

// `git nx` could be replaced by another name, it was simply used as a placeholder for these examples
// <project-name> refers to the name of the project in the monorepo's schema

// Check out a specific app or lib to disk, including its dependencies
git nx open <project-name>

// Check out a specific app or lib to disk, including not only its dependencies but also everything that depends on it (used to test whether a change to a specific lib broke something downstream)
git nx evaluate <project-name>

// Remove a specific app or lib from disk including any dependencies that are not used by other checked out projects
git nx close <project-name>

// Closes all open projects
git nx clear

// Open all projects
git nx flatten

// Re-evaluates all open projects: unused dependencies are closed and unopened dependencies are opened
git nx update

// Creates a new submodule out of a given project, with its remote set to the specified Git URL (optional); this could even use the `gh` or `gitlab` CLI to create the repo automatically. The `--private` flag creates a private repo
git nx split <project-name> <git-url> (--private)

// Removes a project from a submodule joining it back into the meta/root repo, private repos will be blocked from using this command unless the `--force` flag is specified
git nx join <project-name> (--force)

// Converts a given project and all its dependencies to public (gh/gitlab integration)
git nx publish <project-name>

// Converts a given project and all its dependencies to private (gh/gitlab integration)
git nx unpublish <project-name>

// Configure default organization
git nx config org <organization>

// Configure default git provider (gh/gitlab)
git nx config provider <provider>
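
Most of the open/close/update commands reduce to one computation over the project graph: take the set of projects someone explicitly asked for, expand it to everything those projects depend on, and diff that against what is currently on disk. A minimal sketch of that idea, assuming the graph is available as a plain map from project name to the projects it imports (in practice Nx would supply it); `closure`, `plan`, and the sample project names are hypothetical, and nothing here touches git or git-meta.

```typescript
// Hypothetical project graph: project name -> projects it imports.
// In a real workspace Nx would supply this; here it is hand-written.
type Graph = Map<string, string[]>;

// Every project reachable from the roots, the roots themselves included.
function closure(graph: Graph, roots: Iterable<string>): Set<string> {
  const seen = new Set<string>();
  const stack = [...roots];
  while (stack.length > 0) {
    const name = stack.pop()!;
    if (seen.has(name)) continue;
    seen.add(name);
    stack.push(...(graph.get(name) ?? []));
  }
  return seen;
}

// Diff what the requested projects need against what is on disk now.
function plan(graph: Graph, requested: Set<string>, onDisk: Set<string>) {
  const wanted = closure(graph, requested);
  const toOpen = [...wanted].filter((p) => !onDisk.has(p)).sort();
  const toClose = [...onDisk].filter((p) => !wanted.has(p)).sort();
  return { toOpen, toClose };
}

// Example workspace: two apps sharing a utility library.
const graph: Graph = new Map<string, string[]>([
  ["app-a", ["ui", "utils"]],
  ["app-b", ["utils"]],
  ["ui", ["utils"]],
  ["utils", []],
]);

// Opening app-a and app-b, starting from an empty disk.
const opened = plan(graph, new Set(["app-a", "app-b"]), new Set<string>());
// Closing app-a: only app-b stays requested.
const closed = plan(graph, new Set(["app-b"]), new Set(opened.toOpen));
```

Under this model, `close` removes a name from the requested set, `clear` empties it, `flatten` fills it with every project, and `update` is simply rerunning `plan`. With `app-a` closed, `ui` goes away because nothing still requested uses it, while `utils` stays because `app-b` needs it.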

Because this request is for an integration, it may well be out of scope for this project. That said, I do think git-meta is the underlying groundwork such a CLI could be built on, and I would be interested in hearing any thoughts on how this might be done or what other solutions to this problem there might be.

User Stories

As a developer in a large monorepo, I want an easy way to get only the code I need to work on, without unnecessary distractions.

As a DevOps engineer for a large monorepo, I want my CI/CD engine to fetch only what is necessary for a given pipeline to run.

As an organization that works with both open- and closed-source projects, I want to control which parts of my monorepo are available to the public without burdening my developers with context switching.

As a developer advocate, I want to get my team on board with monorepos without requiring extensive knowledge of our architecture to be effective with the paradigm.

novalis commented 3 years ago

These are all really cool ideas. Let me walk you through the reasons we don't already have some of these features, and then we can figure out if there are features that git-meta can add that would be useful to you.

At Two Sigma, we have a custom build system that, with the help of some git hooks, can find already-built artifacts that correspond to the version of the meta repo that you have checked out. So you don't need to open the code that you depend on. Of course, you do still need to open anything you want to change, and, in some cases, anything in between. That is, if 'horse' depends on 'shoe' and 'shoe' depends on 'nail', and you want to change nail and horse, you might need to manually open 'shoe'.
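
The 'horse'/'shoe'/'nail' case can be phrased as a graph query: a module has to be opened if it sits on a dependency path between two modules you are changing, since its prebuilt artifact was built against the old version of the lower module. The sketch below is a simplification under that assumption, not a description of Two Sigma's build system; `inBetween` and the module names are illustrative.

```typescript
// Module name -> modules it depends on.
type Graph = Map<string, string[]>;

// Nodes reachable by following one or more edges from any start node.
function reachableFrom(graph: Graph, starts: Iterable<string>): Set<string> {
  const seen = new Set<string>();
  const stack: string[] = [];
  for (const s of starts) stack.push(...(graph.get(s) ?? []));
  while (stack.length > 0) {
    const n = stack.pop()!;
    if (seen.has(n)) continue;
    seen.add(n);
    stack.push(...(graph.get(n) ?? []));
  }
  return seen;
}

// Reverse every edge: "X depends on Y" becomes "Y is depended on by X".
function invert(graph: Graph): Graph {
  const out: Graph = new Map();
  for (const [name, deps] of graph) {
    for (const dep of deps) {
      const users = out.get(dep) ?? [];
      users.push(name);
      out.set(dep, users);
    }
  }
  return out;
}

// Unchanged modules that both depend on a changed module and are
// depended on by a changed module, i.e. sit on a path between them.
function inBetween(graph: Graph, changed: Set<string>): string[] {
  const below = reachableFrom(graph, changed);         // depended on by a changed module
  const above = reachableFrom(invert(graph), changed); // depends on a changed module
  return [...below].filter((n) => above.has(n) && !changed.has(n)).sort();
}

const stable: Graph = new Map<string, string[]>([
  ["horse", ["shoe", "saddle"]],
  ["shoe", ["nail"]],
  ["saddle", []],
  ["nail", []],
]);

// Changing 'horse' and 'nail' at the same time.
const mustOpen = inBetween(stable, new Set(["horse", "nail"]));
```

Here 'saddle' stays closed: 'horse' depends on it, but it does not depend on 'nail', so under the stated assumption its prebuilt artifact is still usable.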

We also support remote builds and tests, so you usually don't need to open your dependees. You just run our remote build command, and it looks at the build graph to figure out what needs to be built.
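
Working out what has to be rebuilt after a change amounts to walking the graph backwards: start from the changed modules and collect everything that depends on them, directly or transitively. A minimal sketch of that reverse walk over a hypothetical in-memory graph; Nx's `nx affected` commands rest on the same general idea, though this is not their implementation.

```typescript
// Module name -> modules it depends on.
type Graph = Map<string, string[]>;

// Changed modules plus everything that transitively depends on them.
function affected(graph: Graph, changed: Iterable<string>): Set<string> {
  // Build reverse edges: dependency -> modules that use it.
  const dependents = new Map<string, string[]>();
  for (const [name, deps] of graph) {
    for (const dep of deps) {
      const users = dependents.get(dep) ?? [];
      users.push(name);
      dependents.set(dep, users);
    }
  }
  // Breadth-first walk along the reverse edges.
  const result = new Set<string>();
  const queue = [...changed];
  while (queue.length > 0) {
    const name = queue.shift()!;
    if (result.has(name)) continue;
    result.add(name);
    queue.push(...(dependents.get(name) ?? []));
  }
  return result;
}

const sample: Graph = new Map<string, string[]>([
  ["horse", ["shoe"]],
  ["cart", ["nail"]],
  ["shoe", ["nail"]],
  ["hat", []],
  ["nail", []],
]);

const rebuild = affected(sample, ["nail"]);
```

The same reverse walk is what the proposed `git nx evaluate` would need in order to find downstream projects.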

So that's one reason we don't have tooling inside git-meta to open dependents and dependees. The other reason is that in our build system (like Bazel or Pants), dependencies are not stored centrally. Instead, they are local to each submodule. So if you request the transitive deps of 'horse', then first we would have to open 'horse', then 'shoe', then 'nail'. This is possible, but potentially slow. Our longest dependency chain is about 60 or 70 deep. Your repos may be simpler.
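
Because each submodule declares its own dependencies, you only learn what 'shoe' needs after opening 'shoe'. A breadth-first walk can open every module at the same depth together, but the depths themselves are sequential, so the number of rounds grows with the longest dependency chain. The sketch below just counts those rounds; `readDeps` is a hypothetical stand-in for "open this submodule and read its dependency list", simulated here with a map.

```typescript
// Hypothetical: returns a module's dependencies, which in reality would
// require opening (checking out) that module first.
type ReadDeps = (name: string) => string[];

function discoverByLevels(root: string, readDeps: ReadDeps): { found: Set<string>; rounds: number } {
  const found = new Set<string>([root]);
  let frontier = [root];
  let rounds = 0;
  while (frontier.length > 0) {
    // One round opens the whole frontier. Those opens could run in
    // parallel, but the next round needs this round's results.
    rounds++;
    const next: string[] = [];
    for (const name of frontier) {
      for (const dep of readDeps(name)) {
        if (!found.has(dep)) {
          found.add(dep);
          next.push(dep);
        }
      }
    }
    frontier = next;
  }
  return { found, rounds };
}

// Simulated chain m0 -> m1 -> ... -> m59, standing in for a 60-deep chain.
const chain = new Map<string, string[]>();
for (let i = 0; i < 60; i++) chain.set(`m${i}`, i < 59 ? [`m${i + 1}`] : []);
const result = discoverByLevels("m0", (name) => chain.get(name) ?? []);
```

For a straight chain, every round discovers exactly one new module, so a 60-deep chain costs 60 sequential open-and-read steps no matter how much parallelism is available.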

You also mention the possibility of having code directly in the meta repository. There is no theoretical reason that this could not be done. But right now, git-meta does not support it. I would be happy to review patches about this.

wSedlacek commented 3 years ago

Thanks for the feedback, @novalis! As I thought about it more, this idea really does sound like its own project, so I was thinking I would give it a shot this weekend. Your insight into how your team uses git-meta was very interesting.

You mention dependencies being local to each submodule. I believe this might be the case for Nx as well, since it reads import statements to determine dependencies, at least for ones that are not implicit. Maybe it would be possible to build a central table to keep track of these? I was also thinking it would be nice to be able to just add an import statement and have the repo you need opened automatically, maybe through a VS Code plugin with an on-save hook?
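
A central table could be as simple as a map from import alias to project name; checking a saved file against it yields the projects that would need opening. The sketch below assumes TypeScript-style path aliases such as `@myorg/ui` (Nx workspaces commonly declare aliases like these, but the table, the `@myorg` scope, and the regex-based import scan are all hypothetical here). A real tool would be better off with a proper parser than a regular expression, and the editor hook itself is left out.

```typescript
// Collect module specifiers from static imports/re-exports,
// side-effect imports, and dynamic import() calls.
function importedSpecifiers(source: string): string[] {
  const re = /\bfrom\s+['"]([^'"]+)['"]|\bimport\s*\(\s*['"]([^'"]+)['"]\s*\)|\bimport\s+['"]([^'"]+)['"]/g;
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    const spec = m[1] ?? m[2] ?? m[3];
    if (spec !== undefined) out.push(spec);
  }
  return out;
}

// Map a specifier like "@myorg/utils/date" to a project via the longest
// matching alias; returns undefined for third-party packages.
function projectFor(spec: string, aliases: Map<string, string>): string | undefined {
  let best: string | undefined;
  let bestLen = -1;
  for (const [alias, project] of aliases) {
    const matches = spec === alias || spec.startsWith(alias + "/");
    if (matches && alias.length > bestLen) {
      best = project;
      bestLen = alias.length;
    }
  }
  return best;
}

// Projects imported by the file that are not open yet.
function projectsToOpen(source: string, aliases: Map<string, string>, open: Set<string>): string[] {
  const needed = new Set<string>();
  for (const spec of importedSpecifiers(source)) {
    const project = projectFor(spec, aliases);
    if (project !== undefined && !open.has(project)) needed.add(project);
  }
  return [...needed].sort();
}

// Hypothetical central table: import alias -> project name.
const aliases = new Map<string, string>([
  ["@myorg/ui", "ui"],
  ["@myorg/utils", "utils"],
  ["@myorg/polyfills", "polyfills"],
]);

const saved = [
  `import React from "react";`,
  `import { Button } from "@myorg/ui";`,
  `import "@myorg/polyfills";`,
  `const fmt = await import("@myorg/utils/date");`,
].join("\n");

const toOpen = projectsToOpen(saved, aliases, new Set(["ui"]));
```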

You mentioned reusing artifacts for builds; Nx has this capability too, and that is a really good point. Perhaps a check could be done for artifacts, either locally or on Nx Cloud, to determine whether a library already has an artifact; only if it doesn't would the library need to be opened. I realize there is a lot more to consider than I saw on my initial pass.
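
One way to decide whether a library has a usable artifact without opening it: the meta repo already records a commit for every submodule, so a cache key can be derived from a project's pinned commit plus the keys of its dependencies, and only projects whose key is missing from the cache need opening. This is a sketch under those assumptions: the cache is an in-memory set of keys, the graph is assumed acyclic, and nothing about Nx Cloud's actual API is implied.

```typescript
import { createHash } from "node:crypto";

// Project name -> projects it depends on (assumed acyclic).
type Graph = Map<string, string[]>;

// Key = hash of the project's pinned commit and its dependencies' keys,
// so a change anywhere below a project invalidates its artifact.
function cacheKey(
  project: string,
  graph: Graph,
  commits: Map<string, string>,
  memo: Map<string, string> = new Map(),
): string {
  const known = memo.get(project);
  if (known !== undefined) return known;
  const commit = commits.get(project);
  if (commit === undefined) throw new Error(`no pinned commit for ${project}`);
  const hash = createHash("sha256").update(`${project}@${commit}`);
  // Sort so the key does not depend on declaration order.
  for (const dep of [...(graph.get(project) ?? [])].sort()) {
    hash.update(cacheKey(dep, graph, commits, memo));
  }
  const key = hash.digest("hex");
  memo.set(project, key);
  return key;
}

// Candidates whose current key has no cached artifact.
function needsOpening(
  candidates: string[],
  graph: Graph,
  commits: Map<string, string>,
  cached: Set<string>,
): string[] {
  const memo = new Map<string, string>();
  return candidates.filter((p) => !cached.has(cacheKey(p, graph, commits, memo))).sort();
}
```

Because each key folds in its dependencies' keys, bumping the commit pinned for a low-level library changes the key of every project above it, so all of them would be reported as needing to be opened or rebuilt.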

In any case, the features from git-meta that would be important for me as I hash out this idea would be

novalis commented 3 years ago

We hadn't really thought about having a public interface -- our aim was to, as much as possible, replicate git itself, where the interface is the command-line. Of course, git is written in C, so its startup time is negligible.

My inclination would be to keep the public interface as small as possible, and have it operate as if it were using the command line. I recognize that this is somewhat annoying, because you lose all of the type information. But it also means that we don't have to worry about exposing our internal interfaces (which often involve icky libgit2 objects for which the memory management is somewhat broken). At most, I would like to expose structured versions of the internal interfaces (e.g. the interface for git meta status could return three lists of objects rather than one string). But it would still be operated through command-line options.
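
To illustrate "structured, but still driven through the command line", here is a sketch that turns status output into three lists instead of one string. It parses plain Git's `git status --porcelain` (v1) format as a stand-in, since git-meta's own status format isn't shown in this thread; the staged/unstaged/untracked split mirrors Git's status and is an assumption about what the three lists would be. Quoted paths (which Git emits for unusual filenames) are not handled.

```typescript
interface StatusEntry {
  code: string; // one-letter status, e.g. "M", "A", "D", "R"
  path: string;
}

interface Status {
  staged: StatusEntry[];   // from the X (index) column
  unstaged: StatusEntry[]; // from the Y (worktree) column
  untracked: string[];     // "??" entries
}

// Parse `git status --porcelain` (v1) output: each line is "XY path",
// or "XY orig -> path" for renames and copies.
function parsePorcelainV1(output: string): Status {
  const status: Status = { staged: [], unstaged: [], untracked: [] };
  for (const line of output.split("\n")) {
    if (line.length < 4) continue; // skip blank lines
    const x = line.charAt(0);
    const y = line.charAt(1);
    let path = line.slice(3);
    const arrow = path.indexOf(" -> ");
    if (arrow !== -1) path = path.slice(arrow + 4); // keep the new path
    if (x === "?" && y === "?") {
      status.untracked.push(path);
      continue;
    }
    if (x === "!" && y === "!") continue; // ignored files (only with --ignored)
    if (x !== " ") status.staged.push({ code: x, path });
    if (y !== " ") status.unstaged.push({ code: y, path });
  }
  return status;
}

const parsed = parsePorcelainV1("M  a.ts\n M b.ts\nMM c.ts\nR  old.ts -> new.ts\n?? d.ts\n");
```

A file such as `c.ts` with both staged and unstaged edits appears in both lists, which is exactly the kind of detail that gets lost when the result is a single string.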

Anyway, patches accepted. We're not likely to implement this ourselves, because we're focused on the command-line experience, but we are happy to help review patches and answer questions.