pulp / pulpcore

Pulp 3 pulpcore package https://pypi.org/project/pulpcore/
GNU General Public License v2.0

As a user, I want to sync a git repository without PULP_MANIFEST #4708

Open lubosmj opened 10 months ago

lubosmj commented 10 months ago

GitHub and GitLab

It is possible to sync a GitHub repository by leveraging the GitHub API: https://api.github.com/repos/pulp/pulpcore/git/trees/main?recursive=1
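A minimal sketch of reading that endpoint with only the Python standard library. It assumes an unauthenticated request, which is rate-limited. The helper names `fetch_tree` and `files_from_tree` are hypothetical. Entries of type `tree` (directories) and `commit` (submodules) are skipped. A response with `truncated` set to true is incomplete and would have to be walked sub-tree by sub-tree.

```python
import json
import urllib.request

# Endpoint from the issue, parameterized (owner/repo/ref placeholders are illustrative)
GITHUB_TREES_URL = "https://api.github.com/repos/{owner}/{repo}/git/trees/{ref}?recursive=1"


def fetch_tree(owner: str, repo: str, ref: str) -> dict:
    """Fetch the recursive tree listing for `ref` (unauthenticated, so rate-limited)."""
    url = GITHUB_TREES_URL.format(owner=owner, repo=repo, ref=ref)
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def files_from_tree(tree: dict) -> list[tuple[str, str, int]]:
    """Return (path, git blob SHA-1, size) for every file entry in a trees response."""
    if tree.get("truncated"):
        # GitHub caps recursive listings; a truncated result is missing entries
        raise ValueError("tree listing was truncated")
    return [
        (entry["path"], entry["sha"], entry["size"])
        for entry in tree["tree"]
        # skip "tree" (directory) and "commit" (submodule) entries; only blobs are files
        if entry["type"] == "blob"
    ]
```

For example, `files_from_tree(fetch_tree("pulp", "pulpcore", "main"))` would list the files on `main`. Keeping the parsing separate from the HTTP call makes it easy to test against a saved response.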

Similarly, for GitLab: https://docs.gitlab.com/ee/api/repository_files.html.
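The linked GitLab Repository Files API also accepts a `HEAD` request that returns a single file's metadata as response headers, such as `X-Gitlab-Content-Sha256`, `X-Gitlab-Size`, and `X-Gitlab-Blob-Id`. The sketch below assumes that behavior. The helper names and the example project path are hypothetical. It only fetches metadata for a file whose path is already known; enumerating the files would need a separate call that is not covered here.

```python
import urllib.parse
import urllib.request


def gitlab_file_metadata_url(base_url: str, project: str, file_path: str, ref: str) -> str:
    """Build the Repository Files URL; project and file path are URL-encoded, slashes included."""
    project_id = urllib.parse.quote(str(project), safe="")
    encoded_path = urllib.parse.quote(file_path, safe="")
    query = urllib.parse.urlencode({"ref": ref})
    return f"{base_url}/api/v4/projects/{project_id}/repository/files/{encoded_path}?{query}"


def parse_file_metadata(headers) -> dict:
    """Pick the fields a sync needs out of the HEAD response headers."""
    return {
        "path": headers["X-Gitlab-File-Path"],
        "sha256": headers["X-Gitlab-Content-Sha256"],
        "size": int(headers["X-Gitlab-Size"]),
        "blob_id": headers["X-Gitlab-Blob-Id"],  # git object ID (SHA-1)
    }


def head_file(base_url: str, project: str, file_path: str, ref: str) -> dict:
    """Fetch one file's metadata without downloading its content."""
    url = gitlab_file_metadata_url(base_url, project, file_path, ref)
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return parse_file_metadata(resp.headers)
```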

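The GitHub response lists each file's path, size, and git object ID (`sha`, a SHA-1 blob hash rather than a sha256 digest); GitLab's repository files API additionally reports a `content_sha256` per file. With this metadata, we can sync remote repositories.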
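The digest a git host reports is git's object ID: a SHA-1 computed over a `blob <size>\0` header followed by the file content. It therefore differs from a plain sha256 of the same bytes. pulpcore records sha256 for artifacts, so a sync based on git object IDs alone would still need the content (or a host-provided sha256, as GitLab offers) to fill that in. The small sketch below shows both hashes for the same content. The function names are illustrative.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Git's object ID for a file: SHA-1 over 'blob <len>\\0' + content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def sha256_digest(content: bytes) -> str:
    """Plain sha256 of the file bytes, the digest Pulp artifacts record."""
    return hashlib.sha256(content).hexdigest()


content = b"hello\n"
print(git_blob_id(content))  # ce013625030ba8dba906f756967f9e9ca394464a
print(sha256_digest(content))  # a 64-character hex digest, unrelated to the blob ID
```

The first value matches what `git hash-object` prints for the same content, which is why it can be checked against the `sha` in a tree listing but not against a sha256.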

Problems

Git

For plain git repositories, we can for now use GitPython to get the information needed for syncing, like so:

import git


def list_files_info(repo_path):
    repo = git.Repo(repo_path)

    # Iterate through all commits reachable from HEAD
    for commit in repo.iter_commits():
        # Walk the commit's tree recursively
        for item in commit.tree.traverse():
            # traverse() also yields sub-trees (directories) and submodules; keep files only
            if item.type != "blob":
                continue
            # item.hexsha is the git blob ID (SHA-1), not a sha256 digest
            print(f"Commit: {commit.hexsha}, Path: {item.path}, Digest: {item.hexsha}, Size: {item.size} bytes")


# Example usage
repo_path = "/path/to/your/local/repo"
list_files_info(repo_path)

Problems

mdellweg commented 10 months ago

I believe we can create a git remote that points to a git repository and has a git ref, which can be a commit, a branch, or a tag. The first sync stage would then make a bare clone, use GitPython to extract the files from that revision, and feed them into the sync pipeline, without needing a checked-out working tree.
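The bare-clone approach can be sketched as follows. To stay free of third-party dependencies, this sketch swaps GitPython for the git CLI (`git clone --bare`, `git rev-parse`, `git ls-tree`, `git cat-file`) via `subprocess`. The function names `bare_clone` and `files_at_revision` are hypothetical and not part of pulpcore. The `^{commit}` suffix peels a branch, tag, or SHA down to a commit, and `ls-tree` reads the tree straight from the object store, so no working tree is ever checked out.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Iterator, Optional, Tuple


def _git(*args: str, cwd: Optional[Path] = None) -> bytes:
    """Run a git command and return its raw stdout; raises CalledProcessError on failure."""
    return subprocess.run(["git", *args], cwd=cwd, check=True, capture_output=True).stdout


def bare_clone(url: str, dest: Path) -> Path:
    """Clone objects and refs only, with no working tree."""
    _git("clone", "--bare", "--quiet", url, str(dest))
    return dest


def files_at_revision(bare_repo: Path, git_ref: str) -> Iterator[Tuple[str, str, int, str]]:
    """Yield (path, git blob ID, size, sha256) for every file at git_ref."""
    commit = _git("rev-parse", "--verify", f"{git_ref}^{{commit}}", cwd=bare_repo).decode().strip()
    # -r: recurse into subdirectories, -l: include sizes, -z: NUL-separated, unquoted paths
    listing = _git("ls-tree", "-r", "-l", "-z", commit, cwd=bare_repo)
    for entry in listing.split(b"\0"):
        if not entry:
            continue
        meta, path = entry.split(b"\t", 1)
        _mode, obj_type, blob_id, size = meta.split()
        if obj_type != b"blob":  # skip submodule (gitlink) entries
            continue
        # Read the blob from the object store and hash it for the sync pipeline
        content = _git("cat-file", "blob", blob_id.decode(), cwd=bare_repo)
        yield path.decode(), blob_id.decode(), int(size), hashlib.sha256(content).hexdigest()


# Usage (URL, destination, and ref are placeholders):
# repo = bare_clone("https://github.com/pulp/pulpcore.git", Path("/tmp/pulpcore.git"))
# for path, blob_id, size, sha256 in files_at_revision(repo, "main"):
#     print(path, size, sha256)
```

Reading each blob with a separate `git cat-file` call is simple but slow for large trees. A real first stage would more likely batch these reads (for example with GitPython's object database), but the shape of the data fed into the pipeline stays the same.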