kcl-lang / kpm

KCL Package Manager
https://kcl-lang.io
Apache License 2.0
27 stars 44 forks source link

[LFX Term 3]: Full storage path for KCL third-party dependencies #384

Open zong-zhe opened 4 months ago

zong-zhe commented 4 months ago

1. Problem

2. Solution

In order to solve the problem, we redesigned the storage local system of the third-party dependencies with reference to rust

└── kpm
    ├── .kpm # all the configuration file for kpm client
    ├── git # all the kcl dependencies from git repo
    ├── oci # all the kcl dependencies from oci registry

2.1 Git

Under the subdir in kpm/git, the storage local system is

kpm/git
├── checkouts.   # checkout the specific version of git repository from cache bare repository 
│   ├── kcl-2a81898195a215f1
│   │   └── 33bb450. . # All the version of kcl package from git repository will be replaced with commit id
│   ├── kcl-578669463c900b87
│   │   └── 33bb450
└── db    # A bare git repository for cache git repo
    ├── kcl-2a81898195a215f1.      # <NAME>-<HASH> <NAME> is the name of git repo, 
    ├── kcl-578669463c900b87.   # <HASH> is calculated by the git full url.

NOTE:

2.2 Oci

Under the subdir in kpm/oci, the storage local system is

kpm/oci
├── cache # the cache for KCL dependencies tar
│   ├── ghcr.io-2a81898195a215f1    # <HOST>-<HASH> HOST is the name of oci registry,  <HASH> is calculated by the oci full url.
│   │   └── k8s_1.29.tar    # the tar for KCL dependencies
│   ├── docker.io-578669463c900b87
│   │   └── k8s_1.28.tar
└── src                                                 
│   ├── ghcr.io-2a81898195a215f1
│   │   └── k8s_1.29    # the KCL dependencies tar will untar here
│   ├── docker.io-578669463c900b87
│   │   └── k8s_1.28

4. Implementation

zong-zhe commented 4 months ago

PreTest

Manoramsharma commented 3 months ago

HI @zong-zhe @Peefy I am excited to contribute to this issue. After going through the solution mentioned I have see following key take-aways:

  1. By generating unique hash values based on repository URLs, we can distinguish dependencies with identical names. This method not only avoids conflicts but also provides a consistent way to reference dependencies.
  2. Developing a system that abstracts the underlying details of fetching and caching dependencies. This system would expose a Get method, ensuring that the file system automatically manages caching and downloads. This would also standardize behavior across different operating systems.

In order to proceed forward with the pretest where I need to expand kpm's ability to download , chekout from a git bare repo. Which part of documentation I can refer for configuring this for kpm current git module?

zong-zhe commented 3 months ago

Hi @Manoramsharma 😃

Welcome! Let me provide more details.

kpm currently uses a three-party library go-getter to complete git related work, go-getter will call your local git tool to complete the function, which means that you need to install git tools in your local, and refer to the git tool documentation to complete the configuration. With kpm, you don't need to do any additional configuration, and if your git clone runs successfully, kpm can add dependency with the git repo.

kpm and git

kpm and go-getter

Manoramsharma commented 3 months ago

Okay, so kpm initially supported only the public git repo, now after the implementation of go-getter kpm supports cloning options for performing different git related operations. Right?

Now for our case, development of kpm current git module to expand its ability to download git bare repo do I need to modify the current git implementation by adding specific functions to the go-getter or the related file that will ultimately supports the checkout of specific version of git repository from cache bare repository? For the checkouts we will mention here:

kpm/git
├── checkouts.   # checkout the specific version of git repository from cache bare repository 
│   ├── kcl-2a81898195a215f1
│   │   └── 33bb450. . # All the version of kcl package from git repository will be replaced with commit id
│   ├── kcl-578669463c900b87
│   │   └── 33bb450
└── db    # A bare git repository for cache git repo
    ├── kcl-2a81898195a215f1.      # <NAME>-<HASH> <NAME> is the name of git repo, 
    ├── kcl-578669463c900b87.   # <HASH> is calculated by the git full url.

@zong-zhe am I thinking in a right direction?

Manoramsharma commented 3 months ago

Proposal to Enhance kpm's current git module

To enhance the kpm current git module, I propose the following solutions to expand its capabilities:

1. Expand Ability to Download Git Bare Repositories

1.1 Define a Function for Cloning Bare Repositories

I will add a function to clone Git repositories as bare repositories. This function will use Git commands to perform the cloning and store the bare repository in the appropriate cache directory.

Example Implementation:

package git

import (
    "fmt"
    "os/exec"
    "path/filepath"
)

// CloneBareRepo clones a Git repository as a bare repository.
func CloneBareRepo(repoURL string, cacheDir string) (string, error) {
    repoHash := calculateHash(repoURL) // Function to calculate a unique hash for the repo URL
    repoName := extractRepoName(repoURL) // Function to extract repository name from URL
    cachePath := filepath.Join(cacheDir, fmt.Sprintf("%s-%s", repoName, repoHash))

    if _, err := exec.LookPath("git"); err != nil {
        return "", fmt.Errorf("git not found in PATH")
    }

    if err := exec.Command("git", "clone", "--bare", repoURL, cachePath).Run(); err != nil {
        return "", fmt.Errorf("error cloning repository: %v", err)
    }

    return cachePath, nil
}

// Helper functions
func calculateHash(url string) string {
    // Implement hashing function for URL
}

func extractRepoName(url string) string {
    // Implement repo name extraction
}

1.2 Integrate with the Unified Dependency System

I will ensure that this new function integrates with the unified dependency system. This may involve updating the system to manage and utilize the cache effectively.

package unified

// UnifiedDependencySystem is an abstraction for managing dependencies.
type UnifiedDependencySystem struct {
    cacheDir string
}

// Get handles fetching the Git repository and managing the cache.
func (uds *UnifiedDependencySystem) Get(repoURL string, commitID string, targetDir string) error {
    bareRepoPath, err := git.CloneBareRepo(repoURL, uds.cacheDir)
    if err != nil {
        return err
    }
    return git.CheckoutFromBareRepo(bareRepoPath, commitID, targetDir)
}

2. Expand Ability to Checkout Source Code from Bare Repositories

2.1 Define a Function for Checking Out Code from Bare Repositories

I will add a function to check out a specific commit or branch from a bare repository.

Example Implementation:

package git

import (
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
)

// CheckoutFromBareRepo checks out a specific commit from a bare repository.
func CheckoutFromBareRepo(bareRepoPath string, commitID string, targetDir string) error {
    if err := os.MkdirAll(targetDir, 0755); err != nil {
        return fmt.Errorf("error creating target directory: %v", err)
    }

    cmd := exec.Command("git", "worktree", "add", targetDir, commitID)
    cmd.Dir = bareRepoPath
    if err := cmd.Run(); err != nil {
        return fmt.Errorf("error checking out commit: %v", err)
    }

    return nil
}

2.2 Update the Unified Dependency System

I will ensure that the unified dependency system correctly utilizes the new checkout functionality and integrates with the local cache.

package unified

// UnifiedDependencySystem is an abstraction for managing dependencies.
type UnifiedDependencySystem struct {
    cacheDir string
}

// Get handles fetching and checking out the Git repository.
func (uds *UnifiedDependencySystem) Get(repoURL string, commitID string, targetDir string) error {
    bareRepoPath, err := git.CloneBareRepo(repoURL, uds.cacheDir)
    if err != nil {
        return err
    }
    return git.CheckoutFromBareRepo(bareRepoPath, commitID, targetDir)
}

3. Handle Cache and Metadata Management

3.1 Implement Cache Management

I will add functionality to manage and update cached repositories, including metadata.

package cache

// CacheManager handles operations related to caching dependencies.
type CacheManager struct {
    cacheDir string
}

// UpdateCache updates the local cache with new metadata and repositories.
func (cm *CacheManager) UpdateCache(repoURL string) error {
    // Implement caching logic here
    return nil
}

3.2 Ensure Consistency

I will ensure that the local storage system (e.g., kpm/git/checkouts/, kpm/git/db/) remains consistent with the new functionalities and can handle different scenarios.

cc: @Peefy @zong-zhe

zong-zhe commented 3 months ago

Okay, so kpm initially supported only the public git repo, now after the implementation of go-getter kpm supports cloning options for performing different git related operations. Right?

Now for our case, development of kpm current git module to expand its ability to download git bare repo do I need to modify the current git implementation by adding specific functions to the go-getter or the related file that will ultimately supports the checkout of specific version of git repository from cache bare repository? For the checkouts we will mention here:

kpm/git
├── checkouts.   # checkout the specific version of git repository from cache bare repository 
│   ├── kcl-2a81898195a215f1
│   │   └── 33bb450. . # All the version of kcl package from git repository will be replaced with commit id
│   ├── kcl-578669463c900b87
│   │   └── 33bb450
└── db    # A bare git repository for cache git repo
    ├── kcl-2a81898195a215f1.      # <NAME>-<HASH> <NAME> is the name of git repo, 
    ├── kcl-578669463c900b87.   # <HASH> is calculated by the git full url.

@zong-zhe am I thinking in a right direction?

Hi @Manoramsharma 😃

You might need to learn if go-getter offers the ability to download bare repo, but if not, you can extend this file https://github.com/kcl-lang/kpm/blob/main/pkg/git/git.go to add kpm's basic ability to download a git bare repo and checkout the bare repo to src code repo. You can try to design your implementation with CloneOptions.

zong-zhe commented 3 months ago

2.2 Update the Unified Dependency System

I will ensure that the unified dependency system correctly utilizes the new checkout functionality and integrates with the local cache.

package unified

// UnifiedDependencySystem is an abstraction for managing dependencies.
type UnifiedDependencySystem struct {
  cacheDir string
}

// Get handles fetching and checking out the Git repository.
func (uds *UnifiedDependencySystem) Get(repoURL string, commitID string, targetDir string) error {
  bareRepoPath, err := git.CloneBareRepo(repoURL, uds.cacheDir)
  if err != nil {
      return err
  }
  return git.CheckoutFromBareRepo(bareRepoPath, commitID, targetDir)
}

You may need to add some detail here, because KCL's dependencies are not only those from git, you also need to consider the dependencies from oci together, abstracting them from the same behavior.

You also need to consider combining the downloader to support the download behavior of different dependencies.

https://github.com/kcl-lang/kpm/blob/main/pkg/downloader/downloader.go

zong-zhe commented 3 months ago

3. Handle Cache and Metadata Management

3.1 Implement Cache Management

I will add functionality to manage and update cached repositories, including metadata.

package cache

// CacheManager handles operations related to caching dependencies.
type CacheManager struct {
  cacheDir string
}

// UpdateCache updates the local cache with new metadata and repositories.
func (cm *CacheManager) UpdateCache(repoURL string) error {
  // Implement caching logic here
  return nil
}

This part I think may not need to be a separate module, it can be used as part of the unified file system.

Manoramsharma commented 3 months ago

Hi @zong-zhe, I have carried out some research and found that go-getter just provides the support for downloading files from the source using URL and validating the URL to a correct type if required. Not something to facilitate bare repo cloning.

But our current git implementation includes go-git library functions like PlainClone, PlainOpen, InitOpen and others that can directly identify the repositoryto be bare or not-bare, by looking at the repository configuration and checks if worktree is nil.

Should I utilize these functions to extend our git implementaion for kpm adding support to clone bare repository? I need your reviews on this.

zong-zhe commented 3 months ago

Hi @Manoramsharma 😃

I reviewed your previous work, and I think that you may need to adjust some of your work.

  1. I think using go-git is not a very suitable way, because the go-git currently has problems with the authentication function of git repositories, and we cannot use go-git to complete the download of private repositories. Here are some of the previous discussions on this issue: https://github.com/kcl-lang/kpm/pull/335#discussion_r1616537022. If you're sure that go-getter can't download the bare repo, maybe you can use the local git command to support downloading from the bare repo repository.

  2. This method should be private. https://github.com/kcl-lang/kpm/blob/c9c58707c22ec0af60f83d2f480e4b1b777eed14/pkg/git/git.go#L113 It should not be exposed to the outside world. The outside world only needs to control Bare bool in CloneOptions to control whether the Clone() method downloads the Bare repository.

Gmin2 commented 3 months ago

Hey @zong-zhe so do i also need to refactor my previous work(done in #401 )

zong-zhe commented 3 months ago

Hey @zong-zhe so do i also need to refactor my previous work(done in #401 )

Yes 😃

Manoramsharma commented 3 months ago

Hi @Manoramsharma 😃

I reviewed your previous work, and I think that you may need to adjust some of your work.

  1. I think using go-git is not a very suitable way, because the go-git currently has problems with the authentication function of git repositories, and we cannot use go-git to complete the download of private repositories. Here are some of the previous discussions on this issue: added design doc for sparse checkout #335 (comment). If you're sure that go-getter can't download the bare repo, maybe you can use the local git command to support downloading from the bare repo repository.
  2. This method should be private. https://github.com/kcl-lang/kpm/blob/c9c58707c22ec0af60f83d2f480e4b1b777eed14/pkg/git/git.go#L113 It should not be exposed to the outside world. The outside world only needs to control Bare bool in CloneOptions to control whether the Clone() method downloads the Bare repository.

@zong-zhe are we talking of something similar to the cargo implementation that is package manager for RUST, because I can see in the discussion you mentioned about the clone options available with cargo package manager.

Or I can add an extra --flag to kcl mod add [options] .. command implementation that is taken as a CLI input from the user client and based upon the value of this flag can change out cloneOptions?

zong-zhe commented 3 months ago

Hi @Manoramsharma 😃 I reviewed your previous work, and I think that you may need to adjust some of your work.

  1. I think using go-git is not a very suitable way, because the go-git currently has problems with the authentication function of git repositories, and we cannot use go-git to complete the download of private repositories. Here are some of the previous discussions on this issue: added design doc for sparse checkout #335 (comment). If you're sure that go-getter can't download the bare repo, maybe you can use the local git command to support downloading from the bare repo repository.
  2. This method should be private. https://github.com/kcl-lang/kpm/blob/c9c58707c22ec0af60f83d2f480e4b1b777eed14/pkg/git/git.go#L113

    It should not be exposed to the outside world. The outside world only needs to control Bare bool in CloneOptions to control whether the Clone() method downloads the Bare repository.

@zong-zhe are we talking of something similar to the cargo implementation that is package manager for RUST, because I can see in the discussion you mentioned about the clone options available with cargo package manager.

Or I can add an extra --flag to kcl mod add [options] .. command implementation that is taken as a CLI input from the user client and based upon the value of this flag can change out cloneOptions?

Hi @Manoramsharma 😃

What I mean is that the go-git may not be able to meet the current needs of kpm, because the go-git has defects in the processing of user credentials. I've discussed the defects of go-git in https://github.com/kcl-lang/kpm/pull/335#discussion_r1616537022

Gmin2 commented 3 months ago
  1. I think using go-git is not a very suitable way, because the go-git currently has problems with the authentication function of git repositories, and we cannot use go-git to complete the download of private repositories. Here are some of the previous discussions on this issue: added design doc for sparse checkout #335 (comment). If you're sure that go-getter can't download the bare repo, maybe you can use the local git command to support downloading from the bare repo repository.
  2. This method should be private. https://github.com/kcl-lang/kpm/blob/c9c58707c22ec0af60f83d2f480e4b1b777eed14/pkg/git/git.go#L113 It should not be exposed to the outside world. The outside world only needs to control Bare bool in CloneOptions to control whether the Clone() method downloads the Bare repository.

implemented it in #418

cc @zong-zhe

Manoramsharma commented 3 months ago

Hi @Manoramsharma 😃 I reviewed your previous work, and I think that you may need to adjust some of your work.

  1. I think using go-git is not a very suitable way, because the go-git currently has problems with the authentication function of git repositories, and we cannot use go-git to complete the download of private repositories. Here are some of the previous discussions on this issue: added design doc for sparse checkout #335 (comment). If you're sure that go-getter can't download the bare repo, maybe you can use the local git command to support downloading from the bare repo repository.
  2. This method should be private. https://github.com/kcl-lang/kpm/blob/c9c58707c22ec0af60f83d2f480e4b1b777eed14/pkg/git/git.go#L113

    It should not be exposed to the outside world. The outside world only needs to control Bare bool in CloneOptions to control whether the Clone() method downloads the Bare repository.

@zong-zhe are we talking of something similar to the cargo implementation that is package manager for RUST, because I can see in the discussion you mentioned about the clone options available with cargo package manager. Or I can add an extra --flag to kcl mod add [options] .. command implementation that is taken as a CLI input from the user client and based upon the value of this flag can change out cloneOptions?

Hi @Manoramsharma 😃

What I mean is that the go-git may not be able to meet the current needs of kpm, because the go-git has defects in the processing of user credentials. I've discussed the defects of go-git in #335 (comment)

Hi @zong-zhe you can review my latest PR where I removed the go-git implementation and tried to implement the bare clone using the local git command inside the clone()itself , no extra function. I have searched a lot but was not able to see any supporting example of go-getter used to clone bare repo. see #419

Also I have one question regarding the checkout operation from the bare repo, the bare is something with no workingtree so how it is possible to checkout the particular commit or branch from the repository with no workingtree?

zong-zhe commented 3 months ago

Also I have one question regarding the checkout operation from the bare repo, the bare is something with no workingtree so how it is possible to checkout the particular commit or branch from the repository with no workingtree?

Hi @Manoramsharma 😃

This part of my description is inaccurate, which has caused a misunderstanding. I apologize for it. My original idea was to use the bare repository as a cache, then clone different dependencies into the corresponding directory from the bare repo and checkout to the corresponding branch, commit or tag.

Manoramsharma commented 3 months ago

Hi @zong-zhe after some research about the bare repository and its implementation as the cache support for our objective I have designed an overall proposal discussing the approach. Before moving forward with the exact changes to the codebase I have few questions in my mind:

  1. Which will be the remote bare repository we will be look up to?
  2. Bare repo supports connecting using ssh and the host IP (Discussed in the proposal), how to make sure a remote server running the code?

Looking forward to have your feedback on the same. Thanks for your support !!

zong-zhe commented 3 months ago

Hi @Manoramsharma 😃

  1. Which will be the remote bare repository we will be look up to?

kpm supports adding dependencies from git repositories, which are specified by the user.

  1. Bare repo supports connecting using ssh and the host IP (Discussed in the proposal)

kpm's git capability is directly inherited from the locally installed git client. As long as git clone is successful, kpm can successfully download it, and kpm does not need to consider ssh or host IP.

how to make sure a remote server running the code?

I don't understand the question.

Gmin2 commented 3 months ago

Why this issue was closed @zong-zhe, i was working on image

and merging the #419 PR conflict with merging my PR #418 which was merged previously

cc @Peefy @zong-zhe

Manoramsharma commented 3 months ago

Why this issue was closed @zong-zhe, i was working on image

and merging the #419 PR conflict with merging my PR #418 which was merged previously

cc @Peefy @zong-zhe

Hi @Gmin2 I think you are not following the issue quite well. I have already designed an overall roadmap for achieving this particular enhancement of KPM to manage third-party dependencies based on caching package repositories as bare repo to the local path.

All discussions including the overall approach to achieve a unified dependency management to low-level works in the form of pretest are already included in my proposal PR . You can have a look at the proposal here #423 . I am consistently working on the proposal refining and achieving the expected outcome by taking regular reviews from the maintainers on my work. I don't think you need to invest your time on this particular issue as I have already made a good progress on this.

Thanks

Gmin2 commented 3 months ago

Hi @Gmin2 I think you are not following the issue quite well. I have already designed an overall roadmap for achieving this particular enhancement of KPM to manage third-party dependencies based on caching package repositories as bare repo to the local path.

designing was not part of the pretest, it was not mentioned in the pretest i thought we had to give the design in the cover letter

Manoramsharma commented 2 months ago

Hi @zong-zhe I have designed a paperwork for the second phase to support downloading the packages to designed local storage path. You can review the approach and provide me with some valuable feedbacks here. Thanks 😄

cc: @Peefy