siemens / kas

Setup tool for bitbake based projects
MIT License

Automatic caching of KAS_REPO_REF_DIR #49

Closed vivien closed 1 year ago

vivien commented 3 years ago

Kas should provide a default value for KAS_REPO_REF_DIR and clone each repository there itself, as a transparent step before checking out a refspec and applying any patches. Two reasons: 1) kas already imposes its own naming convention on these cached repos, and 2) it knows their origin URLs.

This would allow us to share KAS_REPO_REF_DIR alongside SSTATE_DIR and DL_DIR between multiple builds.

Cc: @rossburton
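A rough sketch of the flow proposed above, to make the idea concrete. It only builds the git command lines and does not run them. The fallback cache location and the helper names (default_ref_dir, plan_checkout) are assumptions made for illustration, not kas behaviour; `git clone --bare` and `git clone --reference` are standard git options.

```python
import os
import pathlib
from urllib.parse import urlparse


def repo_shortname(url):
    # Naming convention discussed in this thread: host plus path,
    # with '@', ':', '/' and '*' replaced by '.'.
    parsed = urlparse(url)
    name = parsed.netloc + parsed.path
    for char in "@:/*":
        name = name.replace(char, ".")
    return name


def default_ref_dir(environ=os.environ):
    # ASSUMPTION: the fallback path below is hypothetical; kas does not
    # define it. An explicit KAS_REPO_REF_DIR always wins.
    if "KAS_REPO_REF_DIR" in environ:
        return pathlib.Path(environ["KAS_REPO_REF_DIR"])
    cache_home = environ.get("XDG_CACHE_HOME") or str(pathlib.Path.home() / ".cache")
    return pathlib.Path(cache_home) / "kas" / "repos"


def plan_checkout(url, workdir, environ=os.environ):
    """Return the git commands for a cached checkout, without running them."""
    ref = default_ref_dir(environ) / repo_shortname(url)
    commands = []
    if ref.exists():
        # Refresh the existing reference clone.
        commands.append(["git", "-C", str(ref), "fetch", "origin",
                         "+refs/heads/*:refs/heads/*"])
    else:
        # First use: populate the reference clone.
        commands.append(["git", "clone", "--bare", url, str(ref)])
    # The working clone borrows objects from the reference clone.
    commands.append(["git", "clone", "--reference", str(ref), url, str(workdir)])
    return commands
```

With this shape, the reference directory could sit next to SSTATE_DIR and DL_DIR, and every build would reuse whatever objects are already cached there.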

rossburton commented 3 years ago

Currently testing a little bit of Py to do this:

#! /usr/bin/env python3

import os
import pathlib
import subprocess
import sys
from urllib.parse import urlparse

def repo_shortname(url):
    # Flatten a repository URL into a single directory name.
    url = urlparse(url)
    return ('{url.netloc}{url.path}'
            .format(url=url)
            .replace('@', '.')
            .replace(':', '.')
            .replace('/', '.')
            .replace('*', '.'))

repositories = (
    "https://git.yoctoproject.org/git/poky",
    "https://git.openembedded.org/meta-openembedded",
    "https://git.yoctoproject.org/git/meta-virtualization",
    "https://git.yoctoproject.org/git/meta-zephyr",
    "https://github.com/kraj/meta-clang",
)

if __name__ == "__main__":
    if "KAS_REPO_REF_DIR" not in os.environ:
        print("KAS_REPO_REF_DIR needs to be set")
        sys.exit(1)

    base_repodir = pathlib.Path(os.environ["KAS_REPO_REF_DIR"])
    for repo in repositories:
        repodir = base_repodir / repo_shortname(repo)
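        if repodir.exists():
            # A bare clone has no default fetch refspec, so name one explicitly;
            # a plain "git fetch" would only update FETCH_HEAD.
            subprocess.run(["git", "-C", repodir, "fetch", "origin",
                            "+refs/heads/*:refs/heads/*"], check=True)
        else:
            subprocess.run(["git", "clone", "--bare", repo, repodir], check=True)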

25% of the code is the URL-to-name logic so this really should be part of kas IMHO.
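To illustrate what the URL-to-name logic produces, here is a small standalone sketch that applies the same character replacements to a few URLs. The ssh URL is made up for the example.

```python
from urllib.parse import urlparse


def repo_shortname(url):
    # Join host and path, then replace every character that is awkward
    # in a directory name ('@', ':', '/', '*') with '.'.
    parsed = urlparse(url)
    name = parsed.netloc + parsed.path
    for char in "@:/*":
        name = name.replace(char, ".")
    return name


examples = (
    "https://git.yoctoproject.org/git/poky",
    "https://github.com/kraj/meta-clang",
    "ssh://git@example.com/team/layer.git",  # made-up URL for illustration
)
for url in examples:
    print(repo_shortname(url))
# git.yoctoproject.org.git.poky
# github.com.kraj.meta-clang
# git.example.com.team.layer.git
```

Because several characters collapse to '.', distinct URLs can map to the same name (for example a path ending in a/b and one ending in a.b), which is one more reason to keep the convention in a single place.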

Automatically cloning/fetching the repositories if KAS_REPO_REF_DIR is set seems like a good idea, but I can see an argument that the reference directory might be a shared resource so racing to do the fetches would be bad.

A separate plugin so it can be invoked once in a multi-job CI run, maybe?
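One possible way to address the race, sketched under assumptions: serialize updates with an advisory file lock inside the reference directory. This is not something kas is known to do; the lock file name and the helper names are invented for this example, and fcntl.flock is POSIX-only and may not be reliable on every network filesystem.

```python
import contextlib
import fcntl
import pathlib
import subprocess


@contextlib.contextmanager
def ref_dir_lock(base_repodir):
    """Hold an exclusive advisory lock covering the whole reference directory."""
    base = pathlib.Path(base_repodir)
    base.mkdir(parents=True, exist_ok=True)
    # ".kas-ref.lock" is a made-up file name for this sketch.
    with open(base / ".kas-ref.lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks while another job holds it
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def update_reference(base_repodir, name, url, run=subprocess.run):
    """Clone or fetch one reference repository while holding the lock."""
    repodir = pathlib.Path(base_repodir) / name
    with ref_dir_lock(base_repodir):
        if repodir.exists():
            run(["git", "-C", str(repodir), "fetch", "origin",
                 "+refs/heads/*:refs/heads/*"], check=True)
        else:
            run(["git", "clone", "--bare", url, str(repodir)], check=True)
```

A single lock for the whole directory is the simplest choice; a per-repository lock would let unrelated fetches run in parallel at the cost of more bookkeeping.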

henning-schild commented 3 years ago

Maybe move this to the ML at some point. The motivation is pretty clear, but it is another cache and caches tend to be kind of nasty, so test cases are a must. And even if not many people use Mercurial, it should probably be covered as well.

vivien commented 3 years ago

This GitHub issue / ML dance makes no sense and must be stopped. This is an open-source project, not a biased manufacturer customer support platform where issues are half discussed. Either you guys disable Issues completely in the GitHub project settings and clearly point out the mailing list in the README, or you keep both and allow GitHub issues to be fully discussed.

Thank you. Cc: @jan-kiszka

henning-schild commented 3 years ago

I agree that the combination could seem weird. The mailing list is pointed out in CONTRIBUTING.md, and issues are open to invite people who might have missed that or are not yet familiar with an ML process. Or maybe for quickly answering a drive-by user question.

Both @vivien and @rossburton have been active on the ML and should know that. So when you still open issues ... you started that round of the dance. The majority of the community probably looks at the ML and not GitHub, so yes, anything here risks being "half discussed". But maybe you like it as a staging area before moving to the list ...

We prefer the ML because we take OSS very seriously and want to avoid GitHub lock-in.

Maybe issue templates can be used to point to the ML more prominently.

vivien commented 3 years ago

I use whatever tool is available and is closest to the source. If the upstream sources are hosted on GitHub and issues are enabled, I first look up and comment in there, simple. If you care about "people that might have missed that or are not yet familiar with an ML-process", do not provide them an alternative tool. Just disable the GitHub Issues and clearly state the ML in the README, which is displayed on the GitHub project front page. Keeping issues enabled to ask people to move the topic to the ML is still nonsense.

jan-kiszka commented 2 years ago

Was this (the original issue) covered on the list afterwards?

fmoessbauer commented 1 year ago

This is also one thing we need to better support kas-based builds in China. Fetching from GitHub sometimes fails and is very slow. If we could cache it in CI, that would significantly speed up the CI task. And the issue is actually getting worse with repos getting bigger, as kas does not do sparse checkouts.

> Automatically cloning/fetching the repositories if KAS_REPO_REF_DIR is set seems like a good idea, but I can see an argument that the reference directory might be a shared resource so racing to do the fetches would be bad.

This could be solved by working on a "private" cache that is handled by the CI system (e.g. a gitlab-ci-cache artifact). After the build, this artifact could be re-generated. That way, we would have a fully transparent solution.

PS: I cannot find the discussion on the ML. Maybe somebody could add a reference here.

jan-kiszka commented 1 year ago

Feature was merged.

rossburton commented 1 year ago

Merged in f2560588bc67eb44ac715ca2fee0bf785e223591

rossburton commented 1 year ago

I'll follow up at some point with a proposal or plugin to run git fetch on the clones.