metomi / rose

:rose: Rose is a toolkit for writing, editing and running application configurations.
https://metomi.github.io/rose/
GNU General Public License v3.0
56 stars 53 forks source link

Git file install: support for file creation from Git repositories #2754

Closed benfitzpatrick closed 5 months ago

benfitzpatrick commented 10 months ago

Work in progress!

Aims to address #1419.

I'm not that happy with the configuration syntax, although I am happy with the three main elements of it as elements.

The mechanism itself uses filtering and either partial clones or proper sparse checkouts where the Git version allows.

wxtim commented 10 months ago

Let us know when you want us to have a review at this.

benfitzpatrick commented 10 months ago

Ready for review, not ready to go in IMO!

oliver-sanders commented 9 months ago

Ready for review, not ready to go in IMO!

@benfitzpatrick, what's left to do?

benfitzpatrick commented 9 months ago

I'd like to do a bit of in-anger user-case testing if we feel it's close enough as-is? Shouldn't take too long.

benfitzpatrick commented 8 months ago

I've done some testing against real cases I could find in our workflows, found and fixed some issues in afddb2b.

I tested a case with 6 GitHub file installs in a single rose-app.conf - we were worried about hitting a rate limit. It worked fine and looked like these happen sequentially, at least from the verbose output? Happy to demo offline.

Otherwise I'm happy

oliver-sanders commented 8 months ago

looked like these happen sequentially

Curious, I'll check that's because GitHub is serving the requests sequentially rather than Rose failing to run them in parallel.

benfitzpatrick commented 7 months ago

Little bump

oliver-sanders commented 7 months ago

looked like these happen sequentially

Curious, I'll check that's because GitHub is serving the requests sequentially rather than Rose failing to run them in parallel.

After a bit of digging I'm happy that these operations are being run concurrently, resulting in their subprocesses being run in parallel.

To track subprocesses, try the following diff:

diff --git a/metomi/rose/popen.py b/metomi/rose/popen.py
index 8993fb93..8ec00c94 100644
--- a/metomi/rose/popen.py
+++ b/metomi/rose/popen.py
@@ -372,6 +372,7 @@ class RosePopener:
                 proc = Popen(args[0], **kwargs)
             else:
                 command = ' '.join(map(shlex.quote, args))
+                print(f'# {command}')  # process started
                 proc = await asyncio.create_subprocess_shell(command, **kwargs)
         except OSError as exc:
             if exc.filename is None and args:
@@ -392,7 +393,9 @@ class RosePopener:
         if isinstance(kwargs.get("stdin"), str):
             stdin = kwargs.get("stdin")
         stdout, stderr = await proc.communicate(stdin)
+        command = ' '.join(map(shlex.quote, args))
         await proc.wait()
+        print(f'$ {command}')  # process returned
         return proc.returncode, stdout, stderr

     async def run_ok_async(self, *args, **kwargs):

Using this diff with the following config:

[command]
default=true

[file:cylc/scheduler.py]
source=git:git@github.com:cylc/cylc-flow.git::cylc/flow/scheduler.py::master

[file:cylc/task_proxy.py]
source=git:https://github.com/cylc/cylc-flow.git::cylc/flow/task_proxy.py::master

[file:cylc/async_util.py]
source=git:https://github.com/cylc/cylc-flow.git::cylc/flow/async_util.py::master

I get this output showing up to three subprocesses running simultaneously:

# git --git-dir=/var/tmp/tmpqqkj_tvk/.git init
# git --git-dir=/var/tmp/tmpd4c71k8x/.git init
# git --git-dir=/var/tmp/tmp50f2vgwd/.git init
$ git --git-dir=/var/tmp/tmpqqkj_tvk/.git init
# git --git-dir=/var/tmp/tmpqqkj_tvk/.git remote add origin https://github.com/cylc/cylc-flow.git
$ git --git-dir=/var/tmp/tmp50f2vgwd/.git init
# git --git-dir=/var/tmp/tmp50f2vgwd/.git remote add origin https://github.com/cylc/cylc-flow.git
$ git --git-dir=/var/tmp/tmpd4c71k8x/.git init
# git --git-dir=/var/tmp/tmpd4c71k8x/.git remote add origin git@github.com:cylc/cylc-flow.git
$ git --git-dir=/var/tmp/tmpqqkj_tvk/.git remote add origin https://github.com/cylc/cylc-flow.git
# git --git-dir=/var/tmp/tmpqqkj_tvk/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmp50f2vgwd/.git remote add origin https://github.com/cylc/cylc-flow.git
# git --git-dir=/var/tmp/tmp50f2vgwd/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmpd4c71k8x/.git remote add origin git@github.com:cylc/cylc-flow.git
# git --git-dir=/var/tmp/tmpd4c71k8x/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmpqqkj_tvk/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
# git --git-dir=/var/tmp/tmpqqkj_tvk/.git --work-tree=/var/tmp/tmpqqkj_tvk checkout 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmp50f2vgwd/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
# git --git-dir=/var/tmp/tmp50f2vgwd/.git --work-tree=/var/tmp/tmp50f2vgwd checkout 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmp50f2vgwd/.git --work-tree=/var/tmp/tmp50f2vgwd checkout 4b70574c9100e1cf830a226c254cb3159a558873
# rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmp50f2vgwd/cylc/flow/async_util.py /var/tmp/tmph9lc16hm/f13f5dacdb0f632826821bf9278a10fc
$ git --git-dir=/var/tmp/tmpqqkj_tvk/.git --work-tree=/var/tmp/tmpqqkj_tvk checkout 4b70574c9100e1cf830a226c254cb3159a558873
# rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmpqqkj_tvk/cylc/flow/task_proxy.py /var/tmp/tmph9lc16hm/759c658589e37a9af232bfa8e03d01b0
$ rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmpqqkj_tvk/cylc/flow/task_proxy.py /var/tmp/tmph9lc16hm/759c658589e37a9af232bfa8e03d01b0
$ rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmp50f2vgwd/cylc/flow/async_util.py /var/tmp/tmph9lc16hm/f13f5dacdb0f632826821bf9278a10fc
$ git --git-dir=/var/tmp/tmpd4c71k8x/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
# git --git-dir=/var/tmp/tmpd4c71k8x/.git --work-tree=/var/tmp/tmpd4c71k8x checkout 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmpd4c71k8x/.git --work-tree=/var/tmp/tmpd4c71k8x checkout 4b70574c9100e1cf830a226c254cb3159a558873
# rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmpd4c71k8x/cylc/flow/scheduler.py /var/tmp/tmph9lc16hm/8a367e85378d4334d5133fa065d71116
$ rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmpd4c71k8x/cylc/flow/scheduler.py /var/tmp/tmph9lc16hm/8a367e85378d4334d5133fa065d71116

You can also test it like this:

diff --git a/metomi/rose/popen.py b/metomi/rose/popen.py
index 8993fb93..35bdbe27 100644
--- a/metomi/rose/popen.py
+++ b/metomi/rose/popen.py
@@ -371,7 +371,8 @@ class RosePopener:
             if kwargs.get("shell"):
                 proc = Popen(args[0], **kwargs)
             else:
-                command = ' '.join(map(shlex.quote, args))
+                # command = ' '.join(map(shlex.quote, args))
+                command = 'sleep 5'
                 proc = await asyncio.create_subprocess_shell(command, **kwargs)
         except OSError as exc:
             if exc.filename is None and args:

The time taken should increase linearly as the sleep is increased, but only see a small increase (parsing overheads) when more files are configured for installation.

benfitzpatrick commented 7 months ago

Curious, I'll check that's because GitHub is serving the requests sequentially rather than Rose failing to run them in parallel.

After a bit of digging I'm happy that these operations are being run concurrently, resulting in their subprocesses being run in parallel.

Ah, I know why I thought they were sequential - the parsing is serial, the actual pulling can be concurrent. Sorry-thanks!

wxtim commented 5 months ago

I've opened #2784 - it looks like the bug I found is unrelated to this work.

oliver-sanders commented 5 months ago

Whilst working on #2785, I've noticed that "cannot connect to" type errors appear to surface with the following message:

ConfigProcessError: file:README.md=source=git:https@github.com:metomi/rose::README.md::master: ls-remote: could not find ref 'master' in 'git@github.com:metomi/rose'

Where the underlying cause here is that GitHub SSH access has not been configured.

benfitzpatrick commented 5 months ago

I could do a "or remote not contactable?" on the end of the error message?

oliver-sanders commented 5 months ago

If it's possible to differentiate between these cases, great, but if not that'll be fine.

oliver-sanders commented 5 months ago

Waiting for tests to pass again, then ready to merge.

oliver-sanders commented 5 months ago

Good news, the tests pass.

Bad news, if some of the tests skip, then the test fails, e.g:

diff --git a/t/rose-app-run/28-git.t b/t/rose-app-run/28-git.t
index 1b4eaaa90..92a411611 100644
--- a/t/rose-app-run/28-git.t
+++ b/t/rose-app-run/28-git.t
@@ -72,12 +72,12 @@ remote_locations=("$HOSTNAME:$TEST_DIR/hellorepo/" "http://localhost:$GIT_WS_POR
for i in 0 1 2; do
     remote_mode="${remote_test_modes[$i]}"
     remote="${remote_locations[$i]}"
-    if [[ "$remote_mode" == "ssh" ]] && ! ssh -n -q -oBatchMode=yes $HOSTNAME true 1>'/dev/null' 2>/dev/null; then
+    if true; then
         skip 14 "cannot ssh to localhost $HOSTNAME"
         echo "Skip $remote" >/dev/tty        
         continue
     fi
-    if [[ "$remote_mode" == "http" ]] && ! curl --head --silent --fail $remote >/dev/null 2>&1; then
+    if true; then
         skip 14 "failed to launch http on localhost"
         echo "Skip $remote" >/dev/tty        
         continue

:(

[FAIL] ../config: rose-app.conf not found.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 15/57 subtests 
    (less 42 skipped subtests: 0 okay)
Test Summary Report
-------------------
t/rose-app-run/28-git.t (Wstat: 256 Tests: 42 Failed: 0)
  Non-zero exit status: 1
  Parse errors: Bad plan.  You planned 57 tests but ran 42.

Not a release blocker. Suggest bumping this to future work.

~Opened: https://github.com/metomi/rose/issues/2789~

oliver-sanders commented 5 months ago

@MetRonnie, could you check the last couple of commits and merge if you're happy with the tests being pushed to a bugfix release milestone.

oliver-sanders commented 5 months ago

Brill, that's fixed it (will close issue). Tested with all possible skip combinations, test passed each time.

oliver-sanders commented 5 months ago

(Mac OS tests failing due to gh-actions DNS issues, safe to ignore)

oliver-sanders commented 5 months ago

(the test fixes would have been on a bugfix release milestone, however, there is no need for a post-fix, Ben has sorted it on this PR)