Closed benfitzpatrick closed 4 months ago
Let us know when you want us to have a review at this.
Ready for review, not ready to go in IMO!
Ready for review, not ready to go in IMO!
@benfitzpatrick, what's left to do?
I'd like to do a bit of in-anger user-case testing if we feel it's close enough as-is? Shouldn't take too long.
I've done some testing against real cases I could find in our workflows, found and fixed some issues in afddb2b.
I tested a case with 6 GitHub file installs in a single rose-app.conf - we were worried about hitting a rate limit. It worked fine and looked like these happen sequentially, at least from the verbose output? Happy to demo offline.
Otherwise I'm happy
looked like these happen sequentially
Curious, I'll check that's because GitHub is serving the requests sequentially rather than Rose failing to run them in parallel.
Little bump
looked like these happen sequentially
Curious, I'll check that's because GitHub is serving the requests sequentially rather than Rose failing to run them in parallel.
After a bit of digging I'm happy that these operations are being run concurrently, resulting in their subprocesses being run in parallel.
To track subprocesses, try the following diff:
diff --git a/metomi/rose/popen.py b/metomi/rose/popen.py
index 8993fb93..8ec00c94 100644
--- a/metomi/rose/popen.py
+++ b/metomi/rose/popen.py
@@ -372,6 +372,7 @@ class RosePopener:
proc = Popen(args[0], **kwargs)
else:
command = ' '.join(map(shlex.quote, args))
+ print(f'# {command}') # process started
proc = await asyncio.create_subprocess_shell(command, **kwargs)
except OSError as exc:
if exc.filename is None and args:
@@ -392,7 +393,9 @@ class RosePopener:
if isinstance(kwargs.get("stdin"), str):
stdin = kwargs.get("stdin")
stdout, stderr = await proc.communicate(stdin)
+ command = ' '.join(map(shlex.quote, args))
await proc.wait()
+ print(f'$ {command}') # process returned
return proc.returncode, stdout, stderr
async def run_ok_async(self, *args, **kwargs):
Using this diff with the following config:
[command]
default=true
[file:cylc/scheduler.py]
source=git:git@github.com:cylc/cylc-flow.git::cylc/flow/scheduler.py::master
[file:cylc/task_proxy.py]
source=git:https://github.com/cylc/cylc-flow.git::cylc/flow/task_proxy.py::master
[file:cylc/async_util.py]
source=git:https://github.com/cylc/cylc-flow.git::cylc/flow/async_util.py::master
I get this output showing up to three subprocesses running simultaneously:
# git --git-dir=/var/tmp/tmpqqkj_tvk/.git init
# git --git-dir=/var/tmp/tmpd4c71k8x/.git init
# git --git-dir=/var/tmp/tmp50f2vgwd/.git init
$ git --git-dir=/var/tmp/tmpqqkj_tvk/.git init
# git --git-dir=/var/tmp/tmpqqkj_tvk/.git remote add origin https://github.com/cylc/cylc-flow.git
$ git --git-dir=/var/tmp/tmp50f2vgwd/.git init
# git --git-dir=/var/tmp/tmp50f2vgwd/.git remote add origin https://github.com/cylc/cylc-flow.git
$ git --git-dir=/var/tmp/tmpd4c71k8x/.git init
# git --git-dir=/var/tmp/tmpd4c71k8x/.git remote add origin git@github.com:cylc/cylc-flow.git
$ git --git-dir=/var/tmp/tmpqqkj_tvk/.git remote add origin https://github.com/cylc/cylc-flow.git
# git --git-dir=/var/tmp/tmpqqkj_tvk/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmp50f2vgwd/.git remote add origin https://github.com/cylc/cylc-flow.git
# git --git-dir=/var/tmp/tmp50f2vgwd/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmpd4c71k8x/.git remote add origin git@github.com:cylc/cylc-flow.git
# git --git-dir=/var/tmp/tmpd4c71k8x/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmpqqkj_tvk/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
# git --git-dir=/var/tmp/tmpqqkj_tvk/.git --work-tree=/var/tmp/tmpqqkj_tvk checkout 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmp50f2vgwd/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
# git --git-dir=/var/tmp/tmp50f2vgwd/.git --work-tree=/var/tmp/tmp50f2vgwd checkout 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmp50f2vgwd/.git --work-tree=/var/tmp/tmp50f2vgwd checkout 4b70574c9100e1cf830a226c254cb3159a558873
# rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmp50f2vgwd/cylc/flow/async_util.py /var/tmp/tmph9lc16hm/f13f5dacdb0f632826821bf9278a10fc
$ git --git-dir=/var/tmp/tmpqqkj_tvk/.git --work-tree=/var/tmp/tmpqqkj_tvk checkout 4b70574c9100e1cf830a226c254cb3159a558873
# rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmpqqkj_tvk/cylc/flow/task_proxy.py /var/tmp/tmph9lc16hm/759c658589e37a9af232bfa8e03d01b0
$ rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmpqqkj_tvk/cylc/flow/task_proxy.py /var/tmp/tmph9lc16hm/759c658589e37a9af232bfa8e03d01b0
$ rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmp50f2vgwd/cylc/flow/async_util.py /var/tmp/tmph9lc16hm/f13f5dacdb0f632826821bf9278a10fc
$ git --git-dir=/var/tmp/tmpd4c71k8x/.git fetch --depth=1 origin 4b70574c9100e1cf830a226c254cb3159a558873
# git --git-dir=/var/tmp/tmpd4c71k8x/.git --work-tree=/var/tmp/tmpd4c71k8x checkout 4b70574c9100e1cf830a226c254cb3159a558873
$ git --git-dir=/var/tmp/tmpd4c71k8x/.git --work-tree=/var/tmp/tmpd4c71k8x checkout 4b70574c9100e1cf830a226c254cb3159a558873
# rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmpd4c71k8x/cylc/flow/scheduler.py /var/tmp/tmph9lc16hm/8a367e85378d4334d5133fa065d71116
$ rsync -a '--exclude=.*' --timeout=1800 '--rsh=ssh -oBatchMode=yes -oStrictHostKeyChecking=no -oConnectTimeout=8' /var/tmp/tmpd4c71k8x/cylc/flow/scheduler.py /var/tmp/tmph9lc16hm/8a367e85378d4334d5133fa065d71116
You can also test it like this:
diff --git a/metomi/rose/popen.py b/metomi/rose/popen.py
index 8993fb93..35bdbe27 100644
--- a/metomi/rose/popen.py
+++ b/metomi/rose/popen.py
@@ -371,7 +371,8 @@ class RosePopener:
if kwargs.get("shell"):
proc = Popen(args[0], **kwargs)
else:
- command = ' '.join(map(shlex.quote, args))
+ # command = ' '.join(map(shlex.quote, args))
+ command = 'sleep 5'
proc = await asyncio.create_subprocess_shell(command, **kwargs)
except OSError as exc:
if exc.filename is None and args:
The time taken should increase linearly as the sleep is increased, but only see a small increase (parsing overheads) when more files are configured for installation.
Curious, I'll check that's because GitHub is serving the requests sequentially rather than Rose failing to run them in parallel.
After a bit of digging I'm happy that these operations are being run concurrently, resulting in their subprocesses being run in parallel.
Ah, I know why I thought they were sequential - the parsing is serial, the actual pulling can be concurrent. Sorry-thanks!
I've opened #2784 - it looks like the bug I found is unrelated to this work.
Whilst working on #2785, I've noticed that "cannot connect to" type errors appear to surface with the following message:
ConfigProcessError: file:README.md=source=git:https@github.com:metomi/rose::README.md::master: ls-remote: could not find ref 'master' in 'git@github.com:metomi/rose'
Where the underlying cause here is that GitHub SSH access has not been configured.
I could do a "or remote not contactable?" on the end of the error message?
If it's possible to differentiate between these cases, great, but if not that'll be fine.
Waiting for tests to pass again, then ready to merge.
Good news, the tests pass.
Bad news, if some of the tests skip, then the test fails, e.g:
diff --git a/t/rose-app-run/28-git.t b/t/rose-app-run/28-git.t
index 1b4eaaa90..92a411611 100644
--- a/t/rose-app-run/28-git.t
+++ b/t/rose-app-run/28-git.t
@@ -72,12 +72,12 @@ remote_locations=("$HOSTNAME:$TEST_DIR/hellorepo/" "http://localhost:$GIT_WS_POR
for i in 0 1 2; do
remote_mode="${remote_test_modes[$i]}"
remote="${remote_locations[$i]}"
- if [[ "$remote_mode" == "ssh" ]] && ! ssh -n -q -oBatchMode=yes $HOSTNAME true 1>'/dev/null' 2>/dev/null; then
+ if true; then
skip 14 "cannot ssh to localhost $HOSTNAME"
echo "Skip $remote" >/dev/tty
continue
fi
- if [[ "$remote_mode" == "http" ]] && ! curl --head --silent --fail $remote >/dev/null 2>&1; then
+ if true; then
skip 14 "failed to launch http on localhost"
echo "Skip $remote" >/dev/tty
continue
:(
[FAIL] ../config: rose-app.conf not found.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 15/57 subtests
(less 42 skipped subtests: 0 okay)
Test Summary Report
-------------------
t/rose-app-run/28-git.t (Wstat: 256 Tests: 42 Failed: 0)
Non-zero exit status: 1
Parse errors: Bad plan. You planned 57 tests but ran 42.
Not a release blocker. Suggest bumping this to future work.
@MetRonnie, could you check the last couple of commits and merge if you're happy with the tests being pushed to a bugfix release milestone.
Brill, that's fixed it (will close issue). Tested with all possible skip combinations, test passed each time.
(Mac OS tests failing due to gh-actions DNS issues, safe to ignore)
(the test fixes would have been on a bugfix release milestone, however, there is no need for a post-fix, Ben has sorted it on this PR)
Work in progress!
Aims to address #1419.
I'm not that happy with the configuration syntax, although I am happy with the three main elements of it as elements.
The mechanism itself uses filtering and either partial clones or proper sparse checkouts where the Git version allows.