Open ginglis13 opened 1 year ago
Thanks for bringing this up! I am slightly confused, why does the x86 one work, but the arm64 one not work? Shouldn't they be based off the same underlying infra? In addition, our e2e tests on runfinch/finch work totally fine with sudo commands. See this example. I wonder if it has anything to do with this being a perl script that's run instead of a normal bash/zsh script
I am slightly confused, why does the x86 one work, but the arm64 one not work? Shouldn't they be based off the same underlying infra?
Yes, they should be, from what I can tell. This is the crux of the issue, and I do not have a root cause for it. Take a look at this recent execution of the Build action: https://github.com/runfinch/finch-core/actions/runs/6010202363/job/16301161154
You can see the prompt for sudo is blocking. This has been consistent over the last 3 months (at least from what I can see).
I wonder if it has anything to do with this being a perl script that's run instead of a normal bash/zsh script
Maybe... but this is observed only on a specific macOS version/architecture; the perl script works fine on the others.
Did a quick test on this runner by changing the workflow to the following:
...
sudo echo hi
./bin/lima-and-qemu.pl
...
The runner gets stuck on ./bin/lima-and-qemu.pl and not on sudo echo hi.
Hmm, also not a perl thing:
Run echo '#!/usr/bin/env perl' >> test.pl
echo '#!/usr/bin/env perl' >> test.pl
echo 'system("sudo echo sudoed")' >> test.pl
chmod u+x test.pl
./test.pl
shell: /bin/zsh {0}
env:
GO111MODULE: on
sudoed
The workflow hangs on the line sleep(1) until -s $filemonitor; and not on sudo's password entry. @vsiravar can you take a look at why it's not correctly evaluating the size changes? From lima-and-qemu.pl:
...
END { system("sudo pkill FileMonitor") }
system("sudo echo this-should-show"); # this shows up
print "sudo may prompt for password to run FileMonitor\n";
system("sudo -b /Applications/FileMonitor.app/Contents/MacOS/FileMonitor >$filemonitor 2>/dev/null");
system("sudo echo this-should-show"); # this shows up
sleep(1) until -s $filemonitor;
system("sudo echo this-probably-wont-show"); # this does not show up
...
The weird thing though is that it does not hang on a self-hosted runner provisioned manually. Log from a previous run. Does anything show up in the filemonitor.log when the workflow hangs?
No, I tried inserting a system("sudo cat $filemonitor"); right before sleep, but nothing is displayed.
can you take a look at why it's not correctly evaluating the size changes?
sleep(1) until -s $filemonitor; is behaving as expected since $filemonitor is empty.
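Since the loop never exits while the file stays empty, a bounded wait would at least make the job fail fast instead of running into the workflow timeout. Below is a minimal shell sketch of that idea, not what lima-and-qemu.pl does today. The function name wait_for_nonempty and the 30-second default are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: wait until FILE is non-empty, or give up after TIMEOUT seconds.
# Usage: wait_for_nonempty FILE [TIMEOUT_SECONDS]
wait_for_nonempty() {
  file="$1"
  timeout="${2:-30}"   # assumed default, not taken from the script
  elapsed=0
  # [ -s FILE ] is true when FILE exists and its size is greater than zero,
  # which is the same condition Perl's -s tests.
  until [ -s "$file" ]; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $file" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

A workflow step could call wait_for_nonempty with the FileMonitor log path and exit non-zero on failure, which would surface an empty log as an explicit error rather than a hang.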
Did you also check whether the /Applications/FileMonitor.app/Contents/MacOS/FileMonitor process is running after system("sudo -b /Applications/FileMonitor.app/Contents/MacOS/FileMonitor >$filemonitor 2>/dev/null"); ?
96674 ?? 0:00.00 sudo -b /Applications/FileMonitor.app/Contents/MacOS/FileMonitor
96675 ?? 0:00.00 /Applications/FileMonitor.app/Contents/MacOS/FileMonitor
96676 ?? 0:00.00 sh -c ps -ax | grep FileMonitor
96678 ?? 0:00.00 grep FileMonitor
Update after troubleshooting: FileMonitor requires (or makes Terminal require) Full Disk Access. It is unclear why macOS 11 for x86 works.
The Build action has been consistently failing for the last month: https://github.com/runfinch/finch-core/actions/workflows/release.yaml
GitHub runners use passwordless sudo. However, runners provisioned via finch infra don't allow passwordless sudo. (EDIT: this is observed consistently on the macOS 12 runner for arm64 https://github.com/runfinch/finch-core/actions/runs/6010202363/job/16301161154)
The log lines in step "Make and release deps" before timeout have been:
This message is coming from https://github.com/runfinch/finch-core/blob/08a4ca2a9285f1dd2fac3bd4701087b1b2fdec87/bin/lima-and-qemu.pl#L46
Still looking to verify but the smoking gun is that the script is hanging on a prompt for password.
On my machine (macOS Ventura 13.4, M1 chip):