Closed cvgs closed 1 year ago
@cvgs I still have some concerns about the way the exiting is designed now.
When pressing Ctrl-C during the preparation phase, erase-install.sh and swiftDialog are terminated, but startosinstall is not. To end it, I have to separately run sudo pkill startosinstall
. Is there a way to ensure that everything spawned by erase-install is killed if we cancel it?
Maybe related, but the 60 minute timeout with audible sound seems to continue run if the script is manually terminated, and also (I think) if you run the test-run through. One hour later, we get a message like this:
[erase-install] Timeout reached for PID 13831!
Unfortunately I don't fully understand how the exiting works now, so I don't know how to fix this. Can we also find a way that this PID is killed if we Ctrl-C the script, or if it's running in test mode?
PS. PRs can be made to the main
branch now. I'm not generally making rc
branches anymore, instead I maintain a release
branch which corresponds to the Latest release.
Hi Graham,
we launch startosinstall
via launchd
now, so we do not have an easy way to get it's process id. We could parse the output of launchctl list …
, but the man page states that one should not rely on the format of its output…
That's why the timeout is there, because during regular operation we just wait for startosinstall to talk to us using the SIGUSR1 mechanism. If nothing happens for one hour, we assume that something went wrong (perhaps startosinstall crashed?), log an unexpected error and exit. But in my experience, startosinstall sometimes takes longer than one hour to prepare the update, happily runs along doing its thing, and we get those log messages even if the update goes through - so perhaps it's a good idea to increase the timeout to 90 mins to reduce false negatives?
A slightly different case is when the script gets interrupted: Indeed, the script could ask the launchd job that it started to terminate during the finish()
or terminate()
routine, because the script knows which job label to target.
It simply would need to issue something like /bin/launchctl unload -F "/Library/LaunchDaemons/com.github.grahampugh.erase-install.startosinstall.plist"
(the same way launch_startosinstall
already does it to make sure we do not run twice).
I'm just not sure if it is safe to kill startosinstall at all times, so i simply left it alone (because in almost all cases, we want it to run and do its job). But i guess startosinstall should simply refuse to quit if it's right in the middle of updating the OS.
This change will cause erase-install to report a non-zero exit code (143 to be exact) when it is being abnormally terminated (e.g. by pressing CTRL+C or getting terminated by SIGTERM).
Previously it would return the exit code of the last command being executed at time of termination, which could be non-zero or zero depending on the specific circumstances, which then could have been reported as successful execution in a Jamf policy.
This change will make it easier to discover such errors. The exit code of the last executed command will be logged in addition to returning 143 to facilitate debugging.