Open pzread opened 1 year ago
Confirmed that it's related to the google-guest-agent. I tried to apt-get install google-guest-agent first and it failed to do that:
Jun 02 18:02:40 github-runner-template-cpu-2023-06-02-1685728901 google_metadata_script_runner[1953]: startup-script: + apt-get install google-guest-agent
Jun 02 18:02:41 github-runner-template-cpu-2023-06-02-1685728901 google_metadata_script_runner[1953]: 2023/06/02 18:02:41 logging client: rpc error: code = Unauthenticated >
Jun 02 18:02:47 github-runner-template-cpu-2023-06-02-1685728901 google_metadata_script_runner[1953]: startup-script: (Reading database ... 64301 files and directories curr>
Jun 02 18:02:47 github-runner-template-cpu-2023-06-02-1685728901 google_metadata_script_runner[1953]: startup-script: Preparing to unpack .../google-guest-agent_20220622.00>
Jun 02 18:02:47 github-runner-template-cpu-2023-06-02-1685728901 systemd[1]: google-startup-scripts.service: Main process exited, code=killed, status=15/TERM
Jun 02 18:02:47 github-runner-template-cpu-2023-06-02-1685728901 systemd[1]: google-startup-scripts.service: Failed with result 'signal'.
Seems like you updated the base image so it's got the up-to-date guest agent: https://github.com/openxla/iree/pull/13918. Not sure how we avoid this in the future other than just bumping that again. Some searching turns up others who've hit similar issues, but no resolutions. We should probably be updating the base image when we make new VM images anyway.
For reproducibility of the existing image, the image_setup.sh script still works: you just can't run it as a startup script. One option would be for the script to try to invoke itself (with disown?) after doing an upgrade so that it can continue even if upgrading kills it. A bit tricky because I don't think the script actually lives in any file when used as a startup script.
Maybe documenting this (remember to bump the base image) somewhere will be good enough. I spent a while trying to figure out what happened.
Yeah, a comment seems worth it at least. One option would be to add an exit trap that's set right before the apt-get upgrade command and cleared right after. Then if the script gets killed in there, it can at least provide a helpful message.
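A minimal sketch of that exit-trap idea, not taken from the actual image_setup.sh: the helper names `warn_if_killed` and `run_guarded` are hypothetical, and the message wording is only a suggestion. It assumes the script runs under bash and is stopped with SIGTERM (as the `status=15/TERM` log line suggests); as far as I know bash runs an EXIT trap in that case, but a SIGKILL would bypass it.

```shell
#!/bin/bash
# Hypothetical helpers; names and message text are assumptions for illustration.

warn_if_killed() {
  # Printed only if the shell exits while the trap is armed.
  echo "Setup script was interrupted while running a guarded command." >&2
  echo "If this was 'apt-get upgrade', it may have upgraded google-guest-agent" >&2
  echo "and stopped google-startup-scripts.service. Try bumping the base image." >&2
}

run_guarded() {
  local status=0
  # Arm the trap right before the risky command.
  trap warn_if_killed EXIT
  "$@" || status=$?
  # Disarm it as soon as the command returns normally.
  trap - EXIT
  return "${status}"
}

# Intended use inside the setup script (left commented out in this sketch):
# run_guarded apt-get upgrade
```

The trap is cleared immediately after the command so that unrelated failures later in the script don't print a misleading hint. If the script already installs its own EXIT trap, this sketch would overwrite it, so the two would need to be combined.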
create_image.sh fails to create a new image from the base image ubuntu-2204-jammy-v20230114. From the log, it seems like the apt-get upgrade updated the package google-guest-agent, and it killed the google-startup-scripts.service during the process, which interrupted the setup script.
google-startup-scripts.service logs: