Open targos opened 5 months ago
Similar to https://github.com/nodejs/build/issues/3288
I logged into the backup server and ran /root/backup_scripts/remove_old.sh ci.nodejs.org.
It freed 100GB.
It happened again.
@ryanaslett Maybe the new backup server is not set up to run the cleanup script regularly?
Hmm. It's set up in the crontab:
40 23 * * 6 /usr/bin/rsnapshot -c /usr/local/etc/rsnapshot.conf weekly && /root/backup_scripts/remove_old.sh ci-release.nodejs.org && /root/backup_scripts/remove_old.sh ci.nodejs.org
It should be clearing it out once a week.
The backup server has no monitoring or alerting for these tasks, so we should probably come up with a way to be notified when one of those cron jobs fails.
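A minimal sketch of one low-effort notification approach: wrap each step so that a failure prints a line with the exit code to stderr. cron mails whatever a job prints to the address in MAILTO, which assumes mail delivery works on the backup server. The helper name run_and_report is hypothetical and does not exist on the server.

```shell
#!/bin/sh
# Hypothetical helper: run the given command, report a non-zero exit
# status on stderr, and pass that status through unchanged.
run_and_report() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $*" >&2
    fi
    return "$status"
}
```

A wrapper script called from the crontab could then run, for example, run_and_report /root/backup_scripts/remove_old.sh ci.nodejs.org.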
In that case, I think the problem is clear.
Running remove_old.sh ci-release.nodejs.org ends up with an error:
# /root/backup_scripts/remove_old.sh ci-release.nodejs.org
curl: (92) HTTP/2 stream 1 was not closed cleanly before end of the underlying stream
# echo $?
92
Because the crontab chains the commands with &&, the non-zero exit stops the chain, so the script never gets a chance to run for ci.nodejs.org.
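In an a && b && c chain, c only runs if b exits 0, which is why exit code 92 from the ci-release step skips the ci step. One possible alternative, as a sketch only, is to clean up each host independently and still return non-zero if any of them failed. The names cleanup_all and CLEANUP_CMD are hypothetical; the default path is the script from the crontab.

```shell
#!/bin/sh
# Hypothetical wrapper: run the cleanup for every host given as an argument,
# even if an earlier host fails. Returns non-zero if any host failed.
# CLEANUP_CMD defaults to the script used in the crontab.
CLEANUP_CMD="${CLEANUP_CMD:-/root/backup_scripts/remove_old.sh}"

cleanup_all() {
    failed=0
    for host in "$@"; do
        if ! "$CLEANUP_CMD" "$host"; then
            echo "cleanup failed for $host" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

The crontab entry could then call a script that runs cleanup_all ci-release.nodejs.org ci.nodejs.org after rsnapshot.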
remove_old.sh SSHes into ci and ci-release, deletes any jobs older than 22 days, then triggers a Jenkins reload so that Jenkins recognizes the jobs are gone.
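For illustration only, age-based selection like this is commonly done with find. The sketch below is not the contents of remove_old.sh; the function name and the directory argument are hypothetical, and it only lists candidates rather than deleting anything.

```shell
#!/bin/sh
# Hypothetical illustration, not the contents of remove_old.sh:
# print the immediate subdirectories of $1 whose modification time is
# more than $2 whole days in the past (default 22).
# -mindepth/-maxdepth are available in GNU and BSD find.
list_old_dirs() {
    dir=$1
    days=${2:-22}
    find "$dir" -mindepth 1 -maxdepth 1 -type d -mtime +"$days"
}
```

Note that -mtime +22 counts whole 24-hour periods, so a directory has to be at least 23 days old to match.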
The credentials for Jenkins belonged to the Jenkins user jbergstroem, and the error given is:
jbergstroem is missing the Overall/Read permission
Not sure when they were removed from the Nodejs/build GitHub team, but that's probably the last time this script executed successfully.
I've replaced the credentials with an API token for my account for now and have run it for ci, but I'm not sure how jbergstroem had one API key that worked with both ci and ci-release (maybe it was moved over to release from ci somehow?).
The cron will currently delete the jobs on ci and refresh, then delete the jobs on ci-release and fail to refresh, because it's using the same API token.
This should probably be a service account with proper permissions to access the /reload path.
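A rough sketch of what the reload call could look like with a dedicated service account, assuming the script keeps using curl. JENKINS_USER and JENKINS_TOKEN are assumed environment variables and jenkins_reload is a hypothetical name; the actual call in remove_old.sh may differ.

```shell
#!/bin/sh
# Hypothetical sketch: ask a Jenkins controller to reload its configuration
# from disk, authenticating with a user name and API token.
# JENKINS_USER and JENKINS_TOKEN are assumed to be set by the caller.
jenkins_reload() {
    base_url=$1
    # --fail makes curl exit non-zero on HTTP errors such as 403,
    # so a missing permission shows up in the exit status.
    curl --fail --silent --show-error \
        --request POST \
        --user "${JENKINS_USER}:${JENKINS_TOKEN}" \
        "${base_url}/reload"
}
```

Since ci and ci-release apparently need different tokens, the two variables would have to be set per host before each call, e.g. before jenkins_reload https://ci.nodejs.org.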
OTOH, this seems like a brittle way to avoid using Jenkins' own job cleanup mechanism.
My recommendation is that we change the jobs on the release server first (since there's only a handful) and remove this cleanup mechanism from ci-release, and then modify the jobs on ci.nodejs.org to also clean up after themselves.
I'm looking into it