Closed mhashizume closed 2 years ago
I think curl has builtin retry options for max number of times to retry and how much time to wait in between. I'm not sure what version of curl is installed in our docker images and when the options were added though
I think curl has builtin retry options for max number of times to retry and how much time to wait in between. I'm not sure what version of curl is installed in our docker images and when the options were added though
I was kind of unclear whether the issue was that curl was failing to download anything because the file wasn't available, or if it was grabbing a partially uploaded gem and the gem install
was failing (given the error message, I thought it was the latter).
I thought it was the latter
Ah based on this output, I think curl only downloaded 396 bytes:
100 396 100 396 0 0 1157 0 --:--:-- --:--:-- --:--:-- 1157
Maybe Cloudfront had a cache miss and replied with a redirect, so puppet.gem contains HTML. Probably best to pass -L
to follow redirects.
Makes sense! I've removed the loop and instead added --location
and --retry
options to the curl
command. It looks like the version of curl
that ships with the runner is plenty new to support the options we want.
I think the change should be done in all workflows, and if it solves the issue it should be done on all other core modules.
I would personally go back to the initial solution and retry both curl and gem install. Maybe with an always increasing sleep time (15 seconds from the beginning seemed quite a lot) and a message as such:
sleep_time=0
until [ $sleep_time -ge 15 ]
do
curl http://nightlies.puppet.com/downloads/gems/puppet${{ matrix.puppet_version }}-nightly/${{ matrix.gem_file }} --output puppet.gem
gem install puppet.gem -N && break
sleep_time=$((sleep_time*2+1))
echo "Retrying download and install of gem in $sleep_time seconds..."
sleep $sleep_time
done
@joshcooper, other than going forward with the current cleaner solution and keeping an eye on it for a while, I don't know how we could catch and check this for sure:
Maybe Cloudfront had a cache miss and replied with a redirect
Above seems safer to me.
Thank you for the feedback and code. I've updated my commit, please let me know if you have any additional comments.
Closing this pr in favor of https://github.com/puppetlabs/puppetlabs-sshkeys_core/pull/60. Added the location flag for both workflows.
We have seen transient failures with this job the last few days, possibly due to the timing of the gem being published. This adds a simple shell loop to retry downloading and installing the nightly gem.