puppetlabs / puppetlabs-sshkeys_core

Apache License 2.0
4 stars 32 forks source link

(maint) Adds retry to nighty gem unit tests #57

Closed mhashizume closed 2 years ago

mhashizume commented 2 years ago

We have seen transient failures with this job the last few days, possibly due to the timing of the gem being published. This adds a simple shell loop to retry downloading and installing the nightly gem.

joshcooper commented 2 years ago

I think curl has builtin retry options for max number of times to retry and how much time to wait in between. I'm not sure what version of curl is installed in our docker images and when the options were added though

mhashizume commented 2 years ago

I think curl has builtin retry options for max number of times to retry and how much time to wait in between. I'm not sure what version of curl is installed in our docker images and when the options were added though

I was kind of unclear whether the issue was that curl was failing to download anything because the file wasn't available, or if it was grabbing a partially uploaded gem and the gem install was failing (given the error message, I thought it was the latter).

joshcooper commented 2 years ago

I thought it was the latter

Ah based on this output, I think curl only downloaded 396 bytes:

100   396  100   396    0     0   1157      0 --:--:-- --:--:-- --:--:--  1157

Maybe Cloudfront had a cache miss and replied with a redirect, so puppet.gem contains HTML. Probably best to pass -L to follow redirects.

mhashizume commented 2 years ago

Makes sense! I've removed the loop and instead added --location and --retry options to the curl command. It looks like the version of curl that ships with the runner is plenty new to support the options we want.

ciprianbadescu commented 2 years ago

I think the change should be done in all workflows, and if it solves the issue it should be done on all other core modules.

luchihoratiu commented 2 years ago

I would personally go back to the initial solution and retry both curl and gem install. Maybe with an always increasing sleep time (15 seconds from the beginning seemed quite a lot) and a message as such:

sleep_time=0
until [ $sleep_time -ge 15 ]
do
  curl http://nightlies.puppet.com/downloads/gems/puppet${{ matrix.puppet_version }}-nightly/${{ matrix.gem_file }} --output puppet.gem
  gem install puppet.gem -N && break

  sleep_time=$((sleep_time*2+1))
  echo "Retrying download and install of gem in $sleep_time seconds..."
  sleep $sleep_time
done

@joshcooper, other than going forward with the current cleaner solution and keeping an eye on it for a while, I don't know how we could catch and check this for sure:

Maybe Cloudfront had a cache miss and replied with a redirect

Above seems safer to me.

mhashizume commented 2 years ago

Thank you for the feedback and code. I've updated my commit, please let me know if you have any additional comments.

BobosilaVictor commented 2 years ago

Closing this pr in favor of https://github.com/puppetlabs/puppetlabs-sshkeys_core/pull/60. Added the location flag for both workflows.