Closed ashleyfrieze closed 1 year ago
I have noticed that myself, and it appears to have gotten worse. When I originally implemented the feature, I would run into this 1 out of 10 times. Now the ratio is almost reversed...
I have considered addressing it in the same way as you suggest, @ashleyfrieze, and will look into it.
If you are interested: put up a PR :)
Frodo CLI version
0.23.0
Describe the issue
We have been trying to apply some variable changes to our staging tenant. The tenant seems to take about 500 seconds to restart during the apply process. However, we're seeing a failure rate of about 95% in the Frodo CLI during restarts.
Almost always, we get about 430 seconds into the process, and then the job aborts after receiving a 500 error from the remote server while polling for restart status. This could represent an issue in ForgeRock Identity Cloud not serving valid statuses 100% of the time, but Frodo should also tolerate it: once the pipeline has failed at this point, our only option is to run it again and hope the error doesn't happen, which is rare.
Maybe Frodo could allow a certain number of retries for HTTP 500 errors inside the wait loop. It looks like Frodo is able to poll the server again just afterwards and determine that it's still restarting (i.e., if we run apply again straight away, we are told that the restart from last time is still in progress).
Note: the error is specifically happening in the middle of the polling period.
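To make the suggestion concrete, here is a minimal sketch of a wait loop that tolerates a limited number of consecutive HTTP 5xx responses. It is not taken from Frodo's code base: `PollOutcome`, `TransientErrorBudget`, `waitForRestart`, the injected `poll` function, and all default values are hypothetical names and numbers chosen for illustration.

```typescript
// Hypothetical sketch: none of these names come from Frodo itself.

// What a single status poll can produce.
type PollOutcome =
  | { kind: "status"; restarting: boolean } // server answered normally
  | { kind: "httpError"; status: number }; // server answered with an error code

type Decision = "done" | "keepPolling" | "abort";

/** Tracks consecutive HTTP 5xx responses seen while polling. */
class TransientErrorBudget {
  private consecutiveErrors = 0;
  private readonly maxConsecutiveErrors: number;

  constructor(maxConsecutiveErrors: number) {
    this.maxConsecutiveErrors = maxConsecutiveErrors;
  }

  decide(outcome: PollOutcome): Decision {
    if (outcome.kind === "status") {
      // A valid answer resets the budget.
      this.consecutiveErrors = 0;
      return outcome.restarting ? "keepPolling" : "done";
    }
    // Only server-side (5xx) errors are treated as transient.
    if (outcome.status < 500 || outcome.status > 599) {
      return "abort";
    }
    this.consecutiveErrors += 1;
    return this.consecutiveErrors > this.maxConsecutiveErrors
      ? "abort"
      : "keepPolling";
  }
}

/**
 * Polls until the restart finishes (resolves true), or until the error
 * budget or the overall timeout is exhausted (resolves false).
 * The caller supplies `poll`, which performs the actual status request.
 */
async function waitForRestart(
  poll: () => Promise<PollOutcome>,
  maxConsecutiveErrors = 3, // assumed default
  intervalMs = 5000, // assumed polling interval
  timeoutMs = 600000, // assumed overall timeout
): Promise<boolean> {
  const budget = new TransientErrorBudget(maxConsecutiveErrors);
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const decision = budget.decide(await poll());
    if (decision === "done") return true;
    if (decision === "abort") return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // timed out
}
```

Counting consecutive errors rather than total errors means a single 500 in the middle of the polling period is absorbed, while a server that keeps failing still causes the job to abort.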