The check for the CR state "Running" seems to be flaky.
On closer inspection, the operator deployment from the feature branch can be so slow that the old operator is still running when the test continues.
This change enforces that the operator deployment is scaled to 0 and back to 1 before the test continues.
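
As an illustration of the scale-down/scale-up step, here is a minimal bash sketch. It is not necessarily the code in this PR: the function name `restart_operator`, its parameters, and the idea of waiting on a pod label selector are assumptions made for the example. It relies only on the standard `kubectl scale`, `kubectl wait --for=delete`, and `kubectl rollout status` commands.

```shell
# Hypothetical helper (name and parameters are assumptions, not from the PR).
# Scales the operator deployment to 0, waits for the old pods to disappear,
# scales back to 1, and waits for the new rollout to finish.
restart_operator() {
  local namespace="$1" deployment="$2" selector="$3" timeout="${4:-120s}"

  # Stop the old operator.
  kubectl -n "$namespace" scale deployment "$deployment" --replicas=0 || return 1

  # Block until every pod matching the selector has been deleted.
  kubectl -n "$namespace" wait --for=delete pod -l "$selector" --timeout="$timeout" || return 1

  # Start the operator again from the currently deployed image.
  kubectl -n "$namespace" scale deployment "$deployment" --replicas=1 || return 1

  # Only return once the new operator pod is rolled out.
  kubectl -n "$namespace" rollout status "deployment/$deployment" --timeout="$timeout"
}
```

A call could look like `restart_operator instana-agent <operator-deployment> <pod-label-selector>`, where the deployment name and label selector must be taken from the actual operator manifests. Returning early on the first failing step keeps the test from continuing against a half-restarted operator.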
Also, if the operator pod is removed before the agent CR, the finalizer on the CR can no longer be removed. In the worst case this blocks the deletion of the entire instana-agent namespace and leaves existing environments locked until they are fixed manually. The cleanup logic now removes the finalizer from the agent CR, if it is still present, before deleting the namespace.
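
The finalizer cleanup can be sketched roughly as follows. This is a hedged example, not the exact code from the PR: the function name `remove_cr_finalizers` and its parameters are hypothetical, and the resource type and CR name have to be supplied by the caller. It uses a JSON merge patch, where setting `metadata.finalizers` to `null` removes the field.

```shell
# Hypothetical helper (name and parameters are assumptions, not from the PR).
# Clears all finalizers on a custom resource so that a namespace deletion
# is not blocked once the operator that would handle them is gone.
remove_cr_finalizers() {
  local namespace="$1" resource="$2" name="$3"

  # Nothing to do if the CR no longer exists.
  if ! kubectl -n "$namespace" get "$resource" "$name" >/dev/null 2>&1; then
    return 0
  fi

  # A merge patch with null deletes the finalizers field.
  kubectl -n "$namespace" patch "$resource" "$name" \
    --type=merge -p '{"metadata":{"finalizers":null}}'
}
```

In a cleanup step this would run before `kubectl delete namespace instana-agent`, with the agent CR's resource type and name passed in. Clearing finalizers skips whatever cleanup the operator would normally perform, so it is only appropriate as a last resort during test teardown.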
This PR improves the robustness of the overall test suite and adds more logging about pod states. We should still rewrite the test suite from bash to Golang to keep it reliable and extensible in the long run.